
Regarding the club websites

  • pratik411
  • Topic Author
  • New Member
  • Posts: 2
  • Thanks: 0

Regarding the club websites

9 years 5 months ago
#58900
Hi,

I was going through a few club websites and came across a robots.txt file, which search engines consult when indexing a website. Why do we have a Disallow for all the files?

User-agent: *
Disallow: /

This blocks the entire website from being indexed by Google. Can I change it myself for the club website?

Thanks
The topic has been locked.
  • SteveTheTechie
  • Emeritus
  • Posts: 11492
  • Thanks: 3057

Re: Regarding the club websites

9 years 5 months ago - 9 years 5 months ago
#58904
We also use Allow directives that permit indexing of specific folders that we deem important.

You cannot change that as it directly impacts overall system performance. We are not going to grant access to that.
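
As a rough illustration of how an Allow line can open one folder while everything else stays blocked, here is a minimal sketch using Python's standard `urllib.robotparser`. The folder `/public/`, the page paths, and the bot name `ExampleBot` are assumptions made up for the example; the folders actually allowed on FTH are not listed in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/public/" stands in for whatever folders the
# administrators actually allow; the real list is not shown in this thread.
ALLOW_EXAMPLE = """\
User-agent: *
Allow: /public/
Disallow: /
"""

allow_parser = RobotFileParser()
allow_parser.parse(ALLOW_EXAMPLE.splitlines())

# The Allow line is listed first because this parser applies the first
# rule whose path prefix matches the requested URL.
public_ok = allow_parser.can_fetch("ExampleBot", "/public/meetings.html")
private_ok = allow_parser.can_fetch("ExampleBot", "/private/notes.html")

print("public folder allowed:", public_ok)
print("everything else allowed:", private_ok)
```

Python's parser is first-match, whereas Google documents a most-specific-match rule; putting the Allow line before `Disallow: /` gives the same result under both.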
Last edit: 9 years 5 months ago by SteveTheTechie.
  • pratik411
  • Topic Author
  • New Member
  • Posts: 2
  • Thanks: 0

Re: Regarding the club websites

9 years 5 months ago
#58906
Hi Sir,

Thanks for the reply.

Kindly find below my robots.txt file. Having the below robots file won't facilitate indexing as there is no allow parameter specified.

Help appreciated.
Code:
User-agent: Googlebot
Crawl-delay: 10
Disallow: /jquery/
Disallow: /fthadmin/
Disallow: /logfiles/
Disallow: /OLD_FILES/
Disallow: /text/
Disallow: /js/ckeditor_smileys/
Disallow: /js/
Disallow: /json/

User-agent: bingbot
Crawl-delay: 10
Disallow: /jquery/
Disallow: /fthadmin/
Disallow: /logfiles/
Disallow: /OLD_FILES/
Disallow: /text/
Disallow: /js/
Disallow: /json/

User-agent: MSNBot
Crawl-delay: 10
Disallow: /jquery/
Disallow: /fthadmin/
Disallow: /logfiles/
Disallow: /OLD_FILES/
Disallow: /text/
Disallow: /js/
Disallow: /json/

User-agent: Slurp
Crawl-delay: 10
Disallow: /jquery/
Disallow: /fthadmin/
Disallow: /logfiles/
Disallow: /OLD_FILES/
Disallow: /text/
Disallow: /js/
Disallow: /json/

User-agent: *
Disallow: /
Crawl-delay: 60

User-agent: magpie-crawler
Disallow: /

User-agent: rogerbot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: stremorbot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: Ezooms
Disallow: /

User-agent: omgilibot
Disallow: /

User-agent: SeznamBot
Disallow: /

User-agent: Baiduspider
Disallow: /

User-agent: Sosospider
Disallow: /

User-agent: Sosospider+
Disallow: /

User-agent: wonderbot/JS 1.0
Disallow: /
  • GeorgeMarshall
  • FreeToastHost Ambassador
  • Thanks: 0

Re: Regarding the club websites

9 years 5 months ago
#58907
Google is certainly indexing clubs on FTH.
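
That matches how robots.txt groups generally work: a crawler with its own User-agent group follows that group rather than the `*` group, and any path the group does not disallow is allowed by default, so no explicit allow line is needed. A minimal sketch with Python's standard `urllib.robotparser`, using an abbreviated copy of the rules posted earlier in this thread; the page path `/index.html` and the bot name `ExampleBot` are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated copy of the posted rules: the Googlebot group (shortened)
# and the catch-all group. The other groups are omitted for brevity.
POSTED_RULES = """\
User-agent: Googlebot
Crawl-delay: 10
Disallow: /js/
Disallow: /json/

User-agent: *
Disallow: /
Crawl-delay: 60
"""

rules = RobotFileParser()
rules.parse(POSTED_RULES.splitlines())

# Googlebot has its own group, so only that group's Disallow lines apply.
googlebot_page = rules.can_fetch("Googlebot", "/index.html")    # hypothetical page
googlebot_script = rules.can_fetch("Googlebot", "/js/app.js")   # hypothetical script

# A bot without its own group (hypothetical name) falls back to "*".
other_page = rules.can_fetch("ExampleBot", "/index.html")

print("Googlebot, page:", googlebot_page)
print("Googlebot, /js/:", googlebot_script)
print("ExampleBot, page:", other_page)
print("Crawl-delay for Googlebot:", rules.crawl_delay("Googlebot"))
print("Crawl-delay for ExampleBot:", rules.crawl_delay("ExampleBot"))
```

Under these assumptions the check reports that Googlebot may fetch ordinary pages but not anything under /js/, while a bot that only matches `*` is blocked everywhere.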
  • SteveTheTechie
  • Emeritus
  • Posts: 11492
  • Thanks: 3057

Re: Regarding the club websites

9 years 5 months ago
#58908
pratik411 wrote: Kindly find below my robots.txt file.

Ok, thanks, I will look into that when I get a chance this evening. We may have temporarily tweaked it a few months ago during the officer changeover when system performance was suffering.

However, keep in mind the following:

It is not your robots.txt file, per se. Any idea that FTH websites are totally independent of one another and have their own robots.txt files is a bit of an illusion that the system promotes. The system is essentially only one website (template) with many "personalities" (content), each of which results in "an individual club website".

We manage the robots.txt file(s) for the entire system to keep the *system* performance optimal.
  • SteveTheTechie
  • Emeritus
  • Posts: 11492
  • Thanks: 3057

Re: Regarding the club websites

9 years 5 months ago - 9 years 5 months ago
#58909
GeorgeMarshall wrote: Google is certainly indexing clubs on FTH.

Understood. However, as Brian will confirm, we have had instances where bad bots (not Googlebot) have brought the system performance "to its knees", so we tend to be very picky about which bots we allow and where we allow them to venture. In any case, some bots ignore robots.txt anyway.

We do want to enable SEO, but we tend to focus on the major search providers.

I would refer you to the following article on Semalt as an example: www.incapsula.com/blog/semalt-botnet-spam.html Semalt *killed* the system performance a few years ago.

It is probably time for us to tweak our robots.txt file anyway for current bad bots, so this thread is probably a good thing. www.botreports.com/badbots/

I know that Google is looking at more parts of the websites (including JavaScript), so we may need to relax that a bit. However, I will be very wary of any resulting system performance impacts.
Last edit: 9 years 5 months ago by SteveTheTechie.