This causes false alarms. For example, the Google Webmaster Tools stats will show many errors logged (correctly) as connection errors: the bot thinks your site is offline, or partially or temporarily broken. Since we only deny access to URLs that bots should never attempt to crawl anyway, we should respond with a standard 403 error instead, so the bot reads it as "nothing to see here!" and gives up, rather than "let's try again, they are probably offline".

For reference, here is the full list of URLs denied by default for all known bots, because they are never cached, are known to cause serious system overload when crawled (often to index hundreds of empty pages, as is the case with the calendar module), or are simply not expected to be crawled and indexed:

location ^~ /search
location ~* /(?:autocomplete|ajax|ahah)/
location ^~ /admin
location ^~ /audio/download
location ~* (?:cgi-bin|vti-bin|wp-content)
location ~* (?:calendar|event|validation|aggregator|vote_up_down|captcha)
location ~* ^/(?:.*/)?(?:user|cart|checkout|logout|flag)
location ~* /(?:node/[0-9]+/edit|node/add|comment/reply|approve|users)
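
As a rough illustration of the change described above, one of these locations could answer bots with 403 instead of 444 along the following lines. This is only a sketch and not the code from the actual commit: the `$is_bot` variable, the `map` block that sets it, and the short user-agent list are all assumptions made for the example.

```nginx
# http {} context: flag requests from crawlers.
# ASSUMPTION: $is_bot and this abbreviated user-agent list are illustrative only.
map $http_user_agent $is_bot {
    default                         0;
    ~*(?:googlebot|bingbot|slurp)   1;
}

# server {} context: one of the denied locations from the list above.
location ^~ /search {
    # 444 closes the connection without sending any response, which a
    # crawler records as a connection error. 403 is a standard status
    # code that tells it the URL is simply off limits.
    if ($is_bot) {
        return 403;
    }
    # Regular request handling for human visitors continues here.
}
```

The same `if ($is_bot) { return 403; }` check would be repeated in each of the other locations listed above.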

Comments

omega8cc commented:

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

jami3z commented:

Status: Closed (fixed) » Active

Hey omega8cc, sorry to reopen, but I have this exact same issue and some very unhappy clients. I have taken over management of a server that is running nginx and Aegir. It doesn't seem to have provision installed, so where else can I go within the config to try and implement a fix like this, so that Googlebot won't receive 444 errors and will stop failing?

cheers

steven jones commented:

Status: Active » Closed (fixed)

Sorry @jamienotweet, please don't re-open issues for unrelated reasons.

Have a look at: http://drupal.org/support for your support options, or post your question on http://drupal.stackexchange.com/ Good luck!

jami3z commented:

Unrelated reasons??? This is the exact issue I am having.

Thanks for the help, will try and find it elsewhere.

steven jones commented:

My apologies, I misread your comment. You probably want to upgrade to the latest Aegir version: http://community.aegirproject.org/upgrading

omega8cc commented:

@jamienotweet Provision must be installed, but if this is a BOA-based install, please open an issue in the Barracuda or Octopus queue.

  • Commit 036affb on dev-nginx-6.x-2.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x by omega8cc:
    Issue #1524860 by omega8cc - Empty response 444 in Nginx confuses search...
  • Commit f40bcbb on dev-drupal-8, dev-nginx-6.x-1.x, dev-ssl-ip-allocation-refactor, dev-1205458-move_sites_out_of_platforms, 7.x-3.x, dev-subdir-multiserver, 6.x-2.x-backports, dev-helmo-3.x by omega8cc:
    Issue #1524860 by omega8cc - Empty response 444 in Nginx confuses search...