As the number of Drupal nodes grows, search engine crawlers start to become a problem for sites with low bandwidth. Being indexed by major search engines (Google, for example) is very useful, so disallowing robots from crawling the site does not seem like a good solution. Apart from enabling gzip compression and the like, it would be great if Drupal had a built-in If-Modified-Since handler: when a client requests a node and sends an If-Modified-Since header, Drupal could check whether the node has been updated or has received new comments since that date and, if nothing has changed, reply with a 304 (Not Modified) status instead of resending the page, saving bandwidth.
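A minimal sketch of the check being proposed, written in Python rather than Drupal's PHP. It assumes the node's last edit time and its newest comment time are available as timezone-aware UTC datetimes; `last_change`, `conditional_status` and `last_modified_header` are hypothetical helper names, not Drupal API.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def last_change(node_changed: datetime, last_comment: Optional[datetime]) -> datetime:
    """Hypothetical helper: a node page changes when the node is edited
    or when a new comment is posted, so take the later of the two."""
    if last_comment is None:
        return node_changed
    return max(node_changed, last_comment)


def last_modified_header(changed: datetime) -> str:
    """Format the Last-Modified value sent with a full (200) response.
    `changed` must be timezone-aware."""
    return format_datetime(changed.astimezone(timezone.utc), usegmt=True)


def conditional_status(if_modified_since: Optional[str], changed: datetime) -> int:
    """Return 304 if the client's cached copy is still current, else 200.
    `changed` must be timezone-aware (assumption of this sketch)."""
    if if_modified_since is None:
        return 200  # no conditional header: send the full page
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # malformed date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
    # HTTP dates have one-second resolution, so drop microseconds first.
    if changed.replace(microsecond=0) <= since:
        return 304
    return 200
```

A crawler that echoes back the Last-Modified value it received gets a 304 with no body until the node is edited or commented on again.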
Comments
Comment #1
dries commented: Drupal supports the HTTP_IF_MODIFIED_SINCE header (and friends) and will send out a 304 (Not Modified) status when possible.
Comment #2
garym@www.teledyn.com commented: The biggest drain from crawlers comes from the many alternate paths through which your site's content can be reached (taxonomy, blog, user ...). To keep crawlers from downloading the same node over and over, you should exclude all but the /node path in your robots.txt.
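A small sketch of such robots.txt rules, checked with Python's standard `urllib.robotparser`. The Disallow prefixes assume Drupal's default `/taxonomy/`, `/blog/` and `/user/` paths with clean URLs enabled; adjust them to the listing paths your site actually exposes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the alternate listing paths mentioned above
# so crawlers reach each node only through its /node/ URL.
ROBOTS_TXT = """\
User-agent: *
Disallow: /taxonomy/
Disallow: /blog/
Disallow: /user/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for path in ["/node/42", "/taxonomy/term/3", "/blog/7", "/user/1"]:
    print(path, parser.can_fetch("Googlebot", path))
```

Listing each alternate prefix explicitly, rather than `Disallow: /` plus an `Allow: /node/` line, keeps the front page crawlable and avoids relying on `Allow`, which was not part of the original robots.txt convention.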