Because worthless bots, harvesters, crawlers, spiders and content scrapers have multiplied excessively over the past months, I'm looking for manageable ways to keep my Drupal sites alive without having to watch Apache's access.log 24/7.
A feasible approach is described in this article: http://sebastians-pamphlets.com/smart-robots-txt/
The robotstxt module appears to be the proper place for a dynamically generated, smart robots.txt file. Thus I'd like to suggest this as a new feature for robotstxt.
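To make the request a bit more concrete, here is a rough sketch of what I understand by "dynamically generated": the robots.txt body is chosen per request from the User-Agent header. This is only my reading of the idea, not the code from the linked article and not anything the robotstxt module does today. The signature list, the two response bodies and the function name are all made up for illustration, and it is written in Python only to keep it short.

```python
# Hypothetical bot signatures; a real module would let admins edit these in a UI.
BLOCKED_SIGNATURES = ("harvester", "scraper", "emailcollector")

# Standard robots.txt bodies: an empty Disallow allows everything,
# "Disallow: /" asks the bot to stay away from the whole site.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: str) -> str:
    """Return the robots.txt body to serve for a given User-Agent header."""
    ua = (user_agent or "").lower()
    # Assumption: a missing User-Agent is treated like a blocked bot.
    if not ua or any(sig in ua for sig in BLOCKED_SIGNATURES):
        return DENY_ALL
    return ALLOW_ALL


if __name__ == "__main__":
    print(robots_txt_for("EvilScraper/1.0"))
```

Of course this only helps against bots that read and obey robots.txt at all, so it would be one building block rather than a complete defence.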
Thanks for considering this feature request ;)
Comments
Comment #1
hass commented
No idea what a smart robots.txt should be.
Comment #2
asb commented
The initial posting provides a link that explains what a smart robots.txt would be.
It seems the feature request will not even be considered, so I am changing the status.
Comment #3
hass commented
This can really be done a lot more easily and without bootstrapping Drupal. See http://www.helicontech.com/isapi_rewrite/doc/examples.htm in section "Block annoying robots".
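The general idea behind such rewrite rules is to match the User-Agent header against a list of patterns and refuse the request before the application is ever loaded. A minimal sketch of that matching logic follows, in Python rather than rewrite syntax. The patterns and the function name are hypothetical and are not copied from the linked examples page.

```python
import re

# Hypothetical patterns for illustration only; not taken from the linked page.
BLOCKED_AGENTS = re.compile(r"(harvest|scraper|emailsiphon)", re.IGNORECASE)


def status_for(user_agent: str) -> int:
    """Return 403 for User-Agent strings matching a blocked pattern, else 200."""
    if BLOCKED_AGENTS.search(user_agent or ""):
        return 403
    return 200


if __name__ == "__main__":
    for agent in ("EmailSiphon", "Mozilla/5.0 (X11; Linux x86_64)"):
        print(agent, status_for(agent))
```

In a web server the same check would live in the server configuration, so a blocked bot costs almost nothing to reject.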
Comment #4
asb commented
Yeah, I tried something similar, but it broke Boost's rewriting, which I couldn't fix. The approach from sebastians-pamphlets.com also works without bootstrapping Drupal, but then I couldn't run the 'robotstxt' module, and there'd be no UI to add new bot signatures. There's also another approach with a broader scope, ZB Block. It fails to integrate with Drupal as well, because it, too, does not want to bootstrap Drupal.
So the bottom line is: solutions might exist in theory, but I can't implement them in practice. That makes my life harder and the spammers' easier. It's a real pity that it's so much easier to operate a spambot script than to protect a Drupal site against it ;-/