Because worthless bots, harvesters, crawlers, spiders and content scrapers have multiplied excessively in the past months, I'm searching for manageable approaches to keep my Drupal sites alive without having to watch Apache's access.log 24/7.

A feasible approach is described in this article: http://sebastians-pamphlets.com/smart-robots-txt/

The robotstxt module appears to be the proper place for a dynamically generated, smart robots.txt file. Thus I'd like to suggest this as a new feature for robotstxt.
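To illustrate what "dynamically generated" could mean here, a minimal sketch in Python follows. It is not taken from the linked article and is not how the robotstxt module works; the signature list, the constants and the function name `robots_txt_for` are all hypothetical. The only idea it shows is that the robots.txt body is chosen per request from the requesting user agent instead of being a static file.

```python
import re
from typing import Optional

# Hypothetical bot signatures; a real site would maintain its own list,
# ideally through an admin UI.
BAD_BOT_PATTERNS = [r"scraper", r"harvest", r"emailcollector"]

# Standard robots.txt bodies: an empty Disallow permits everything,
# "Disallow: /" forbids the whole site.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
DENY_ALL = "User-agent: *\nDisallow: /\n"


def robots_txt_for(user_agent: Optional[str]) -> str:
    """Return the robots.txt body to serve to the given user agent."""
    ua = user_agent or ""
    for pattern in BAD_BOT_PATTERNS:
        if re.search(pattern, ua, re.IGNORECASE):
            return DENY_ALL
    return ALLOW_ALL
```

With this sketch, `robots_txt_for("EvilScraper/1.0")` returns the deny-all body, while a user agent that matches no signature gets the permissive one.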

Thanks for considering this feature request ;)

Comments

hass’s picture

Status: Active » Closed (works as designed)

No idea what a smart robots.txt should be.

asb’s picture

Status: Closed (works as designed) » Closed (won't fix)

The initial posting provides a link that explains what a smart robots.txt would be.

It seems the feature request will not even be considered, so I'm changing the status.

hass’s picture

This can really be done a lot more easily, and without bootstrapping Drupal. See the section "Block annoying robots" at http://www.helicontech.com/isapi_rewrite/doc/examples.htm.
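The general idea, rejecting requests by user agent before the application is ever reached, can be sketched in Python as a small WSGI wrapper. This is only an analogy: it is not rewrite-rule syntax and not taken from the linked page, and the pattern in `BLOCKED` and the name `block_bad_bots` are hypothetical.

```python
import re

# Hypothetical signatures of unwanted robots.
BLOCKED = re.compile(r"scraper|harvest", re.IGNORECASE)


def block_bad_bots(app):
    """Wrap a WSGI app so that matching user agents get a 403 response."""
    def middleware(environ, start_response):
        ua = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED.search(ua):
            # Answer immediately; the wrapped application is never called.
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Because the check runs before the wrapped application, a blocked request costs almost nothing, which is the same benefit as filtering at the web server level.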

asb’s picture

Yeah, I tried something similar, but it broke Boost's rewriting, which I couldn't fix. The approach from sebastians-pamphlets.com also works without bootstrapping Drupal, but then I couldn't run the 'robotstxt' module, and there'd be no UI to add new bot signatures. There's also another approach with a broader scope, ZB Block. It likewise fails to integrate with Drupal, because it does not bootstrap Drupal either.

So the bottom line is: solutions might exist in theory, but I can't implement them in practice. That makes my life harder and the spammers' easier. It's a real pity that it's so much easier to operate a spambot script than to protect a Drupal site against one ;-/