Hi All,
I logged into my site just now to find that Throttle had kicked in and half my site wasn't working. I managed to get in and wind back the settings the Throttle module was using, but there is no way I should be hitting the guest throttle limit of 100 users. I know this because AWStats reports my average daily uniques as 147, and PHP-Stats, which uses JavaScript, says the most users ever online at once was 6.
I've been having a lot of trouble with Trackback spam in recent times despite having the Spam and Akismet modules installed and working, so much so that I abandoned the Trackback module a few weeks back. I've also been getting hammered with referrer spam, and at a guess 95% of the referrers in my Drupal log are spam. My Drupal watchdog logs are also full of "access denied" and "page not found" messages, and from the look of the URLs being requested I'd say they have to be bots as well.
The big question for me is what can I do to stop these bots from consuming so much of my server's resources? I've added a crawl delay of 10 seconds to my robots.txt file, but that is only going to slow down the good guys. What can I do about the bad ones? I'm using PHP 4.4.4 on IIS 6. Thanks.
Comments
Block TrackBack SPAMers via htaccess
I was having the same problem with TrackBack SPAM myself just a few days ago. I turned it on for 12 hours and saw nothing but %&^$# junk, so I deactivated it. I might give it another try in the future when I have a bunch of good SPAM blocking tools in place.
But as you noticed, turning TrackBack off doesn't stop the SPAMers from coming. In my case I had only been hit by three servers when I deactivated the TrackBack function, but those three servers had left me hundreds of TrackBack SPAM. The sick part is that even after deactivating it, more and more SPAMers kept coming. At first I blocked their IPs via the firewall, but that only kills them one at a time, and it's after the fact. Then I remembered that the .htaccess file has commands to block access to directories. That killed them quick. This stops them at the Apache level and serves them a 403 (Forbidden) error, which they understand. Just turning off TrackBacks still had Drupal using up lots of CPU and memory to issue its errors. Below is the very small mod I made to part of my .htaccess file. Just do the same thing with any directories that you would like to block, and your overhead from SPAMers will decrease.
I have my new line at the top and the old one commented out below. Be sure and just update the file itself (do not cut and paste this) since I had to break the line just before the word "code-style" so it would fit. Also the critical update is near the end of that line. All I did was add |trackback
# Protect files and directories from prying eyes.
<FilesMatch "(\.(engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|
code-style\.pl|Entries.*|Repository|Root|Tag|Template|trackback)$">
Order allow,deny
</FilesMatch>
# Protect files and directories from prying eyes. Original added trackback PPH
#<FilesMatch "(\.(engine|inc|info|install|module|profile|po|sh|.*sql|theme|tpl(\.php)?|xtmpl)|
#code-style\.pl|Entries.*|Repository|Root|Tag|Template)$">
# Order allow,deny
#</FilesMatch>
-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world
-------------------
http://PrivacyDigest.com/ News from the Privacy Front (Drupal)
http://CongressionalResearchReports.com/ Bringing you the research that your taxes already paid for. ( Beta/Drupal)
and for Referrer SPAM
I haven't tried this myself but the concept seems sound.
http://drupal.org/node/16427#comment-28673
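For anyone on Apache, a minimal sketch of the general idea (refusing requests by their Referer header before Drupal is loaded) might look like the following. It is not taken from the linked comment: it assumes mod_rewrite is enabled, and the two domains are placeholders rather than real spam sources. In Drupal's .htaccess it would need to sit above the existing clean-URL rewrite rules.

```apache
# Sketch only: the domains below are placeholders, not real spam sources.
<IfModule mod_rewrite.c>
  RewriteEngine On
  # Compare the Referer header, case-insensitively, against a placeholder list.
  RewriteCond %{HTTP_REFERER} ^https?://([^/]+\.)?(spam-example-one\.invalid|spam-example-two\.invalid)(/|$) [NC]
  # Answer with 403 Forbidden so the request never reaches Drupal.
  RewriteRule .* - [F]
</IfModule>
```

The list has to be maintained by hand from your own logs, so it works best for a handful of persistent offenders.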
-------------------
http://www.PrivacyDigest.com/ News from the Privacy Front
http://www.SunflowerChildren.org/ Helping children around the world
Another consideration
If your cron job runs frequently, make sure you aren't asking the search engines to re-index your site every time it runs (gsitemap module).
Nancy W.
now running 5 sites on Drupal so far
Drupal Cookbook (for New Drupallers)
Adding Hidden Design or How To notes in Your Database
NancyDru
Thanks for the advice all.
Thanks for the advice all. I'm using IIS, so I'll see if I can get those .htaccess rules translated into ISAPI Rewrite. I do have gsitemap installed, but I don't have cron submission turned on. Thanks again for the tips :)
----------------
Dominic Ryan
www.iis-aid.com
Just looking through my logs
Just looking through my logs and the worst bot offender seems to be QihooBot. Anyone else getting hit by this bot?
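On Apache, a single bot can be turned away by its User-Agent string before Drupal handles the request. The following is a sketch under that assumption: it requires mod_rewrite and matches the name QihooBot as it appears in the logs. On IIS the same condition would have to be translated into ISAPI Rewrite syntax, which is not shown here.

```apache
# Sketch only: assumes the bot identifies itself with "QihooBot" in its User-Agent.
<IfModule mod_rewrite.c>
  RewriteEngine On
  # Case-insensitive substring match on the User-Agent header.
  RewriteCond %{HTTP_USER_AGENT} QihooBot [NC]
  # Answer with 403 Forbidden instead of passing the request on to Drupal.
  RewriteRule .* - [F]
</IfModule>
```

This only helps against bots that announce themselves honestly; anything that fakes a browser User-Agent will pass straight through.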
----------------
Dominic Ryan
www.iis-aid.com
Experiment: Sanity checks
I've patched the trackback module (Drupal 5.1) to do some sanity checks:
All the trackback spam I have received up to now (sometimes hundreds a day... *sigh*) has been caught by the first rule. If you want to help me test this scheme (there is a risk of it being too rigorous), you can trackback this article:
http://stefan.ploing.de/2007-03-28-drupal-modules-multiping-trackback-ex...
If you'd like to test it on your site, feel free to contact me; I can send you the patch.
Sounds like the right way to
Sounds like the right way to go, Skyr. I've actually removed trackbacks from my site entirely now. I've been working on some ISAPI Rewrite rules to rid myself of referral spam, and they are working really well. Referral spam in my Drupal logs has gone from 9 out of 10 entries to 0. I wrote a how-to guide on it here:
Blocking referrer spam on IIS with ISAPI Rewrite
I'm still working on my ISAPI Rewrite rules to stop bots from getting access to certain parts of the site, and results look promising so far. My basic aim is to stop these nasties before they hit Drupal so I conserve my system resources.
----------------
Dominic Ryan
www.iis-aid.com
Very interested in checking out the patch
Very interested in checking out the patch, but have to get it working on a 4.7 production site.
Could you post the file anyway, ideally ported?
Here you go...
Here's the diff against trackback-5.x-1.1:
http://stefan.ploing.de/2007-04-02-drupal-trackback-patch
You'll have to backport it to 4.7 yourself, though chances are that the patch will work for the 4.7 module, too.
Patched Trackback module available?
This approach looks very promising. Has the official module been patched or is there any way to get a patched module?
Thank you