First of all, thanks for an excellent module. I've been using it on my site since upgrading to 4.5 and it's eaten an absolutely ludicrous number of spam posts (94.79%, stat fans). The one thing I have noticed, though, is that recent attempts have been slipping through due to a change of tactics - the usual Bayesian attack complete with hypertext link rather than endless streams of posts. I don't want to switch off anonymous comments on my site, purely because I hate being forced to register accounts on other people's, and the Preview hack doesn't seem to have been working that well. Might adding a captcha for anonymous users (possibly via the existing module) be a possibility for a future release?

(Randomly, I'm developing a whole new silver hatred for these spammers. My Drupal site is basically untouched, courtesy of this module, but my referral log is absolutely unreadable - half fake listings from idiots like AdminShop, the other half gibberish about poker games and Viagra sites....)

Comments

jeremy’s picture

Those are impressive results! Are you using custom filters in addition to the Bayesian filter? Any chance you could dump your spam_statistics and spam_custom tables and attach them to this issue? I'd be quite interested in seeing more to understand how you're getting such good results. And I'm curious to see just how much spam you're actually dealing with.

As for captchas ("completely automated public Turing test to tell computers and humans apart"), I don't think that would be a logical addition to the spam module. Instead, it would make more sense to expand the existing captcha module so that in addition to hooking into the user registration process, it could also hook into the anonymous comment submission process. Perhaps you should open up a feature request for that module?

charybdis’s picture

Attachment: status new, file size 10.36 KB

"Are you using custom filters in addition to the Bayesian filter?"

A handful of custom ones - 'texas holdem', 'debt consolidation' and other particularly irritating ones that kept cropping up (phentermine notched up a pretty impressive 1960 matches on its own, although I've not seen it reappear for a few days) but that's it. The high success rate is almost certainly due to the fact that most of the spam I've been getting has been fairly similar - just relentless. I updated the filter the other day for the URL limiter, and there's eleven pages worth of automatically killed URLs. Without the module, there's no way in Hell I'd be able to have anonymous posting - if only my actual hits were as good as all those blasted referrals keep making it look ;-) The site's at http://www.richardcobbett.co.uk.

I doubt you'll get that much from the dumps - I just run a personal egosite, not particularly high-traffic and generally targeted, obviously, by the same bozos with the same sort of posts - but I've attached them anyway in the zip. Spam_statistics is a little bit skewed at the moment - I had it do a rebuild after updating just to make sure everything had gone okay (a slight glitch in the install), and it's only listing about 44 spam posts instead of the hundreds it actually has. The tokens and actual spam posts it presents are correct though.

jeremy’s picture

"A handful of custom ones - 'texas holdem', 'debt consolidation' and other particularly irritating ones that kept cropping up (phentermine notched up a pretty impressive 1960 matches on its own, although I've not seen it reappear for a few days) but that's it."

If you'll post a dump of your spam_custom table, I'd like to include it in contrib/custom. If it works for you, it might work for someone else, too.

"I updated the filter the other day for the URL limiter, and there's eleven pages worth of automatically killed URLs."

Watch this closely for a while. It worked great in my test-lab, but I haven't given it any real-world testing yet. (I only added it to the 4.5 version of the module, but I'm still using Drupal 4.4. After yesterday's slam of 350 spam, I'm feeling the need to upgrade quickly. ;)

"if only my actual hits were as good as all those blasted referrals keep making it look ;-)"

I'm not sure what you're referring to here -- could it be related to this?

"Spam_statistics is a little bit skewed at the moment - I had it do a rebuild after updating just to make sure everything had gone okay (a slight glitch in the install), and it's only listing about 44 spam posts instead of the hundreds it actually has. The tokens and actual spam posts it presents are correct though."

Have you been deleting spam, rather than just leaving it unpublished? If not, and the spam is still in your database, then something is wrong -- after a rebuild, the statistics table should have an accurate spam count.

Was the slight install glitch a problem with the spam module that should be fixed? Or documented?

charybdis’s picture

Attachment: status new, file size 128.91 KB

"If you'll post a dump of your spam_custom table, I'd like to include it in contrib/custom. If it works for you, it might work for someone else, too."

Sure thing. Dump attached. Not many entries in it, but they're absolutely destroying the current scripted spam-flood I think most people are being hit by.

"Watch this closely for a while. It worked great in my test-lab, but I haven't given it any real-world testing yet. (I only added it to the 4.5 version of the module, but I'm still using Drupal 4.4. After yesterday's slam of 350 spam, I'm feeling the need to upgrade quickly. ;)"

I plan to ;-) I double-check the lists on a regular basis - Bayesian's a cool system, but not perfect.

"I'm not sure what you're referring to here -- could it be related to this?"

Referral spam - nothing to do with Drupal itself. Asshats like AdminShop, with their old trick of faking referrals to people's sites (for example, you see something like www.xope.com/friendslist/ in your list, think "Ah! I wonder what it says!", click on it, and it takes you to a spam page). I'm getting hundreds and hundreds of hits from sites with names like http://free-wibblecream-purchase.blorp (names changed to evade any filters running on this system ;-)) and it makes it almost impossible to tell who's ACTUALLY linked to the page. You get the ego-spam, which just goes for links from bloggers and site owners, and the mass referral spam, which largely hopes that the information is public.

"Have you been deleting spam, rather than just leaving it unpublished? If not, and the spam is still in your database, then something is wrong -- after a rebuild, the statistics table should have an accurate spam count."

Nope, just unpublishing it to avoid having to make a new collection later. After the update, it went down from a few hundred to just over 40, but only on the status page - everything else has the full list.

"Was the slight install glitch a problem with the spam module that should be fixed? Or documented?"

It turned out to be the FTP process - it had timed out a few bytes into updating the file. The rebuild was just to be on the safe side.

__
i have a website. it is very, very blue

charybdis’s picture

Attachment: status new, file size 2.21 KB

Bugger. Sorry - uploaded the wrong file. Here's spam_custom.


jeremy’s picture

"Sure thing. Dump attached. Not many entries in it, but they're absolutely destroying the current scripted spam-flood I think most people are being hit by."

Cool. Interesting that you're able to use simple text strings, not even needing to use regular expressions. I'll add your dump to the contrib directory soon.
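
To illustrate what plain-string matching looks like next to a regular expression, here is a minimal Python sketch. It is not the spam module's code (the module's internals aren't shown in this thread); the function name `matches_custom_filter` and the case-insensitive substring rule are assumptions made for the example, and the phrases are simply the ones mentioned earlier in the thread.

```python
# Hypothetical sketch: plain-text custom filters, no regular expressions.
# The phrases come from this thread; the matching rule is an assumption.
CUSTOM_PHRASES = ["texas holdem", "debt consolidation", "phentermine"]


def matches_custom_filter(text, phrases=CUSTOM_PHRASES):
    """Return the phrases that occur in text, compared case-insensitively."""
    lowered = text.lower()
    return [phrase for phrase in phrases if phrase in lowered]


if __name__ == "__main__":
    comment = "Play TEXAS HOLDEM online, cheap Phentermine too!"
    print(matches_custom_filter(comment))
```

A substring check like this is cheap and easy to maintain, which probably explains why a short list goes a long way against a scripted flood that keeps repeating the same phrases.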

"Nope, just unpublishing it to avoid having to make an new collection later. After the update, it went down from a few hundred to just over 40, but only on the status page - everything else has the full list."

Very odd. I'm testing a 4.4 to 4.5 upgrade currently, and this didn't happen to me. When I did a rebuild all, it worked as planned, currently showing:

learned_spam	402
rebuilt_tokens_all	3
spam	402
spam_comment	384
spam_forum	18

Offhand I don't have any idea what might have gone wrong for you. Did you do a rebuild from administer >> spam by clicking "rebuild filter"?

BTW: The statistical collection code has some logical errors in it -- I was getting some negative counts prior to my rebuild. It's not a priority for me, but I'll look into it soon.
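
For reference, the rebuilt numbers above are self-consistent: 384 spam comments plus 18 spam forum posts gives the 402 total. A rough Python sketch of why a full recount can't go negative the way incrementally adjusted counters can: it only ever counts what is actually stored. The `(content_type, is_spam)` record layout and the function name `rebuild_statistics` are hypothetical, chosen for the example, and are not the module's real schema or code.

```python
from collections import Counter


def rebuild_statistics(records):
    """Recount spam from scratch.

    records is an iterable of (content_type, is_spam) pairs, a hypothetical
    stand-in for whatever the stored comments and forum posts look like.
    """
    stats = Counter()
    for content_type, is_spam in records:
        if is_spam:
            stats["spam"] += 1                   # overall total
            stats["spam_" + content_type] += 1   # per-type total
    return dict(stats)


if __name__ == "__main__":
    # Made-up records sized to match the counts quoted in this thread.
    records = (
        [("comment", True)] * 384
        + [("forum", True)] * 18
        + [("comment", False)] * 5
    )
    print(rebuild_statistics(records))
```

Because every counter starts at zero and is only incremented, the per-type totals always add up to the overall total after a rebuild.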

jeremy’s picture

Actually, looking at your spam_statistics again, I don't see a "rebuilt_tokens_all" entry at all, suggesting you've not actually done a complete rebuild...

charybdis’s picture

Could be there was a glitch on the server. I'll try hitting rebuild again and see if it fixes it.

grohk’s picture

pinkblob has coded up preliminary support for anonymous commenter Captchas via a patched comment module:

http://drupal.org/node/14675

Try at your own risk.