I am aware that keyword filters can be clbuttic fails.
I know that I am committing a logical fallacy by forming the opinion that every manufacturer, marketer and purchaser of 'ugg boots' (whatever the hell they even are) must have single digit IQs.

But FFS. in the years watching/moderating drupal.org I've seen/killed fortnightly floods of fecking Ugg boot spammers. WTF is up with these shoe pushers?

Coming distant second equal are the Nike knock-offs, curious streaming sports video streamers, but oddly few pill-pushers.

I don't have any real constructive suggestions, and don't really expect any effective infrastructure changes, so let's just vent about the spam-bastards that piss you off. Constructive suggestions are still welcome however.

What annoys or confuses you most about the pointless morons that try to hit our issue queues? Do they think they achieve anything? Do they think?

PS, Drupal.org is actually, mostly very healthy in this way compared to many 'communities'. Mostly thanks to our continual gardening (I believe). Yay team!

Comments

killes@www.drop.org’s picture

I had considered enabling spam.module on d.o. I just don't know how it would perform.

I recently added a mode that would allow us to run it without it being user visible. Suspected spam would only get marked as such, not unpublished or anything. This would be especially important for the initial phase of training the bayesian filter. Also, this mode would allow us to hold project issue mails in a queue until it is decided whether the spam is actually spam.
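To make the "mark but don't unpublish" mode concrete, here is a minimal sketch of a Bayesian scorer running in shadow mode. It is not spam.module's code; the class name, the 0.9 threshold, and the returned fields are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class ShadowSpamFilter:
    """Tiny naive Bayes scorer that only labels posts, never unpublishes them."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold  # hypothetical cut-off for "suspected spam"
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        """Record a moderator decision ('spam' or 'ham') for one post."""
        self.docs[label] += 1
        self.words[label].update(self.tokenize(text))

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"])) or 1
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Laplace-smoothed prior and word likelihoods, summed in log space.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total_words = sum(self.words[label].values())
            for word in self.tokenize(text):
                score += math.log((self.words[label][word] + 1) / (total_words + vocab))
            log_score[label] = score
        # Clamp the difference so math.exp cannot overflow on long posts.
        diff = max(min(log_score["ham"] - log_score["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))

    def review(self, text):
        """Shadow mode: return a label for moderators; the post stays published."""
        p = self.spam_probability(text)
        return {"suspected_spam": p >= self.threshold, "probability": p, "published": True}
```

During the training phase, moderators would call `train()` with their own verdicts and compare them against what `review()` reported, before anyone lets the filter act on its own.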

ELC’s picture

Doesn't Dries Buytaert run a company that specialises in filtering comment spam on Drupal?

http://drupal.org/project/mollom

.. or is what we're seeing the stuff that sneaks through?

dman’s picture

Good question, maybe d.o. should ask about getting a mollom account :-)
... I stamped on more uggs today ...

laura s’s picture

Title: spam - meta : do we have/can we have a keyword blacklist to kill "ugg"? AKA - anyone have better spam-combatting suggestions? » spam - meta : better spam-combatting suggestions?
Priority: Minor » Normal

This has come up several times, and I am a big +1 on something better. I wonder what % of webmasters issues are just to manually handle spam. It's a huge waste of community resources -- most precious resources, time and attention.

The Bayesian spam module is actually very good at learning spam patterns, including patterns peculiar to communities. I would say it would be a great addition to the site, but note that in my experience it can be pretty resource-intensive, so there's an infrastructure aspect to this (which makes it great that this is brought up by @killes).

dman’s picture

Based on personal experience, the webmaster issues only reflect 1/5 of the actual spams that actually get squashed. Many of them are so obvious and trivial that any member of the team with the appropriate rights just wipes them and moves on without noting an issue. At least that's how I go.
So there is a lot of noise that is getting quietly weeded out ... at the cost of more constructive effort.

I tune in to the top tracker page a few times a day to see if there are any help requests I could support ... or spam to stamp. If I find spam to stamp, I do so, sigh, and move on, but *probably* therefore don't spend that time answering good questions for the good people.
OTOH, on the days I *don't* spot a spammer in the first three pages of recent posts, I pick an interesting issue and contribute to it. Yay.

Extrapolating that behavior out to others (true or false, I dunno) and considering that the d.o support crew members with moderation rights are also more likely to have good expertise when contributing to help requests ... this effort costs. As Laura s says.

Also, I think that folk here in the webmaster queue are some of the most experienced with facing spam in other sites of our own, most experienced in developing solutions within Drupal, and in general, probably pretty clever folk in this area. There must be some better spam-fighting measures out there, and if we can't figure them out, who better?

silverwing’s picture

At this point I see us having two options:

1. We can moan about how bad the spam problem is like we've been doing for years. or,
2. Do something about it.

PROPOSAL: Get infrastructure to sign off on Mollom as soon as possible and begin testing.

Since we're part of the open source world, we're not stuck with just one option. We've got spam.module and Mollom*, plus a bunch of others at our disposal.

And if we can't pick one, then I propose we start with one option for a few months, then switch to another for three more. Then we can evaluate which one either works better or meets our needs better. (Or which one our servers can handle better :) )

(And I'm one of those maintainers who deletes the spam as I see it. Imagine this queue if all the spam we deleted were reported! And I'll even admit I've deleted our Peruvian Travel Spammers before they even post - their profiles were always identical and made them easy to find with a quick look at the users in their country.)

*and I don't even care much for Mollom - part of me doesn't want another computer telling us what's what here, but I can see it has some benefit for us

WorldFallz’s picture

ugg, nike, and sports streaming (this has been on the rise lately) capture most of the routine spam. The only other patterns that pop to mind are 1) wedding dresses and 2) the request for an 'evaluation' of converting a wordpress (and to a lesser extent joomla) site to drupal (usually with multiple links to the site).

There's also the issue of attempts to circumvent the nofollow policy with dofollows.

And I too am one of those spam squashers that 99% of the time just handle it without an issue. I usually only create an issue for borderline cases or where a seemingly average user starts spamming (mostly to keep a record of it in case they complain or deny it).

I would say for every issue I create there's probably 10-20 spammers I take care of without issues (of course that's a total guess).

I'm not sure it's worth training a bayesian filter or the risks of false positives that come with mollom (though it seems that might be fixed lately).

Maybe we could try something small and just do an unpublish of posts with certain keywords combined with a view for checking and managing them.

just a thought.
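A rough sketch of the keyword idea, assuming posts are simple dicts with `id` and `body` keys and that the keyword list below is only a placeholder. Matching on whole words is one way to dodge the clbuttic problem mentioned at the top of the thread.

```python
import re

# Hypothetical starter list; the real one would be maintained by the webmasters.
BLOCKED_KEYWORDS = ("ugg", "dofollow")


def matched_keywords(text, keywords=BLOCKED_KEYWORDS):
    """Return blocked keywords that appear as whole words, case-insensitively."""
    found = []
    for keyword in keywords:
        # \b keeps "ugg" from matching inside "suggest" or "debugging".
        if re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE):
            found.append(keyword)
    return found


def triage(posts):
    """Split posts into those left published and those queued for review."""
    published, review_queue = [], []
    for post in posts:
        hits = matched_keywords(post["body"])
        if hits:
            review_queue.append({**post, "status": "unpublished", "matched": hits})
        else:
            published.append({**post, "status": "published"})
    return published, review_queue
```

The review queue plays the role of the management view: nothing is deleted, so a false positive only needs one click to republish. Plurals such as "uggs" would have to be listed separately with this whole-word approach.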

killes@www.drop.org’s picture

the reasons why I don't want mollom are:

1) it is _not_ open source

2) it is an external service

3) It can be annoying. Presenting your collaborator with a captcha is just bad style. I'd prefer to accept all user input (as we do now) and sort it out later. Spam module would allow us to do that.

silverwing’s picture

re#8 I absolutely agree. I don't want a third party computer deciding what's appropriate here, and I hate captchas with a passion. (And I removed my proposal from my comment above.)

Will spam.module have a D7 release by the end of the 1st qtr 2012? We do need to plan for the site upgrade.

Since we're going to have flag.module deployed here, would it be possible to use that along with http://drupal.org/project/flag_abuse (flag by itself isn't that great for spam moderating.) I don't see this as a long term solution, though.

Can we try http://drupal.org/project/hidden_captcha for bot registration spam?

ELC’s picture

I have to agree with Mollom going out the window. I assumed it would be similar in structure to Drupal but it's obviously very different.

The hidden_captcha thing only works until the spammers catch on and change their code. It could potentially weed out those that aren't smart enough to adapt (still works well in email spam filtering). If it's cheap and easy, and doesn't cause issues for normal users, why not.
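For anyone unfamiliar with the trick, the general honeypot idea looks roughly like this. This is a generic sketch, not hidden_captcha's actual code, and the field name is made up.

```python
# Hypothetical name for a form field that is hidden from people with CSS.
HONEYPOT_FIELD = "homepage_url"


def looks_like_bot(form_data, honeypot_field=HONEYPOT_FIELD):
    """Treat any non-blank value in the hidden field as a sign of a bot."""
    return bool(form_data.get(honeypot_field, "").strip())
```

A person never sees the field and so leaves it empty; a naive bot fills in every input it finds. Once a bot learns to skip that field, the check stops catching it.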

Flag Abuse seems useful but it's just going to shortcut or replace posting to the webmaster issue queue. This would have two results - more flagging because it's easier for people to do, and create a new area that webmasters then have to monitor. Automated handling of flags opens it up for abuse.

killes@www.drop.org’s picture

I am not sure whether spam.module will have a D7 release by that date. However, automated spam protection isn't really mission critical so we could update without and then later re-enable it.

klonos’s picture

Please take a look at my proposal here: #1308176: [meta] Battle plan for stopping spam/"subscribe"/"+1"/"thank you" comments (and cleaning up old ones from the db too).. Thanx in advance.

PS: I have copy-pasted the above in so many related issues that people might argue I'm spamming the queues ;)

Michelle’s picture

Add me to the list of maintainers that just silently deletes spam. :)

I added a registration question on my site and it's worked very well aside from the fact that bots attempt to answer it every few minutes 24/7. My site is pretty niche, though. With the global Drupal community and high percentage of non-English speakers, that may be more difficult to do.

-1 from me for Mollom as well unless you exempt people after so many posts / time on site. It's fine for anonymous users but can be a right pain in the behind for regulars.

Michelle

laura s’s picture

Another possible solution might be to do something like Botcha module: http://drupal.org/project/botcha or Spamicide http://drupal.org/project/spamicide -- no UX change, except for robots. We'd want to ensure that it does not affect screen readers in any confusing way.

killes@www.drop.org’s picture

Interesting, I didn't know these modules.

Maybe somebody wants to test them on a testsite?

dman’s picture

^^^ Robot troll is awesome :=}

WorldFallz’s picture

oh the irony, lol

dman’s picture

At least it was on-topic! Their keyword scanning is getting better. I killed a handful of his others (also in threads *about* ugg spam) but had to leave this one here ..

Michelle’s picture

Awww, who unpublished it? That was funny! :)

Michelle

laura s’s picture

/me humorless spam-unpublisher. ^w^

s.Daniel’s picture

* Exclude "profile/profile_interest/*" for SE via robots.txt
http://www.google.de/search?&q=site%3Adrupal.org+people+interested+in+porn

* Return 404 for non existing interests.
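A quick way to sanity-check such a robots.txt rule before deploying it, using Python's standard `urllib.robotparser`. The `Disallow` line is my assumption of how the exclusion would be written; a path prefix is enough, so no trailing wildcard is needed.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule for the exclusion suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /profile/profile_interest/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def crawlable(path):
    """Report whether a generic crawler may fetch the given site path."""
    return parser.can_fetch("*", "http://drupal.org" + path)
```

With this rule, `crawlable("/profile/profile_interest/porn")` is false while ordinary project pages stay crawlable.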

rfay’s picture

Mollom is not open source.

And it also has an enormous number of false positives that are truly offensive to end users. It seems to randomly block people (like me). And there is no appeal. You just get "Your submission was blocked". No way to mention this to a webmaster. No way to fix it. I got so mad I wrote a blog post about it once. The same issues continue. I avoid using Mollom on any site I can.

dman’s picture

After spending the last half hour hitting 900 posts & comments from a handful of handbag and jeans and NFL spammers - I propose that anyone who thinks that rel="dofollow" does anything useful deserves an insta-ban with extreme prejudice:

*grumpy*

(edit : I meant the dofollow spam-thing, not nofollow)

WorldFallz’s picture

absolutely and since 'dofollow' is not likely to occur in a discussion in any other context (with the exception of complaining about it, lol) it would be a prime candidate for a simple keyword unpublish / user ban. ;-)

silverwing’s picture

I've always blocked users who do the "dofollow" thing.

earnan’s picture

This isn't a combating solution per se, but does anyone know the root of Drupal spamming? I have changed sites (from one with login, contact, and forum options) and replaced it with a Drupal site, and I go from almost no spam to hundreds of spam user and site contacts a week - and for a nothing site (target audience is 100 people).

How did they find my site and why does Drupal make a difference in their desire to spam it?

Perhaps answering these questions may help come up with some solutions in defending against it?

s.Daniel’s picture

What was the system you used before Drupal?
Was the site visible in Google before?
Do you see automated spam or humans posting spam manually?

Drupal is a popular system and you can search and identify whether a site is Drupal or not automatically. You can use Google to search for Drupal sites. Then the register process is always the same so you can write software to register automatically. So yes there is the possibility that spamming could be reduced via altering the register / login url or forms etc but that’s just a guess. It's worth a try.

Gerhard Killesreiter’s picture

This is an issue in the drupal.org webmasters queue, please refrain from discussing any issues that are not directly related to drupal.org.

s.Daniel’s picture

@Killes: While the origin of the discussion is OT, I think there is a clear idea that might help drupal.org and may be simple to implement:
> Alter the login/register forms/urls a little to confuse spam bots without confusing normal users.

To know whether this is a possible way to fight spam we would need to know
* how high is the percentage of spam done by spambots
* is there a way (to register) bots usually take (e.g. go to home then follow the link "Drupal Homepage" etc or go to "/user/register" directly)

Basically it is a similar approach as proposed in #9 which I think we should try.

killes@www.drop.org’s picture

I don't think there is a way to really distinguish between spambots and real humans unless you use captchas etc. My gut feeling is that over 98% of the spammers are real people, though, adding only a few comments. We sometimes have spam bots that then spam a lot, but these aren't that common.

WorldFallz’s picture

yep... and spambots are usually spotted pretty quickly as they fill up the new posts blocks.

dman’s picture

Yeah. The only ones that I really see that take time to weed out on d.o are the "mechanical turk" style copy-pastas. And often pretty sparse.
They won't be stopped by CAPTCHAs, but would fail immediately if the keywords they are trying to push are explicitly blacklisted.

laura s’s picture

I suggest we not try to find the best solution via philosophy but actually implement one of the proposed solutions and see if it helps, because it's obvious that some sort of solution is needed to free up webmasters' time for more productive endeavors here. How about the botcha-kind of solution for starters? Run it for a week. Then re-evaluate.

WorldFallz’s picture

Also, I posted a node management view to another thread (in response to the recent spate of vietnam spam), but I probably should have posted it here:

http://drupal.org/node/1383816#comment-5400264

It's a view that mimics the 'administer comments' view we already have.

webchick’s picture

First off, bravo and applause for deploying something that can help automate dealing with spammers. It always makes me cry when our most smart, talented people need to spend time on these sorts of tasks, as fun and relaxing as mass-spam deletion is.

I was asked to chime in here by silverwing. I went in to evaluate this the way I evaluate things for client projects.

I get the concerns about Mollom's* lack of open-sourceness, but one thing it has going for it is it's really well-maintained. Mollom was one of very few projects that actually made its #D7CX pledge, with a release available the day D7 came out. Its commit log shows a long trail of steady maintenance, with commits happening every week or so: http://drupalcode.org/project/mollom.git/shortlog/refs/heads/7.x-2.x There's a nice sea of green at http://drupal.org/project/issues/mollom?categories=All http://drupal.org/project/usage/mollom shows a steady upwards direction, indicating a project with a healthy and growing user community around it.

Spam module tells a much sadder story. #1063524: Port spam module to Drupal 7 is not only still open, > 12 months after D7 was released, but is still set to *active*. That means not only do the maintainers not care about D7 (and that's fine; they're entitled to spend their time on whatever they like, and what their clients are paying them to focus on) but *zero* of the 3700 people who depend on this module have ponied up the effort to get a port even *started*. That's *bad*. :\ Another thing that's bad is the user community around this module seems to be shrinking, not growing, according to http://drupal.org/project/usage/spam. This module feels like taking on a liability to me, and we already have a pile of code on Drupal.org that's spottily maintained, or unmaintained altogether.

So given the choice between a secret-sauce algorithm and more code that a strapped team of volunteer webmasters has to maintain and port themselves each Drupal version? I'd take the former, personally. But I support silverwing and whatever option he chooses to take.

*DISCLAIMER: The guy who wrote Mollom is my boss. :P

Gerhard Killesreiter’s picture

That there is no current path to D7 for spam.module shouldn't deter us from deploying it on d.o D6.

1) We can switch it off again in case there really is no D7 version at the time we upgrade to D7. That would be annoying but not a huge problem.

2) Jeremy has stated that he'll likely need a D7 version in 2012, so we know we'll likely not have to do 1).

So, please, anybody who is interested get a d.o dev site and start testing spam.module.

silverwing’s picture

Update: I've got a dev site to begin implementing our anti-spam efforts, so over the next week I'll be testing things. So things will be moving forward.

My priorities:

For Obvious Spam, spam.module. This should take care of the posts/comments that really shouldn't make it to a published state - link spammers and Identical Postings. (I think the Vietnam spam were identical.) Unfortunately I can't find a flood control system.
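As an illustration of catching Identical Postings, here is a small sketch that fingerprints post bodies and groups exact repeats. It says nothing about how spam.module does it; the `min_copies` value and the dict layout of a post are assumptions.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(body):
    """Hash the body after lowercasing and collapsing whitespace."""
    normalized = re.sub(r"\s+", " ", body.strip().lower())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_identical(posts, min_copies=3):
    """Group post ids by fingerprint; return groups with at least min_copies."""
    groups = defaultdict(list)
    for post in posts:
        groups[fingerprint(post["body"])].append(post["id"])
    return [ids for ids in groups.values() if len(ids) >= min_copies]
```

This only catches verbatim repeats (modulo case and spacing); spam that varies a word or a link per post would need fuzzier matching.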

Then for Subtle Spam, I'll investigate Flag Abuse so our users can actually 'report' them without searching for a link or filling out issues.

For SpamBots, I'm thinking Botcha. I'm not entirely convinced it will be too effective, but it may be worth testing. (And if it's only used on /user/register the overhead would be low.) (And, like spam.module, there's no D7 version :( )

Hopefully we'll have "Administer nodes" on user profiles for Vietnam-like spam: #1383816: Add "Administer Nodes" VBO to user profiles

webchick’s picture

Flag Abuse might be problematic. You'll have to run it past Gerhard/Narayan. I know we had to do some kind of backflips to make subscribe flags work on d.o without bringing down the entire site, and those are only one flag per *node*. If you start putting one flag per *comment*, egads.

Michelle’s picture

Is the overhead in having the ability to flag or in the existence of a flag? The reason I ask is that there aren't going to be that many spam comments so the actual flags won't be so high and, I'm assuming, will be deleted along with the spam comment. So ongoing number of flags should be nearly 0.

Michelle

greggles’s picture

I agree with michelle's perspective on flag abuse performance.

Adding flag abuse to g.d.o has long been on the todo list, so if we need a place to get some real world numbers I'm happy to assist.

webchick’s picture

I'm actually not sure. I think the JOIN penalty's incurred regardless, but Gerhard/Narayan would know more.

kingandy’s picture

For what it's worth, there's some discussion of the "flag spam" option here: #226678: Add a "Report spam/abuse" link to forum/issue comments (next to the "edit" & "reply" links).

silverwing’s picture

Issue tags: +antispam measures

tagging

klonos’s picture

Issue tags: -antispam measures

I recently went through this guide Drupal 7: Attaching fields to flags (note: the guide is for flag 7.x and requires #871064: Making flaggings fieldable) and while I was reading and trying it out I couldn't help but to dream how this could be used for tagging spam content. It works with AJAX so that the users won't have to leave the page they are viewing and we can use predefined "categories" of abuse (spam, subscribe comments etc). I'm pretty sure we can implement some kind of threshold of flags after which the comment could be automatically unpublished and added to a list for moderators to either revert its abuse flag or permanently delete it. These all are things I've previously suggested in #226678: Add a "Report spam/abuse" link to forum/issue comments (next to the "edit" & "reply" links)..

klonos’s picture

Issue tags: +antispam measures

...sorry about that David.

mgifford’s picture

I just found another spam message and the thought occurred to me that perhaps we should set up *.drupal.org sites so that the first few posts of a new user need to be moderated.....

Michelle’s picture

That doesn't scale well, unfortunately. There are a lot of new users. I'd be worried that people would be waiting days to get their posts approved and that is a serious turn-off.

Michelle

killes@www.drop.org’s picture

It would be helpful if we could limit the amount of new posts that new users may create.

Ie: until your account is at least 4 weeks old you may only post up to x comments or nodes per day.

Or something like that.

Shouldn't be hard to do. Maybe there is already a module.
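A minimal sketch of such a limit, assuming we can look up the account creation time and the times of the user's recent posts. The daily limit of 5 is a placeholder for "x", not a proposal.

```python
from datetime import datetime, timedelta

PROBATION = timedelta(weeks=4)  # "at least 4 weeks old", from the proposal above
DAILY_LIMIT = 5                 # the "x" above; a placeholder value


def may_post(account_created, recent_post_times, now):
    """Allow a post unless a probationary account already hit its daily limit."""
    if now - account_created >= PROBATION:
        return True
    day_ago = now - timedelta(days=1)
    posts_last_day = sum(1 for t in recent_post_times if t > day_ago)
    return posts_last_day < DAILY_LIMIT
```

Accounts past the probation period are never limited, so regulars would not notice the check at all.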

dman’s picture

:-(
As a spam-sniper, I see a majority of one-time posters. They are getting smarter (or something) and most of the ones I squash are drive-by one-off single posts.
Our vietnamese fellows aside, *most* of the others are single-posters.
So unfortunately, I don't see that a probation period or cooling-off period would help. A horrible number of new spammers have registered + 2 weeks previously. The ones who have registered + 20 minutes are easy to spot, but they are not the problem ones (that I spend time spotting).

Normally I would be in favor of a cool-down (if practical), but no, that's not addressing the problems I'm seeing.

WorldFallz’s picture

i was recently looking for such a module myself, but unfortunately contrib for this functionality is a mess. There are several modules that handle nodes or comments but not both, and the one that does handle both has been rewritten to depend on rules. The most seemingly active module (node_limit) only handles nodes and recently dropped support for time limited limits.

It wasn't even obvious to me which one I should adopt to patch so I ended up just giving up on that functionality for the time being and made a mental note to work on a non rules based post limit module when I had some free time.

For drupal.org though, I wouldn't think the custom code to handle this would be terribly involved though I don't know enough about the performance implications to determine a method. It could either be role based (requiring a new role granted to all user accounts past a certain age) or simply hard coded into hook_nodeapi and hook_comment.

killes@www.drop.org’s picture

@dman: while the single posters are annoying, they aren't that much of a problem, I think. Our nofollow rule ensures they don't reap benefits and they don't disfigure the site that much. The recent spam attacks with 100rds of posts are IMO much more problematic. I also need to anticipate that we'll want to introduce something like forum notifications one day.

@WorldFallz: I am totally open to custom code in d.o module or a submodule. One module to rule the world is not a necessary approach.

However, maybe spam.module is what we really should use...

silverwing’s picture

I'm very much against using really heavy-handed measures to deal with spam. Moderating the first X# posts of new users is frustrating to real users and will stretch our site maintainers' resources really thin. And the whole purpose of this is to make our lives easier :)

Next week I'll be setting up spam.module on a dev site. The problem is that you can't really test real-world performance on a test site that doesn't get comments/nodes, so I'll need to convince infrastructure that it's a good thing to have.

I'm also testing Flag.module to create a Watch User flag for profiles. (It's been requested.) The problem is that there's not a way to indicate *why* a user was flagged.

And I believe our philosophy on this is "We'd rather see more spam get through than real content blocked."

Michelle’s picture

http://drupal.org/project/flag_abuse allows for a "why" doesn't it?

Michelle

silverwing’s picture

@Michelle - I believe it does, yes.

I'm also looking at other options in case flag_abuse is a no-go.

WorldFallz’s picture

The recent spam attacks with 100rds of posts are IMO much more problematic.

Thanks to #1383816: Add "Administer Nodes" VBO to user profiles -- not so much any more ;-)

Beyond irritating-- yes, but at least they won't sap maintainer time anymore. IMO we can now take more time with this issue and make sure we get it right.

Now that removing spam is fairly trivial, I agree even more that we don't need a heavy handed approach that will inconvenience our end users.

imo, the first step should be to simply make it easier to report spam. Since flag is already deployed, I tend to lean towards a flag based solution but it seems we still need to get some feedback on the possible performance implications. I pinged narayan via email to see if he has any input on flag_abuse.

Heine’s picture

Is it possible to fix localize.drupal.org to respond in reasonable time to bakery update/block requests? It seems to time out often, returning an error after > 20 seconds.

jhodgdon’s picture

Just a note on the idea of adding a "report abuse" flag of some sort. The idea of turning on a per-comment flag came up in another context, and this serious performance issue was discovered in the Flag module (and it hasn't been fixed yet):
#1133956: Improve efficiency when displaying lists of flaggable comments

So until this is fixed, I don't think you will want to turn on a per-comment flag on Drupal.org.

ELC’s picture

I think anyone entering the word "subscribe" on its own, or very close to 100% of the message should have their post hit a form_set_error instructing them to click the "Follow" button.

Can we include that in the general plan for things please?
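The detection part could be as simple as the sketch below. The 0.9 cut-off for "very close to 100%" and the wording of the error are assumptions, and the real thing would live in a Drupal form validation handler rather than in Python.

```python
import re


def is_subscribe_only(comment, min_share=0.9):
    """True when 'subscribe'/'subscribing' makes up nearly all of the comment."""
    words = re.findall(r"[a-z]+", comment.lower())
    if not words:
        return False
    hits = sum(1 for w in words if w in ("subscribe", "subscribing"))
    return hits / len(words) >= min_share


def validate_comment(comment):
    """Return an error message for subscribe-only comments, else None."""
    if is_subscribe_only(comment):
        return 'Please use the "Follow" button instead of posting a subscribe comment.'
    return None
```

A real question that merely contains the word "subscribe" passes, because the word is only a small share of the message.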

mgifford’s picture

What can be done to try some of these solutions to see if we can address the spam problem. For that matter how big is the spam problem, really? What kind of stats do we have available to determine effectiveness of these solutions? What can we do to experiment with different solutions. For that matter what are the solutions being considered.. Hmm, ok, let me summarize:

Modules
- http://drupal.org/project/spam
- http://drupal.org/project/mollom
- http://drupal.org/project/flag_abuse
- http://drupal.org/project/hidden_captcha
- http://drupal.org/project/botcha
- http://drupal.org/project/spamicide

Other Modules
- http://drupal.org/project/modules?filters=tid%3A7266

Other reviews
- http://anarcat.koumbit.org/node/173

Decision points
- open source or not
- accessibility and usability
- performance
- Drupal 7 version
- Auto identify spam or make it easier for users to report spam

What am I missing?

Can we start trying a few of these solutions?

Ayesh’s picture

If we ever use Flag Abuse, we should probably flag the user instead of flagging the node itself. Bots post hundreds of nodes but use fewer user accounts, so it will make the administration part easier. We don't want to get the whole flood in the admin panel.

-1 for Mollom. I love that product, but filtering all the posts with help of a closed source system is not a very good idea.

klonos’s picture

Yeah, it does seem more logical to be able to flag users as spammers rather than having their comments flagged as spam.

kingandy’s picture

I can see circumstances where flagging the individual post would be useful - if an established user with hundreds of other posts creates a single comment that is unquestionably spam, it would be useful to know which particular piece has raised concern. I realise this is an edge case, and the issue we're trying to tackle here is automated accounts with nothing but spam to their name, but still.

It works better in my head to flag the individual content item (node or comment) at a data level, but have a view in the admin area which presents all (unblocked) users who own flagged (published) content. I don't know what sort of an impact that would have in infrastructure terms, though.

klonos’s picture

Title: spam - meta : better spam-combatting suggestions? » spam - meta : better spam-combating suggestions?

Sure, we all might make a couple or more inappropriate comments in our lifetime, but that -as you've said yourself- can only remotely be classified as spam (I prefer the term "abusive comment"). That's not the issue we are trying to tackle here though. As the issue's title successfully puts it, we're trying to "combat" a tedious task d.o site admins are facing: automated spam posts by fake user accounts solely created for that purpose. Posting a reply to one of our fellow drupalers telling them to "cool down" after a comment they've made while "in heat" does not classify the task as combating at all ;)

Michelle’s picture

I think the only thing we should be worried about for this is flagging spammers & abusers. (By abusers, I don't mean someone being a jerk in a single post but rather like posting a bunch of junk pages or replying on a bunch of threads just to call people names or something). If an otherwise upstanding user has a bad day and makes an inappropriate comment, that doesn't need to be flagged. If an otherwise upstanding user takes a turn for the worse and has a string of bad comments, that should be a webmaster issue, not an impersonal link clicking.

Keep in mind, too, that flagging is just reporting, not blocking. There are people with common sense that will be looking at the flags to determine what action should be taken.

So, tl;dr: No need to flag content, just people, and let the maintainers take it from there.

laura s’s picture

So the options seem to fall into two categories:

  1. Leverage code logic (e.g., spam module)
  2. Leverage community (e.g., flag abuse)

Some ideas combine the two (e.g., moderation of first x posts).

We have open source code, so it's tempting to solve with code, if possible. Always great if we can leverage tools to our advantage. Even better when it leads to improving performance of modules others make frequent use of.

However, we have an amazingly active community, and part of our broader goal is making d.o more useful and usable to end-users. My feeling is that the more we can take advantage of in this direction, the more long-term viability we have.

For example, we could gain much leverage if we had *very easy to notice and use* links/buttons encouraging people to mark a post or comment:

  • "Mark as useful" [can help highlight good contributions]
  • "Mark as not-useful" [can help de-emphasize ungood contributions]
  • "Mark as spam/abuse" [can help everyone flag spam with 1-2 easy clicks]

Please forgive the OT components here, but perhaps the solution here can dovetail with other efforts not related to sp4m control?

apaderno’s picture

Title: spam - meta : better spam-combating suggestions? » Spam - meta: better spam-combating suggestions

achton’s picture

For quick results, I would suggest a single button/icon in all comments/posts: "Mark user as spambot" (or something to that effect).
Set some sane threshold for number of community members that have flagged the user, and after that, hide all posts. The flagged users can then be easily cleaned from some backend VBO view.

#53/#54: I would try to pursue the cool-off period idea further. Definitely nothing to do with moderation, but it should be possible to set up a formula for disallowing more than x posts/comments within the first y hours of registration that targets these people without becoming a nuisance for real users.

shamio’s picture

It's a good suggestion but it needs a system or algorithm to choose which users are allowed to mark other users' posts. Because if someone purposely wants to remove a user, he can mark some of his posts as spam, and then the posts of that user will be removed. It is a good suggestion if we limit which users have the right to mark others. For example we can allow users with more than a certain number of posts to do that, or something like that.

klonos’s picture

...if someone purposely wants to remove a user, he can mark some of his posts as spam, and then the posts of that user will be removed.

#482312: Proposal: unpublish rather than delete content

Step1: we have a threshold (say 5 different users) of marking a comment as spam/abusive and then unpublish instead of delete. This will take care of non-spam, abusive comments from legit users too. Requiring a certain number of different users to mark a post as abusive (mostly) ensures that this action was not an "act of revenge" ...well unless we have people "mobbing" against a poor fellow. That would be an edge case and I like to think we are not that kind of community here.

Step2: we have a second threshold for unpublished, flagged from step1 comments (say 3 such unpublished, abusive comments) over a certain period (say in less than 3 days). This will give legit users a "slack" of having a bad day or two every once in a while (still removing their nasty comments). This "bad-moment" counter will be flushed after 3 days in order to ensure that ;)

If the counter from step #2 above exceeds the defined threshold, we either auto-delete the user and all their comments, or in order to be safe we do one of the following:

- Revoke user's right to post comments & mark their account as "pending moderation" (from a real person admin). Their comments are unpublished from all over the place - not deleted though. The admin reviews the user account, its posts etc and decides on an action.
- Give this user a "grace period" (say 10 days) where they can apply a "repent request". Admin time is not wasted on reviewing spambot accounts - only real users will file requests to unblock their accounts (plus we can have CAPTCHAs and other measures in that form). If this grace period passes and no request has been filed, then the account and comments are auto-deleted.
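
A rough Python sketch of the two thresholds described in Step1 and Step2. The numbers (5 flaggers, 3 unpublished comments, 3 days) are the "say ..." values from the steps; the function names and data shapes are made up for illustration and are not any existing module's API. Whether "exceeds the threshold" means "reaches" or "goes over" is not specified, so the sketch assumes "reaches".

```python
from datetime import datetime, timedelta

FLAG_THRESHOLD = 5                  # step 1: different users needed to unpublish
STRIKE_THRESHOLD = 3                # step 2: unpublished comments tolerated (assumed: reaching it triggers)
STRIKE_WINDOW = timedelta(days=3)   # step 2: older strikes are flushed


def should_unpublish(flagger_ids):
    """Step 1: unpublish once enough *different* users have flagged a comment."""
    return len(set(flagger_ids)) >= FLAG_THRESHOLD


def should_suspend(strike_times, now):
    """Step 2: act on the account when enough strikes fall inside the window."""
    recent = [t for t in strike_times if now - t < STRIKE_WINDOW]
    return len(recent) >= STRIKE_THRESHOLD
```

Counting distinct flaggers, rather than raw clicks, is what keeps one annoyed user from unpublishing a comment alone.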

klonos’s picture

...in fact, I think that something like that with configurable thresholds and selectable actions should be crafted as a core security module in D8. This is not an issue we face in d.o alone.

Edit: ...here we go: #1483764: Basic anti-spam features in Drupal core (separate module).

shamio’s picture

Your ideas and the steps you described are really great. It would be great if programmers coded it.

Ayesh’s picture

Thumbs up for #76.
Sometimes new users post links to their actual site asking others to have a look and help them. So keeping first X posts of a new user in moderation is not a very good idea for their satisfaction, I guess.

On d.o, most Vietnamese nodes are posted at a very high rate. I mean... within a few minutes, the first 2-3 pages of the Post Installation forum get flooded by spam posts. So an additional flood control mechanism can slow them down. Most comments here on d.o are not just "thanks" comments, so it's a little strange for a legit user to post 50 replies in an hour.

webchick’s picture

#1382008: Ongoing Vietnamese forum spam reports has over a dozen real, honest-to-goodness human beings, who are some of the most valuable in our project, pissing away time that could be spent on making Drupal awesome instead on deleting spam/spammers. :(

Is there something needed to support you here? A call out for someone to write a flood control module?

dman’s picture

Instant honorary Drupal Rockstar status for anyone who sorts out a flood control module worthy of D.O.

WorldFallz’s picture

I have cycles to work on it, but it's the 'worthy of D.O' that gives me pause. I'm willing to give it a shot though. It would be really helpful if we could come to some sort of agreement on the basic parameters and overall approach so i don't go off in a direction that's unsustainable for d.o.

I would think we're talking about using hook_comment on the validate or insert ops and checking the age of the user, the role of the user, and # of existing comments for that user in a particular time period.
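
To make that concrete, here is a minimal sketch of the check in Python rather than as a real hook_comment implementation. Every limit and role name below is a placeholder assumption, not a drupal.org setting, and the function is hypothetical.

```python
from datetime import datetime, timedelta

# Illustrative values only; none of these come from drupal.org configuration.
TRUSTED_ROLES = {"site maintainer", "administrator"}
NEW_ACCOUNT_AGE = timedelta(days=7)
WINDOW = timedelta(hours=1)
MAX_COMMENTS_IN_WINDOW = 5


def allow_comment(registered_at, roles, comment_times, now):
    """Return False when a new, untrusted account is posting too fast."""
    if TRUSTED_ROLES & set(roles):
        return True                       # trusted roles skip the check
    if now - registered_at >= NEW_ACCOUNT_AGE:
        return True                       # established accounts skip the check
    recent = sum(1 for t in comment_times if now - t <= WINDOW)
    return recent < MAX_COMMENTS_IN_WINDOW
```

The cheap checks (role, account age) come first so that the per-user comment count, which would be the only database query in a real module, only runs for new accounts.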

is that acceptable for d.o?

Ayesh’s picture

Thanks for getting started WorldFallz!

What about checking time since last comment/node ? Flood control.
I'm not sure if it's possible to count the number of comments posted by a user in an easy way, but I think the counter on user profiles that shows document edit count (over 10 edits, over 1000 edits, etc) is a good factor.

shamio’s picture

I just have another idea: check the comments submitted by new users (i.e. users who signed up in the last 3 weeks) for special keywords, and if their first 2-3 comments contain one of those keywords, disable further commenting for them until a moderator reviews their comments and confirms or disables their account. What's your idea about it? Is it easy to program?

Michelle’s picture

One problem with flood control... Floods of spam are easy to spot and usually wiped out quickly. If they are only able to spam once or twice, their spam, ironically, may stick around longer. If we do some sort of flood control, retroactively unpublishing previous posts or notifying the maintainers or something should go along with it.

WorldFallz’s picture

good point michelle-- i think we can add opening a ticket in the webmasters issue queue with a link to the user for review.

cweagans’s picture

Please don't write a tiny custom solution for Drupal.org. killes talked about getting spam.module deployed on Drupal.org, and a flood control filter would be pretty easy to write (each subsequent post within a given time period makes the post more likely to be spam).
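
A sketch of what "each subsequent post makes the post more likely to be spam" could mean in practice. This is plain Python, not the spam.module filter API; the window, the step size and the 1 to 99 scale are all assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def flood_score(post_times, now, window=timedelta(minutes=10), step=15, cap=99):
    """Score the next post from 1 to cap based on the author's recent posts.

    post_times holds the timestamps of the same user's earlier posts.
    """
    recent = sum(1 for t in post_times if timedelta(0) <= now - t <= window)
    return max(1, min(cap, recent * step))
```

With these placeholder values a user's first post scores 1, and a seventh post inside ten minutes scores 90, so a flood climbs toward the cap quickly while an occasional double post barely registers.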

WorldFallz’s picture

back to analysis paralysis, lol.

cweagans’s picture

Just trying to nudge people away from little custom things that are useful only to Drupal.org and towards improving modules that are useful to the entire community. The spam.module issue has been open for a long time, and would help with more than just flood control. Killes mentioned in #1 that he thought about it, and that was last September. I talked to him recently, and he said that he was still on board with that plan. There's even a dev site already set up for it (http://spam-drupal.redesign.devdrupal.org)

Michelle’s picture

Doesn't need to be custom. Notifying the admin(s) when a user has been flood-blocked would be good for any site.

cweagans’s picture

Yes, but why wouldn't you build that against a module that provides intelligent filtering of one-off comments? Spam floods are easy to deal with right now. One off comments? Not so much. We don't know about them until they are reported, whereas with spam floods, it's pretty obvious.

To me, the bigger issue here is automating our entire spam fighting solution. That's already done with Spam.module, but the filters need configured/trained. Writing a flood control filter for spam.module is very very easy, and if we can get some support behind #1378456: Install Spam module on drupal.org, then I think we'll have a very effective solution.

Michelle’s picture

I didn't say you shouldn't. All I was saying is don't have flood protection going on silently because then we'll have a harder time finding those first couple spams that get through before the flood protection kicks in.

mgifford’s picture

Arg.. We gotta try something and learn from the experience. Ok, so:
http://drupal.org/node/1077602#comment-5823752

Spammers want to post links.. So why not just disable <a href="http://ANYTHING_NOT_DRUPAL.org"> for any account that is less than a month old and with less than 10 successful posts.
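
Roughly, in Python (a sketch only: the helper names are invented, the regex is a crude stand-in for a real HTML parser, and it only sees quoted href attributes, not bare URLs pasted as text):

```python
import re
from datetime import datetime, timedelta
from urllib.parse import urlparse

HREF_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def is_restricted(registered_at, post_count, now):
    """Less than a month old AND fewer than 10 posts, as proposed above."""
    return now - registered_at < timedelta(days=30) and post_count < 10


def external_links(html):
    """Return hrefs whose host is not drupal.org or one of its subdomains."""
    found = []
    for url in HREF_RE.findall(html):
        host = (urlparse(url).hostname or "").lower()
        if host and host != "drupal.org" and not host.endswith(".drupal.org"):
            found.append(url)
    return found


def allow_post(html, registered_at, post_count, now):
    """Reject a post only when a restricted account includes external links."""
    return not (is_restricted(registered_at, post_count, now) and external_links(html))
```

Relative links have no host and pass through, so new users can still point at other nodes on the site.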

So ya.. I believe I'm with @WorldFallz here. Down with analysis paralysis. There is no perfect solution. But we've got lots of good ideas to try.

cweagans’s picture

This is not "analysis paralysis". This is a matter of "we already have a module that does 120% of what we need it to do, so let's freaking use it". killes has already signed off on it. All we need to do is get it deployed.

For initial deployment, it'll be mostly disabled so that we can train the filters. We need to leave it that way for ~1 month. After a month, the filters should be sufficiently trained, so at that point, we can turn it on and I'd guess that 95% of our spam problems will go away.

Michelle’s picture

Is there a reason we have two issues running? Could we mark this dupe of #1378456: Install Spam module on drupal.org ?

cweagans’s picture

Status: Active » Closed (duplicate)

Just reading through this issue again...

Re #42: We don't need flag abuse, because spam.module provides "mark as spam" and "mark as not spam" functionality.

There's lots of other stuff up there, but I think spam.module is a good first step to getting a good solution in place, so I'm going to mark this as a dupe of #1378456: Install Spam module on drupal.org.

kingandy’s picture

I disagree that this is a duplicate, that issue is a specific task ("Activate this module") whereas this one is a more general discussion about what spam methods are appropriate, desirable and feasible. Activating the Spam module is just one of the many suggestions that have been discussed.

Possibly this would be more appropriately located in the forums, but I'd hate to see all of the talk of flood prevention and keyword filters fall through the cracks just because one of the things brought up here has (finally) been actioned.

kingandy’s picture

Aside from anything else, as that issue was created as a direct result of the discussion here and references this issue, this issue is clearly older. Under standard practice, that should be marked as a duplicate of this.

If it was a duplicate. Which it's not.

dman’s picture

Status: Closed (duplicate) » Needs work

True.

Activate spam module.
+1 actionable point. Branched to its own issue. Cool.

One-off drive-by mini-spams :
= A different and thornier issue to detect and prevent. Discussion continues, though I dunno there will be an end

Some separate flood control thing - for these attacks that do not have links and are vandalizing this site several times daily.
= also needed. this is detectable on the above suggestions - and can be solved separately without getting derailed by the single-case what-ifs.

shamio’s picture

Flood control can only prevent primitive spammers who are sending hundreds of posts in a short time. Also keyword detection is useful only on some cases. I know having these is better than nothing but the best possible way is always human work. For example i posted a report in this issue about a suspicious user (i think he is spammer) who registered about 17 weeks ago, with a few posts and still his account is not blocked because most moderators didn't review this report. My report is here: http://drupal.org/node/1503204

klonos’s picture

I agree with keeping this issue open too, for all the reasons mentioned in the previous comments. I think we should get spam.module deployed soon, and perhaps that will address most of the issues we are concerned with here. For the remaining problems/cases of spam we can perhaps improve that module to address them too, since all the good suggestions here sound like nice additions to that module or to a submodule of that project.

PS: ...plus the project name seems perfect namespace-wise for if it ever ends up as a security solution bundled with core. Though I'd prefer it to be "antispam" or something like that instead ;)

WorldFallz’s picture

@shamio-- blocked that user and deleted the spam. And no, the best possible way is not human work-- it's waste of valuable moderator time.

As someone who already had one account blocked for nonsense spam and now appears to be on a vendetta mission to make sure no one else gets away with what he got caught for, you're hardly in a position to tell us how to best handle spam.

Michelle’s picture

I won't argue with whether this is a duplicate but

Under standard practice, that should be marked as a duplicate of this.

is not true. We keep whichever issue is the most current, actionable, or has a patch on it, not necessarily the one that was filed first. I suggested this be marked dupe because the other issue is basically RTBC if killes has signed off and it's waiting on implementation and installing the Spam module makes this one moot until such a time as the Spam module proves itself unworthy.

If you folks want to keep talking about it, though, I'm certainly not going to stop you.

shamio’s picture

@WorldFallz, Can you please explain your second paragraph more? I didn't understand it at all.

dman’s picture

@Michelle
The spam module approach (AFAIK) closes a great many doors and is a big help. And it addresses a large part of the problem. That is very cool. But not a final solution alone.

I was *not* aware that it totally solves all the flood-posting issues that have been plaguing me for too much of this year. Until a flood control system is also active, there is still a painful problem that needs solving.

So far, and in no small part due to this and related issues we have had, we've got good action actually happened on:

Yet we are still swamped by #1382008: Ongoing Vietnamese forum spam reports which none of these other worthy fixes can actually deal with at all.

This is a remaining task.

The edge cases I don't think we'll ever be able to deal with fully, so lets try to ignore theoretical cases of a genuine user who posts 10 real comments in a minute. (ORLY?)

We are under siege by a bot that posts 100 new threads in 20 minutes. We should be able to sanely prevent that.
This can be fixed by the brainpower present in this room right now.

Michelle’s picture

@dman: Could you clarify why you think the Spam module is incapable of stopping the Vietnamese spam?

#558858: Deploy fasttoggle to drupal.org for user blocking and node unpublishing is the issue you're looking for.

dman’s picture

I would be happy to be wrong, but none of the suggestions I've seen endorsing spam.module, and nothing in my review of spam.module docs indicate that it can detect a large number of (technically unique) posts that individually trigger no spam warnings and have no blacklist keys or links in them - but are spam by virtue of the fact that the same short-term user is posting an unpleasant number of items at an inhuman rate.

if spam.module does effectively detect this then I'd be happy to hear it.
Nobody has said that it does this before now, so it's news to me.

Thanks for the link to the other issue. That one seems to have gone particularly smoothly. And only took 3 years to go live :-)

Michelle’s picture

Based on https://drupal.org/node/1378456#comment-5822050 I thought it was possible. I don't actually use the module. It seems to be pretty customizable, though.

Anyway, my point was that it makes sense (to me, anyway) to wait until the already agreed on module is deployed to see what, if anything, still needs to be done rather than discussing alternatives here while they are deploying over there. It just seemed like a duplicated effort to me but that's just my opinion and I'm not going to push for it if you think the discussion is still worthwhile.

cweagans’s picture

Status: Needs work » Postponed

Spam.module does not currently flag floods of things as spam, but it would not be difficult to do. The duplicate content filter in the spam.module package would be a good example to follow for anyone interested in making that happen.

As I said before, spam.module will do everything that we need it to do. If you don't like duplicate, then let's mark this as postponed and continue discussion after we've used spam.module for a little while. That will allow us to have a more coherent conversation about what else needs to happen. Until we've been using spam.module for a while, I don't think anybody can really say what else needs to happen.

In the short term, 100% of the effort being put into this issue needs to be directed at the spam.module deployment issue because that will solve most of our problems. Before anybody else makes a comment about what spam.module can/cannot do, please read and understand this page: http://drupal.org/node/498092

Spam.module already does 120% of what has been suggested here already: keyword analysis, checking against a blacklist of URLs, regular expression searches on content, everything. We may not even need flood protection if we choose to enable the SURBL filter.

webchick’s picture

So instead of arguing back and forth on this, who would like to volunteer to help silverwing test out Spam module (and whatever other counter-measures) on http://spam-drupal.redesign.devdrupal.org/ ? I'm happy to fork out SSH access to stagingvm. Please follow instructions at http://drupal.org/node/1018084.

silverwing’s picture

WorldFallz’s picture

I'm in-- heading over to the new issue and testing now!

laura s’s picture

Spam.module already does 120% of what has been suggested here already: keyword analysis, checking against a blacklist of URLs, regular expression searches on content, everything. We may not even need flood protection if we choose to enable the SURBL filter.

Let's not forget the Bayesian filter that learns. Most spam is like other spam, so this can really help. It can also help in flagging abuse, as much abuse exhibits similar text patterns. It's a wonder on community sites.

The real question is what to do about the long approval queue that is likely to accumulate. Spam module would make blocking spam comments easier, but we'll still want to be reviewing the posts for legitimate comments that got caught.

Michelle’s picture

I could probably take care of quite a bit of it as long as there are easy to use tools. I do that sort of stuff when I'm too tired for anything productive and need something mindless yet more useful than playing freecell. :)

dman’s picture

Just a bump.
Our Vietnamese fellow slowed down for a while this month (or other moderators picked up on it faster), but is still at it again.

Reporting back from #1515386: Testing our antispam measure doesn't look like a total win.
Anyone know what the next roadmap should be?

WorldFallz’s picture

What about trying something simple and unobtrusive like http://drupal.org/project/spamicide?

cweagans’s picture

Spamicide is an interesting solution. Let's try it. Is there anything on the spam dev site that we need to keep? If not, we can ask for it to be rebuilt (so that we're starting with a fresh copy), and then play with spamicide.

cweagans’s picture

Status: Postponed » Active

Also, since spam module is a no-go, reopening.

dman’s picture

Bump - because I've just spent my entire evening playing whack-a-mole with multiple hydra-like attacks on d.o
As well as our persistent Vietnamese prick (thrice this evening) there are a dozen other flavours pushing Nikes, streaming sports, Gucci bags, and a number of other things in languages I can't be bothered parsing, and even a few perennial "SEO" merchant track-backs.

The last three hours of my volunteer time has been spent not-helping-people but bastard-bashing.

Please make it stop.

(forgive my frustrated tone)

s.Daniel’s picture

@dman: Sounds like human spammers?

I really like the idea of Spamicide, and although it might not help a lot I think it is worth a try, because even if we only catch 10% of the spam that is 10% less time wasted. I don't see a reason not to give it a shot.

drupalshrek’s picture

It seems to me that, given how many thousands of programmers are involved in the Drupal project, the Drupal website's ability to deal efficiently with spam has so far been pretty poor. Currently, almost every day when I come to the forums there is spam, e.g. the following are a sample of issues I have raised recently:
http://drupal.org/node/1594188
http://drupal.org/node/1594182
http://drupal.org/node/1582752

Now, let me say the webmaster team seem to do a great job of removing spam posts very quickly, so no complaints there. However, the need to fill in an issue request really is a pain when it is so frequent.

Adding simply a "flag as spam" button doesn't really help much, since the webmasters then have to watch that.

I would suggest however:
a) Each post has a "flag as spam" button available to it (which I suppose can be done with the Flag module)

b) A rule is set up (using the Rules module) and perhaps another custom module (?) with the following sort of logic:

* If 1 person flags a post as spam, this is simply listed somewhere as possible spam for review by someone
* If 2 (or some other threshold of) different people flag a post as spam, this post is immediately and automatically unpublished (i.e. made invisible to all but admins) and made subject to further scrutiny
* if a person reaches a particular spam threshold (e.g. the number of posts which they have posted which are flagged as spam exceeds the number of posts which have not been flagged as spam), then their account is automatically blocked, automatically listed on a "spammers for review" list, and (optionally) all their posts automatically unpublished, and they are automatically sent a mail saying what has happened and why and the appeal procedure.
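
As a sketch of the logic only (plain Python with invented names; it says nothing about how Flag or Rules would actually be wired up), the three rules above come down to:

```python
def post_action(flagger_ids, unpublish_threshold=2):
    """Decide what happens to a post from the set of users who flagged it."""
    distinct = len(set(flagger_ids))
    if distinct >= unpublish_threshold:
        return "unpublish"   # hidden from everyone but admins, pending scrutiny
    if distinct >= 1:
        return "review"      # listed as possible spam
    return "none"


def should_block_account(flagged_posts, clean_posts):
    """Block when flagged posts outnumber posts that were never flagged."""
    return flagged_posts > clean_posts
```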

The detail of this is just a proposal, but any system should be:
* extremely easy for users to flag spam
* extremely efficient at dealing with spam so that the admins have as little to review as possible
* as automatic as possible
* relatively easy to implement

I think what I propose ticks all those boxes.

s.Daniel’s picture

@drupalshrek: Your idea sounds good. Some things to consider:
* performance issues we would currently face: http://drupal.org/node/1293186#comment-5500578 more details http://drupal.org/node/1293186#comment-5735812
* The described system could be abused to silence controversial discussions.
* Related post http://drupal.org/node/1293186#comment-5735812 http://drupal.org/node/1293186#comment-5063582

Anyway, this'll be the 121st comment and I don't think we'll find a perfect solution, but maybe we can agree on one that saves us time - it'll definitely be better than the current one. So maybe we should open a new thread to figure out the best flag/rules based solution. Personally I wouldn't have a problem with mollom either, if we can configure it to work well for us with few false positives. We don't code our own image editing tools to create pictures for our websites; most of us use photoshop, and that doesn't bother me either. I believe mollom can be turned off for certain roles. So mollom + rules (e.g. 1y member).

mcfilms’s picture

Before we open a third (or fifth) thread about the spam issue, can I make some observations:

I don't think the simple flagging system suggested by drupalshrek in #121 will work. The pesky Vietnamese spammer that has been hitting D.org a lot this year is creating many nodes every minute. Flagging single nodes or comments won't work, but flagging the user would.

I also think there are a number of reasons everyone is hesitant about the spam.module.

Keep in mind that spam processing adds slightly to the load on your server (in speed and database space)

I also see in this thread http://drupal.org/node/1515386 that spam.module was delivering false positives and that there is not a D7 version available.

However, I have taken Webchick's advice in #110, gone to http://spam-drupal.redesign.devdrupal.org/ and learned that I do not have access. So following the directions at http://drupal.org/node/1018084 I have requested a drupal.org development site. I think everyone who has an interest in getting rid of the spam problem should at least test the proposed solutions.

Also, is there a method to create an avalanche of spammy posts to test any possible solutions?

WorldFallz’s picture

It's really not very helpful to parachute into an issue, state the obvious (or worse, restate what's already been discussed) without reading it first. We're not short on ideas of how to handle spam. We're short on effort to test the ideas and find an acceptable and performant solution for drupal.org.

afaik, rules on drupal.org is a no-go and flag has performance issues.

we'd love additional help on this -- but please, do everyone participating in the thread the respect of reading it before adding to it.

cweagans’s picture

Looks like killes is going to re-test #1515386: Testing our antispam measure (spam.module).

Sure, there's no D7 port, but it shouldn't be that hard to port.

dman’s picture

I've been another three hours tonight, neverending story.
Stamped another 25-30 different accounts just from today (and a few that got caught in the dragnet based on keyword searches I did to make sure I wasn't fighting a hydra)
Zapped another ~500 individual posts and comments, in batches of 4-10. It takes time. Normally when faced with a task like this I automate it away. How come we can't do that?

From this batch

80% of them could have been fully prevented by a small keyword blacklist

"Nike", "Air Max", "Champions league", "Gucci", "Ugg", and a few more I could collect easily if there were somewhere to use the list and I could have my evening of volunteer work back.
If we force them into 'greeking' their keywords then there is zero SEO point.

A similar number could have been put on probation immediately based just on their chosen email service. When more than half the spam accounts I hit come from one ISP - I seriously would block that ISP and lose no sleep over the chance of false positives prohibiting an individual who chooses to hang with the wrong crowd.
If someone did a search of the d.o DB and compared email suffixes with blocked accounts - there will be a couple of glaring repeat offenders sticking out by orders of magnitude.
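
Both checks are small enough to sketch. The Python below is illustrative only: the keyword list is just the handful named above, the function names are invented, and the input to the second function stands in for whatever query would pull blocked accounts' email addresses out of the d.o DB.

```python
import re
from collections import Counter

BLACKLIST = ["Nike", "Air Max", "Champions league", "Gucci", "Ugg"]

# Word boundaries keep "Ugg" from matching inside words like "debugging".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in BLACKLIST) + r")\b",
    re.IGNORECASE,
)


def blacklisted_terms(text):
    """Return the blacklisted keywords found in text, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(text)})


def blocked_domain_counts(blocked_emails):
    """Tally email domains across already-blocked accounts."""
    return Counter(e.rsplit("@", 1)[-1].lower() for e in blocked_emails if "@" in e)
```

Calling `.most_common(10)` on the tally would surface the repeat-offender domains; what to do with them (probation, block) is a separate policy decision.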

Or I could have actually worked on good stuff, earned $500, and paid a Vietnamese thug to track down whoever keeps thinking this is a clever place to advertise washing machines and hurt them. A lot.

I'm Grumpy again. Can you tell?

webchick’s picture

dman, why don't you apply for access to the sandbox silverwing started and work on deploying the tools you need? See #110.

dman’s picture

Yeah. Thanks for the constructive suggestions. I should put up or shut up.
Getting hands-on with the whole of d.o looked a little intimidating, but that set of instructions looks really well put-together. I'll give that a go.

I've just been bumping this to keep it on the radar I guess. And venting of course :-?
I'll see if I can take some more positive action...

drupalshrek’s picture

I'd like to implement a sandbox demo of what I described (#121).

However, before trying to implement a solution which is not going to be acceptable, I'd like to know the constraints. WorldFallz says "afaik rules on drupal.org is a no-go and flag has performance issues".

Is rules a no-go? If so, why? If it's performance issues, why can't that be fixed?

Does flag have performance issues? Are there any reasons why the performance issues can't be fixed?

I'm ready to do the required development, but I don't want to reinvent the wheel, if existing wheels just need a little straightening.

@dman, you are the man! Well done for all your manual anti-spam effort!!
@s.Daniel, thanks for your encouragement.
@silverwing, count me in on your sandbox effort against this spam. Please contact me directly to tell me how I can help.
@WorldFallz (#124), please don't be such a discouraging misery. I understand it's annoying if someone has not read and absorbed every single post in the thread, but when a thread gets this long that's inevitable.

cweagans’s picture

I understand it's annoying if someone has not read and absorbed every single post in the thread, but when a thread gets this long that's inevitable.

That's what issue summaries are for.

Is rules a no-go? If so, why? If it's performance issues, why can't that be fixed?

I'm not sure what the reasons were, but it was proposed for another component of Drupal.org and it was decided that it would introduce a lot of overhead/performance problems/etc. We really don't need it, and I think killes said no, so you'd have to convince him.

IMO, we should try the spamicide module. It's a quick, easy solution that might really cut down on the amount of bot spam. From there, we can layer on other protections.

Somebody should also get in touch with killes - he is currently retesting the spam module, because that is his preferred solution.

WorldFallz’s picture

i think part of the problem is, that I don't think there's a simple real world way to test these under a drupal.org load. Other than installing the modules and verifying functionality, how can we actually test performance implications and actual spam prevention effectiveness? As dman mentioned, do we have to write our own spambot?

if it were my site, I'd try spamicide and a simple keyword blacklist as suggested above. we're somewhat lucky in that most of the keywords that would indicate spam aren't likely to appear in legit drupal.org posts.

However, killes prefers spam, so I'm not sure what we can do until he makes a determination if that's acceptable.

killes@www.drop.org’s picture

We cannot really load test it, and we IMO don't need to. We can always switch it off again, if it causes a problem.

What we should do is to make sure we keep additional SQL queries to a minimum and check those that we send to identify missing indices etc.

sun’s picture

FWIW, Mollom** would be happy to help, and I'd personally offer to do whatever is necessary to make a deployment on drupal.org possible; including the addition of features (such as a potential option to exclude longer-term users from its checks, or adding a "true" dry-run mode).

The module already supports an option to not block any posts that have been classified as spam. The CAPTCHA that is shown by default when there is no clear/confident classification result in either ham/spam direction can already be disabled, too.

It is also worth to mention that the Mollom module has been battle-tested on many other large-scale sites.

Personally, I understand the open-source argument, but I do not think that the aspect plays a significant role. The module for Drupal is GPL. The backend/web service is not, but that is also where the actual content classifier and machine learning intelligence lives, which is very advanced and developed/improved on its own every single day. :)

Again, happy to help. Just let me know.

** Disclaimer: I work for Mollom. (Due to that, I'll also leave it at this offer for now.)

cweagans’s picture

I think the bigger issue is that we don't like giving data to 3rd parties. Mollom might be a good fit, but we can't even use Gravatar because of the possibility of them tracking Drupal.org user activity.

mgifford’s picture

I've had reasonable experiences with Mollom. I do understand why folks would want to not go with it. However, we've been discussing this issue now for over 10 months.

@sun - any chance we could have a custom disclosure agreement between the Drupal Association & Mollom to ensure that concerns about tracking user activity are addressed?

cweagans’s picture

The reason we've been discussing it for over 10 months is that it's a difficult problem to solve within the parameters that we have to work within on Drupal.org. I am working on an automated spam solution for Drupal.org. We've already waited 10 months. What's another week or two?

mgifford’s picture

That's great to hear @cweagans. Didn't realize it was that close. I'm willing to do some testing in mid August, but hopefully it's implemented by then.

cweagans’s picture

Update:

I've built a feature on my local dev machine that I'll deploy to the spam dev site tomorrow. It's essentially just a bunch of strongarm'ed variables. I'm pretty sure that this isn't going to be an accepted deployment strategy, since strongarm isn't currently installed on Drupal.org (and there's probably a reason for that). It makes no difference, though, as it's easily converted to hook_enable/hook_disable in a custom module for d.o.

I've also built a whitelist for the URL filter: #1690134: Allow the URL filter to check against a whitelist. This needs reviewed, but essentially, it allows an admin to specify domains that should not count against the URL limiter. This is handy, for instance, in the issue queue where we're linking to Drupal.org all over the place, as well as drupalcode.org and devdrupal.org.

I've also built report_spam.module. This will allow us to crowdsource comment and node moderation. Note that this provides slightly different functionality than the "mark as spam" or "mark as not spam" buttons that are built into the spam module. This allows any user with the "mark content as spam" permission to flag something as spam. Flagging increments the spam score on the content by a configurable amount: each role on the site can have a weight associated with it. For instance, normal "authenticated users" might have a weight of 5, and "site maintainers" might have a weight of 100. The module will use the highest weight available to the user.

When the user clicks the "report spam" button, the spamminess of the content is incremented by the current user's highest weight. If enough spamminess is accumulated (as determined by the spam.module configuration - by default, the threshold is a spamminess of 86), then the content is automatically unpublished. Users with the administer spam permission can, as always, just mark content as spam and it'll be automatically unpublished (or mark it as not spam, which will reset the spamminess to 1). Oh, and the user can only report a piece of content once (so that we don't get people who just keep clicking the button until the content is unpublished).
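
The scoring rules described above can be sketched roughly as follows. This is an illustration in Python, not the report_spam.module code. The class and function names are hypothetical, and the starting score of 1 and the `>=` comparison against the threshold are assumptions. The weights of 5 and 100 and the threshold of 86 come from the description.

```python
from dataclasses import dataclass, field

SPAM_THRESHOLD = 86  # default threshold mentioned above

# Example weights from the description; the role names are illustrative.
ROLE_WEIGHTS = {"authenticated user": 5, "site maintainer": 100}


@dataclass
class Content:
    spamminess: int = 1           # assumed starting score
    published: bool = True
    reporters: set = field(default_factory=set)


def user_weight(roles, weights=ROLE_WEIGHTS):
    """Use the highest weight among the user's roles (0 if none match)."""
    return max((weights.get(role, 0) for role in roles), default=0)


def report_spam(content, uid, roles):
    """Apply one report; return False if this user already reported."""
    if uid in content.reporters:
        return False              # each user may report a post only once
    content.reporters.add(uid)
    content.spamminess += user_weight(roles)
    if content.spamminess >= SPAM_THRESHOLD:
        content.published = False  # auto-unpublish at the threshold
    return True
```

Under these assumptions a single report from a role weighted at 100 unpublishes a post at once, while roles weighted at 5 need many independent reporters.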

Next, I plan to fix #1524656: When blocking node, user get's /spam/denied page. I'm also going to work on getting the roles on the spam dev site configured with new permissions. I'm also going to write a new filter for the spam module for "inactive threads". Basically, spam.module comes with a node age filter, but the only thing it checks for is that a comment is being added to an old node. It doesn't matter that the latest comment was only a couple days ago. The issue for this is #894730: Node age should be based on last comment timestamp, not node timestamp, so if somebody wants to tackle that, just let me know (either via my contact form or IRC).
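
A rough sketch of the "inactive thread" check described above, assuming the filter only needs the node's creation time and its comment timestamps. The function names and the 180-day cutoff are hypothetical, and this is Python rather than a spam.module filter plugin.

```python
from datetime import datetime, timedelta


def thread_last_activity(node_created, comment_times):
    """Latest activity: the newest comment, or the node itself if none."""
    return max([node_created, *comment_times])


def is_inactive_thread(node_created, comment_times, now,
                       max_idle=timedelta(days=180)):
    """True if nothing has happened in the thread for longer than max_idle.

    The 180-day default is an illustrative assumption.
    """
    return now - thread_last_activity(node_created, comment_times) > max_idle
```

With this check, an old node with a comment from a couple of days ago is treated as active, which is the behaviour #894730 asks for.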

I also need to write a flood control filter that will catch floods like the ones our Vietnamese "friend" keeps unleashing. The flood control filter should be a last resort: other filters should catch it first (especially the SURBL filter and the bayesian filter).

If you want to help with this, please please please please review this patch: #1014114-3: Test the spam administration permission... could we have two levels? And if it's been sufficiently reviewed, nudge the maintainers to commit it and get it into a release so that it's easily deployed.

tl;dr:

Done:

  • Crowdsourced spam reporting and unpublishing with admin features
  • URL filter whitelist
  • Basic spam.module configuration

Todo:

  • Deploy all the things to the dev site
  • #1524656: When blocking node, user get's /spam/denied page
  • Configure dev site roles
  • Write thread age filter
  • Decide on what roles should be exempt from spam filtering
  • Write an admin UI for report_spam.module (currently, all role weights are configured in the variables table - I've just been setting them manually, but we really need an admin UI before this goes live)
  • Write a flood control filter

How you can help:

Please let me know ASAP if you have any questions, concerns, comments, etc.

cweagans’s picture

Assigned: Unassigned » cweagans

Better grab this, too

cweagans’s picture

Oh, and report_spam automatically integrates with the bayesian filter, since it uses spam_mark_as_spam(). Filters will be trained :)

This issue needs review to make sure that works correctly: #1690686: Only fire mark_as_spam and mark_as_not_spam spamapi calls when appropriate

mcfilms’s picture

The issue I see with this in the real world is that the spammers that hit drupal.org upload dozens of duplicated (or very similar) nodes. So if I report post #26 as spam and you report #47, how will this unpublish all the content the spammer has published?

cweagans’s picture

That would be the flood control filter that I need to write... which is something that I totally forgot to add to my summary above. In that case, the flood control filter will catch subsequent posts and immediately flag them as spam, which will further train the bayesian filter, which will help to prevent that kind of post in the future. When spam is flagged like that, it also goes into a queue, so we'll have a really good idea of who to ban from Drupal.org. There's still a bit of manual intervention in there that I haven't figured out how to get around, but I think what I've planned thus far is a good stepping stone.

Edited my comment to include a flood control filter on the todo list.
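
One way such a flood control filter could work is a per-user sliding window over post timestamps. The sketch below is only an illustration of that idea in Python. The class name, the limit of 5 posts, and the 60-second window are all assumptions, since the filter had not been written at this point in the thread.

```python
from collections import defaultdict, deque


class FloodFilter:
    """Flag a post when its author exceeds max_posts within the window."""

    def __init__(self, max_posts=5, window_seconds=60):
        self.max_posts = max_posts
        self.window = window_seconds
        self._recent = defaultdict(deque)  # uid -> recent post timestamps

    def is_flood(self, uid, timestamp):
        recent = self._recent[uid]
        # Drop timestamps that have fallen out of the sliding window.
        while recent and timestamp - recent[0] >= self.window:
            recent.popleft()
        recent.append(timestamp)
        return len(recent) > self.max_posts
```

The first few posts of a flood still get through, so this only catches the subsequent posts, as described above. The earlier ones would still need a report or the bayesian filter.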

Dries’s picture

Personally, it doesn't seem a good use of our time to build, maintain and upgrade our own comment spam solution. Spam filtering looks easy, but it is actually a very difficult problem that requires constant attention and tweaking.

Also, it is already very difficult and slow to make improvements to drupal.org so I'm not sure adding more modules is a good idea.

Mollom is a 3rd party service, but we have a privacy policy (http://mollom.com/web-service-privacy-policy/) that binds Mollom to the strict rules of European data protection. I'm sure other third party solutions have privacy policies too. They could be reviewed and compared to see if one of them is acceptable. (Disclaimer: I'm the founder and general manager of Mollom, a 3rd party anti-spam service for websites.)

I'm happy to redirect Mollom resources to implement Mollom module feature requests from the Drupal.org webmaster team. If Drupal.org adopts Mollom, we'd do whatever is necessary to help and make Drupal.org spam free. We're building some pretty amazing moderation tools that could really benefit the community.

Given my conflict of interest, I don't want to be part of the decision making. Regardless of what we choose to do, the reason it takes 10 months is that we don't have effective decision making. It is something we need to address -- but not in this issue.

Having said all this, I applaud cweagans' initiative to take this on. Thanks for caring so much for making drupal.org a better place! :) Just wanted to share my 2 cents.

laura s’s picture

Personally, it doesn't seem a good use of our time to build, maintain and upgrade our own comment spam solution. Spam filtering looks easy, but it is actually a very difficult problem that requires constant attention and tweaking.

I agree. We're talking about upgrading an existing solution that was aces for earlier versions of Drupal. I see an upgrade of the Spam module as a huge win for the community. In my experience the Bayesian-driven system is a great spam management solution, the best I've worked with. This is something that would be of real value to contrib. As an open source community, my preference is for the open source solution that can benefit other Drupal sites out there.

cweagans’s picture

I'm not really building it. I'm just configuring it and putting together some glue code. In any case, Mollom has been shot down for whatever reason, so we have to make do. If somebody wants to challenge that decision, be my guest - it'd be much easier if we didn't have to maintain spam module for Drupal.org - but I have neither the energy nor the desire to challenge killes' decision on that one.

Ayesh’s picture

@cweagans - great job starting with the report spam module. Looks like a nice one (I tested it a few hours ago).
My personal view on Mollom is that it is not a very good idea for d.o. I use it on most of my blog sites and it's an awesome, magical product. But d.o is the center of Drupal, so I think we must be cautious about showing users a message that says their content is going to be filtered with Mollom.

Mollom is a wide-ranging solution that can filter low-quality comments as well. In d.o forum posts, where most of the spam goes, there are many non-constructive questions and many questions that link to the poster's own web site. Those are clearly not spam (in the advertising sense). But when a bad guy attacks, most human users can tell at first sight that it's spam: the posts start with Chinese characters, numbers, etc.

Most of the time, a user registers, posts hundreds of posts in a few minutes, and leaves.
I'm not sure whether Mollom takes the user's account age (time since registering on d.o) and posting frequency into account. That's where the Spam module can shine. Plus, human users can report them. As far as I know, Mollom doesn't have a way to let regular users report a post as spam. Again, that's somewhere the Spam module can shine.

When a bad guy attacks, he posts several posts at once. Most of them don't get any comments or even any attention. But I don't think enough people will report all of these posts as spam for them to be unpublished automatically (with each report adding 5 points, many users would have to report each post to reach 86). That is going to be more effort than having WorldFallz or another moderator hunt down the post and the user.

The improvement the Spam module needs, I think, is some sort of automatic decision-making system that checks user activity behind the scenes.
If a single post gets reported as spam, what if the Spam module checks the user's age and other posts by that user? Most spam posts have the same text in them, so a single report from a user plus the Spam module's intelligence (user age, other similar posts, interval between posts) should be able to do a magical job.

However, this could put a heavy load on drupal.org. It gets several hundred legit posts, so I'm not sure it would be a reliable way to hunt spam.

Sorry if my post sounds stupid. I had to develop a small module like this (not using the Spam module) to check user age and post interval for a small community site, and it was fairly good at hunting most zombies.
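
The heuristic suggested above (user age, similar posts, interval between posts) might look something like this once a post has been reported. It is a hypothetical Python sketch, not the small module mentioned in the comment. The function name and every threshold here are assumptions.

```python
from difflib import SequenceMatcher


def looks_like_zombie(account_age_s, post_times, post_bodies,
                      max_age_s=86400, max_interval_s=30,
                      min_similarity=0.9):
    """Guess whether a reported user is a drive-by spam account.

    All thresholds are illustrative assumptions: a young account,
    posts arriving seconds apart, and near-identical bodies.
    """
    if account_age_s > max_age_s or len(post_times) < 2:
        return False
    times = sorted(post_times)
    # Every gap between consecutive posts must be suspiciously short.
    fast = all(b - a < max_interval_s for a, b in zip(times, times[1:]))
    # Every post must closely resemble the first one.
    similar = all(
        SequenceMatcher(None, post_bodies[0], body).ratio() >= min_similarity
        for body in post_bodies[1:]
    )
    return fast and similar
```

Running this only after a first report keeps the cost down, since the comparison touches just one user's posts rather than every incoming post on the site.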

cweagans’s picture

If a single post gets reported as spam, what if the Spam module checks the user's age and other posts by that user? Most spam posts have the same text in them, so a single report from a user plus the Spam module's intelligence (user age, other similar posts, interval between posts) should be able to do a magical job.

Ideally, the bayesian filter will be smart enough (after being trained for some time) to catch these sorts of posts before they happen. Also, we have a lot of roles on Drupal.org, and pretty much all of them should be weighted more heavily than the default in report_spam. In fact, site maintainers should have a weight of 100, along with a couple other roles. Spam reporting is just a training mechanism for the filter, which has the side effect of unpublishing spammy content.

I might also write a cron task that auto-bans people every night if they've posted too many spammy posts. We can set this threshold pretty high to prevent false positives. It should be pretty easy to get a list of un-banned users that have posted n or more spammy posts.
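
The nightly selection step could be as small as the sketch below. It assumes the cron task is handed one author id per spam-flagged post plus the set of already-banned users. The function name and the threshold of 10 are hypothetical, and this is Python rather than a Drupal hook_cron implementation.

```python
from collections import Counter


def users_to_ban(spam_post_authors, banned, threshold=10):
    """Return un-banned users with at least `threshold` spam-flagged posts.

    spam_post_authors: iterable with one author id per spam-flagged post.
    banned: set of author ids that are already banned.
    """
    counts = Counter(spam_post_authors)
    return sorted(
        uid for uid, n in counts.items()
        if n >= threshold and uid not in banned
    )
```

Setting `threshold` high, as suggested above, trades slower bans for fewer false positives.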

cweagans’s picture

Issue tags: +Spam hitlist

Tagging.

cweagans’s picture

For anyone following this issue, the hitlist is here.

cweagans’s picture

I've thought about it a bit and chatted with killes about it:

I really think we should use Mollom, if we can: #1694494: Install Mollom on Drupal.org.

If we can't, then I guess I'll just continue on the Spam hitlist, but I'm not doing anything until we make a final decision on Mollom.

killes@www.drop.org’s picture

Just for the record (I've been asked): I still don't think mollom is an acceptable solution for drupal.org (mainly since it's closed source).

mototribe’s picture

Have you thought about using http://drupal.org/project/spambot?
I had lots of spam on one of my sites, and after installing spambot I hardly get any spammers anymore. It interfaces with the http://www.stopforumspam.com/ site, which is a crowdsourced list of spam IPs and email addresses.
If d.o participated, it would benefit other sites, because we could report new spammers to that site.

hass’s picture

cweagans’s picture

Done, but in the future, this is not the issue to report spammers :)

hass’s picture

It was hard to open a case in this queue with iPhone... :-)

How about the status of the spambot/Mollom?

geerlingguy’s picture

Posting for reference/subscribing: #1759272: Test honeypot module on http://drupal.org.

hass’s picture

Aside from these amazing honeypot results, is there any good reason why the registration form is not using a captcha? Captchas have always helped me a lot...

cweagans’s picture

I think we decided early on against using a captcha because it would inconvenience legitimate users and only serve as a minor deterrent for spammers. I'd be okay showing a captcha to a user only if we had a reasonable degree of doubt that their post was legitimate. That is, if we're unsure if it's spam or not, show a captcha, but don't show it in any other circumstance.
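
As a sketch of the "captcha only when unsure" idea: assume some filter yields a spam probability between 0 and 1, and that only the uncertain middle band triggers a challenge. The function name, the band edges of 0.2 and 0.8, and the three outcome labels are assumptions for illustration, in Python.

```python
def captcha_decision(spam_probability, low=0.2, high=0.8):
    """Decide what to do with a post given a spam probability in [0, 1].

    The band edges are illustrative assumptions, not tuned values.
    """
    if spam_probability < low:
        return "accept"    # confident it's legitimate: no captcha
    if spam_probability > high:
        return "reject"    # confident it's spam: no captcha either
    return "captcha"       # unsure: ask the user to prove they're human
```

Legitimate users with clearly fine posts never see a challenge under this scheme, which addresses the inconvenience concern.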

kingandy’s picture

hass’s picture

He (falcon03) himself suggested using a text captcha... I was only talking about the registration form, not every post. A math question might be enough to block automated account creation.

greggles’s picture

On the idea of a math captcha:

http://drupal.org/node/532186#comment-1857676

We used to use that style of CAPTCHA but got a lot more spam. In terms of annoyance to end users and consistency of blocking spam Mollom is a vast improvement (based on my experience as one of the major spam-blockers and admins on the site). So, I don't see that as a solution.

batigolix’s picture

I'm monitoring the Documentation issue queue and the majority of incoming issues are reporting spam. A spam filter would allow me to spend more time on real issues.

killes@www.drop.org’s picture

Issue tags: -Needs issue summary update, -Spam hitlist

@batigolix We now have a spam deterrent on drupal.org.

Could you for the time being unpublish spam rather than delete it? I can then check it and tune the filter.

batigolix’s picture

Will do.

Is this the chosen spam deterrent?: #1759272: Test honeypot module on http://drupal.org

mcfilms’s picture

Maybe it is best not to share that info publicly. I'd suggest a PM.

killes@www.drop.org’s picture

Status: Active » Fixed

marking as fixed.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

anarcat’s picture

Could it be clarified, for the sake of the rest of the Drupal community, which solutions are implemented on Drupal.org to combat spam? It seems to me honeypot is hardly a complete solution, and I would be surprised if it was the only tool deployed here...

Thanks.

killes@www.drop.org’s picture

We use honeypot and a specialized drupalorg_honeypot module which is part of https://drupal.org/project/drupalorg

We are currently waiting for #1776878: Allow toggling of roles (6.x backport) to make the process of giving users the "not a spammer" role a bit smoother. If you want to help, you are welcome!

cweagans’s picture

hass’s picture

I would also love to see a "Mark as spam" button/link. Opening cases in the webmaster queue is really painful.

klonos’s picture

greggles’s picture

We've added a "mark as spam" on g.d.o that works pretty well. I don't think now is the time to add that to d.o but I believe that Commons for D7 has a similar (better) version that could be considered for d.o.

klonos’s picture

I don't think now is the time to add that to d.o...

Meaning what? What would be the right time? Do we have any follow up issue? The only one I'm aware of is/was #226678: Add a "Report spam/abuse" link to forum/issue comments (next to the "edit" & "reply" links).

greggles’s picture

* Most effort on d.o is currently focused on the upgrade to d7 and I don't want to detract from that
* As far as I can tell, honeypot works pretty well (and honeypot+mollom for g.d.o)
* I think the highest priority spam-fighting-efficiency-improvement is #1776878: Allow toggling of roles (6.x backport)

DocRPP’s picture

+1 for BOTCHA. Using it with a lot of success. I am getting 100 percent success.

cweagans’s picture

Sorry for necroposting, but I thought I'd make everyone aware of https://drupal.org/project/spam_detect

It's very similar to the spam module in Drupal 6, but doesn't include a lot of UI features from the 6.x version. It's just a generic spam filtering API. The UI part can be built however we see fit within the drupalorg module.