Testing our antispam measure [#1515386]

http://spam-drupal.redesign.devdrupal.org/
spamtest / spamtest <- site maintainer account
bagel / bagel <- authenticated user account

Administration page: http://spam-drupal.redesign.devdrupal.org/admin/settings/spam
Content page: http://spam-drupal.redesign.devdrupal.org/admin/content/spam

Some problems:
1. Permissions are lacking. Authenticated users don't get a 'Mark as spam' link unless that role has the "Administer spam" permission - but that gives them access to change the filters (which isn't cool.) If there's a way around this, I'd like to know!

1.5 Therefore, we may need that "Report as spam" link, anyway.

2. The Bayesian filter may add too many tables to the database - but this might be a 'wait and see' how big the table gets.

3. Testing spam on a test site is somewhat futile, since it doesn't get the 50-500 spam posts d.o gets, not to mention legitimate posts.

My philosophy on all this is that we'd rather have real spam get through the filters than real content get blocked.

So let's test the Bayesian filter by marking comments/nodes as spam and come up with some common-sense defaults.

Deployment Notes:
Make sure "allow spam comments to be posted, automatically unpublish, and notify user" is selected at /admin/settings/spam; otherwise the comment isn't saved.

Comment	File	Size	Author
#1	spam_screencap.png	41.63 KB	WorldFallz

Comments

Comment #1

WorldFallz commented 4 April 2012 at 20:17

I'm not sure how much valid testing we can do on a test site either, but I agree with your philosophy: erring on the side of the user is better. However, our main spam problem is not really the occasional human-created spam, where filtering might also catch legitimate comments. The main issue, and the reason this whole issue surfaced again for real, is mass spam from spam bots.

In any case, the first thing I encountered was the following when marking an item as spam using the spamtest account:

spam screencap

imo, the user marking spam should not receive this message.

Also, the fact that only spam admins can mark spam seems somewhat counterintuitive. Rather than adding our own custom 'mark as spam' link, the module should be adjusted to separate this permission from 'administer spam'. I haven't had a chance to check the spam module's issue queue yet, but maybe we can submit a patch if one hasn't been submitted already.

Comment #2

WorldFallz commented 4 April 2012 at 20:19

maybe we can test the patch in #1014114: Test the spam administration permission... could we have two levels??

Comment #3

WorldFallz commented 4 April 2012 at 20:28

another thought-- we'd want VBO actions for 'mark as spam' and 'unmark spam' so we could add them to our admin-nodes and admin-comments views (nothing shows up at admin/content/content).

Comment #4

silverwing commented 5 April 2012 at 19:34

Thanks for the feedback, WorldFallz!

I should mention that we're testing the 6.x-1.4 version of spam.module.

@#1 - I didn't get this problem and can't reproduce it. When I marked spam as either the spamtest account or my silverwing account I'm always redirected to the node/comment with it being unpublished. When I added spam as bagel I get that screen.

@#2 - I'm of two minds here - it would be great to have, but do we wait? I'm thinking we'd rather deploy 'now', and test the patch for a 1.5 release (or make sure it's in the D7 version).

@#3 - VBO - after deployment (though I just tested it and I'm getting errors "An HTTP error 403 occurred. /batch?id=6651&op=do" and "An error occurred while processing _views_bulk_operations_execute_single with arguments: _views_bulk_operations_execute_single" - but it was my first attempt :) )

Comment #5

WorldFallz commented 5 April 2012 at 20:29

@4-1: weird. I keep getting it. I get http://spam-drupal.redesign.devdrupal.org/spam/denied as the URL with the above image. Unmarking spam works fine. Maybe it matters which post? Can you try http://spam-drupal.redesign.devdrupal.org/node/1412168?

@4-2: imo this is pretty important. It's not really webmasters who need a 'mark as spam' link, but users. You're aware of the issue with using flag for this, and I'm afraid that after we deploy, it would go into the black hole of never getting done. I think we stand a better chance of getting users the ability to mark spam if we require this patch for deployment.

@4-3: can definitely be post-deployment.

Comment #6

silverwing commented 9 April 2012 at 18:23

I was able to recreate the bug! I'm not sure why it's happening, though. I'll create an issue and see if Jeremy has any idea.
#1524656: When blocking node, user get's /spam/denied page

Comment #7

silverwing commented 9 April 2012 at 18:36

regarding the patch in #2 - on my local site I got it to show the link to everyone, but when clicked by an authenticated user, it gives an "Access denied" page.

Plus, as far as I understand, spam.module will unpublish posts that are marked as spam, and I don't want to give that much power to everybody. (Also, I can't see any ideal way to track who marks/unpublishes content, and that would be a must.)

Comment #8

WorldFallz commented 9 April 2012 at 19:18

hmmm.... good point. I guess we can always just create a 'report spam' link that creates a webmaster issue completely separate from the spam module (since using flag for this any time soon seems unlikely), but I would have thought this type of functionality would be a common feature and part of the spam module.

without this, I'm a little on the fence about deploying the spam module at all since, although the filters and other features are nice, it doesn't do the two most important and simple things we need most as part of our initial anti-spam changes: 1) automate spam reporting for users (as opposed to site maintainers) and 2) flood control.

Obviously spam module can do those things if the module maintainer accepts the features, but do we really want to add an entire module like the spam module for 2 simple functions we're going to have to write anyway?

thoughts?

Comment #9

AlexisWilke commented 23 April 2012 at 05:27

Hi guys,

A quick set of answers:

1. Permissions are lacking. Authenticated users don't get a 'Mark as spam' link unless that role has the "Administer spam" permission - but that gives them access to change the filters (which isn't cool.) If there's a way around this, I'd like to know!

I have a patch proposed for that purpose. #1014114: Test the spam administration permission... could we have two levels?

1.5 Therefore, we may need that "Report as spam" link, anyway.

See (1).

2. The Bayesian filter may add too many tables to the database - but this might be a 'wait and see' how big the table gets.

I suppose you meant "too many words to the database table". The Bayesian filter does not add new tables as it grows. Also, older words (unused for a while) get removed from the table.

3. Testing spam on a test site is somewhat futile, since it doesn't get the 50-500 spam posts d.o gets, not to mention legitimate posts.

Yeah. I agree that it is difficult to test the spam filtering mechanism...
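
On point 2, a rough illustration of why the storage concern is about rows rather than tables: a minimal Python sketch, assuming a single token table keyed by word, with rows that expire when unused. The names `TokenRow`, `TokenTable`, `train`, `prune`, and `max_age_seconds` are hypothetical and are not taken from spam.module, whose actual schema and expiry rules may differ.

```python
from dataclasses import dataclass


@dataclass
class TokenRow:
    """One row per word: how often it was seen in spam and in legitimate posts."""
    spam_count: int = 0
    ham_count: int = 0
    last_seen: float = 0.0


class TokenTable:
    """A single table of word rows; it grows by rows, never by new tables."""

    def __init__(self, max_age_seconds):
        self.max_age_seconds = max_age_seconds  # assumed expiry setting
        self.rows = {}

    def train(self, text, is_spam, now):
        # Count each distinct word once per post and refresh its timestamp.
        for word in set(text.lower().split()):
            row = self.rows.setdefault(word, TokenRow())
            if is_spam:
                row.spam_count += 1
            else:
                row.ham_count += 1
            row.last_seen = now

    def prune(self, now):
        # Drop words that have not been seen within the expiry window.
        stale = [w for w, r in self.rows.items()
                 if now - r.last_seen > self.max_age_seconds]
        for word in stale:
            del self.rows[word]
        return len(stale)
```

With a periodic `prune` call (from cron, say), the table size is bounded by the vocabulary seen within the expiry window rather than by everything ever posted.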

----

Personal insight: at this point no one has started work on a version for D7. Is that a problem for you guys?

Thank you.
Alexis Wilke

Comment #10

silverwing commented 23 April 2012 at 21:41

@Alexis

As it stands, I'm getting a bit fed up with this module - it doesn't do a lot of what we need (authenticated user reporting) and I'm getting "This post is spam" when I don't think it should be giving me that error as Administrator, and editing book page nodes with a lot of links (as a lot of our pages have) also gets marked as spam. All these together make it difficult for me to continue debugging for d.o.

I'm considering spam.module a "no-go".

Comment #11

WorldFallz commented 23 April 2012 at 22:35

yeah-- i commented above that i was on the fence. but I think i've come off the fence since and am now firmly in the 'we should find another way' camp.

Comment #12

cweagans commented 28 April 2012 at 20:07

Status:	Active	» Closed (won't fix)

I suppose this is the appropriate status, then.

Comment #13

killes@www.drop.org commented 21 May 2012 at 09:46

Assigned:	Unassigned	» killes@www.drop.org
Status:	Closed (won't fix)	» Needs work

spam.module is still my favourite choice for antispam measures. When I tested it for a client, I didn't have the problems that silverwing encountered.

I guess I need to do some testing myself with the mentioned patches applied.

Comment #14

dman commented 24 May 2012 at 07:11

I'm in the backend of http://spam-drupal.redesign.devdrupal.org/ now, per #1596432: I want access to the drupal.org development site for testing spam prevention issues (dman)

Not sure what's next. Do we have to write our own spambot to hammer the thing to test?

Comment #15

cweagans commented 18 July 2012 at 06:27

I've been working on this. See #1293186-138: Spam - meta: better spam-combating suggestions.

I guess we can always just create a 'report spam' link that creates a webmaster issue completely separate from the spam module (since using flag for this any time soon seems unlikely), but I would have thought this type of functionality would be a common feature and part of the spam module.

I've applied the split permission patch, and developed report_spam.module on top of it for this purpose (it's a somewhat saner way for our users to flag content as spam). Based on the user role, report_spam will increment the spamminess score of a piece of content. When the score passes the threshold, the content gets unpublished. When it gets unpublished, it also trains the Bayesian filter.
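
To make that flow concrete, here is a minimal Python sketch of the idea, not the actual report_spam.module code. The role weights, the threshold value, the `Content` class, and the `train_filter` callback are all hypothetical. Recording who reported (the `reporters` set) is an added assumption, included because tracking who marks content was raised as a must-have in #7.

```python
from dataclasses import dataclass, field

# Hypothetical per-role weights; the real values are not given in this thread.
ROLE_WEIGHTS = {"authenticated": 10, "trusted": 25, "site maintainer": 100}
UNPUBLISH_THRESHOLD = 100  # assumed score at which content is unpublished


@dataclass
class Content:
    body: str
    published: bool = True
    spam_score: int = 0
    reporters: set = field(default_factory=set)  # who has reported this item


def report_spam(content, user, role, train_filter):
    """Record one report; unpublish and train the filter once the threshold is hit.

    Returns whether the content is still published afterwards.
    """
    if user in content.reporters or not content.published:
        return content.published  # ignore repeat reports and unpublished items
    content.reporters.add(user)
    content.spam_score += ROLE_WEIGHTS.get(role, 0)
    if content.spam_score >= UNPUBLISH_THRESHOLD:
        content.published = False
        train_filter(content.body, is_spam=True)  # feed the Bayesian filter
    return content.published
```

With these assumed weights, a single report from a site maintainer unpublishes immediately, while ordinary users need several independent reports to reach the threshold.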

I'm getting "This post is spam" when I don't think it should be giving me that error as Administrator

This sounds like a bug, but there is a "bypass filters" permission that might need to be applied. If you're logged in as Dries, though, then I'm not sure what's going on.

editing book page nodes with a lot of links (as a lot of our pages have) also gets marked as spam.

I built a domain whitelist for the URL filter for this purpose. That way, we can specify that any link at drupal.org, devdrupal.org, or drupalcode.org is safe and we don't need to count it against the user.
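
A minimal Python sketch of how such a whitelist check might work, assuming the filter counts links and skips whitelisted hosts and their subdomains. The function names and the URL regex are illustrative only and are not the module's code.

```python
import re
from urllib.parse import urlparse

# Domains named in this comment; subdomains are assumed to be trusted too.
SAFE_DOMAINS = ("drupal.org", "devdrupal.org", "drupalcode.org")
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def is_whitelisted(url):
    """True if the URL's host is a safe domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in SAFE_DOMAINS)


def count_untrusted_links(text):
    """Count only the links that should weigh against the poster."""
    urls = (u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text))
    return sum(1 for u in urls if not is_whitelisted(u))
```

Matching on the parsed hostname with a leading dot, rather than on a substring of the URL, keeps look-alikes such as notdrupal.org or drupal.org.example.com from passing as safe.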

without this, I'm a little on the fence about deploying the spam module at all since, although the filters and other features are nice, it doesn't do the two most important and simple things we need most as part of our initial anti-spam changes: 1) automate spam reporting for users (as opposed to site maintainers) and 2) flood control.

Number 1 is handled by report_spam.module, and Number 2 is partially handled by the duplicate content filter. I'll do it the real way and contribute a flood control filter to spam.module, though. That should be in the next couple of days - it should be pretty easy to write something that just provides a control for a threshold of x posts per n minutes per user.
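
The "x posts per n minutes per user" idea could look roughly like this sliding-window sketch in Python. The class name, method names, and in-memory storage are assumptions for illustration; a real filter for spam.module would be written in PHP and would keep its state in the database.

```python
import time
from collections import defaultdict, deque


class FloodControl:
    """Allow at most max_posts per user within a sliding window of window_seconds."""

    def __init__(self, max_posts, window_seconds):
        self.max_posts = max_posts
        self.window_seconds = window_seconds
        self._posts = defaultdict(deque)  # user -> timestamps of recent posts

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        recent = self._posts[user]
        # Forget posts that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_posts:
            return False  # over the threshold: reject, or hold for moderation
        recent.append(now)
        return True
```

For example, `FloodControl(max_posts=5, window_seconds=600)` would let each user post five times in any ten-minute span and refuse the sixth.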