Closed (cannot reproduce)
Project:
Spam
Version:
6.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
19 Dec 2008 at 03:50 UTC
Updated:
27 Dec 2010 at 00:54 UTC
Whenever the duplicate content filter is enabled, it conflicts with users who add a signature to the end of all their posts.
The more a user posts, the higher the spam score on each of their posts becomes. Eventually they cannot post at all, because their signature is counted as spam.
Comments
Comment #1
jeremy commented
The duplicate content filter checks whether identical content is posted in multiple places. It does this with a simple hash.
The URL filter checks whether the same URL appears too many times within a single piece of content -- it does not look at other postings and thus does not accumulate a score as you describe. That said, it is the most likely suspect for the problems you are seeing. Alternatively, if you've ever marked one of these users' postings as spam, the URL filter will have learned that the domain in their signature is probably spam and will start blocking it in future posts.
Leaving open to debug further.
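To illustrate the distinction drawn in this comment, here is a minimal Python sketch. It is not the Spam module's code: the function names (`content_hash`, `is_duplicate`, `max_url_repeats`), the choice of MD5, and the URL pattern are all assumptions made for the example. It shows why a whole-content hash would not match two different posts that merely share a signature, and why a per-post URL count would not grow as a user posts more.

```python
import hashlib
import re
from collections import Counter

# Hypothetical pattern; the real filter's URL matching may differ.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def content_hash(text):
    """Hash the complete post body, signature included (MD5 is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def is_duplicate(text, seen_hashes):
    """Return True only if this exact content was seen before; record it otherwise."""
    digest = content_hash(text)
    if digest in seen_hashes:
        return True
    seen_hashes.add(digest)
    return False


def max_url_repeats(text):
    """Count how often the most repeated URL occurs within this one post."""
    counts = Counter(URL_RE.findall(text))
    return max(counts.values(), default=0)


signature = "\n--\nVisit http://example.com"
seen = set()
first = "First post" + signature
second = "A different second post" + signature

print(is_duplicate(first, seen))    # new content
print(is_duplicate(second, seen))   # shares only the signature, so the hash differs
print(is_duplicate(first, seen))    # exact repost
print(max_url_repeats(second))      # counted per post, not across posts
```

Under these assumptions, only an exact repost collides, and the URL count for any single post stays the same however many other posts carry the same signature.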
Comment #2
jeremy commented
On my dev server I added a user with a signature that included a link. I then posted content as that user and confirmed that it did not cause any problems.
The only way this would cause problems is if the URL in their signature has been determined to be spam. In that case, of course, their postings will also be considered spam.
Comment #3
gnassar commented
Please reclose this if I'm mistaken -- but the key symptom in the original report, that the scores of posters with sigs progressively go up over time, doesn't seem to have been tested yet. Granted, it seems unlikely that the duplicate filter would be doing this, but it might merit some debug info from the OP to see exactly which filter is causing it.
Comment #4
jeremy commented
I was unable to duplicate this, and there is nothing in the code that would cause this.
Comment #5
gnassar commented
I thought that perhaps if a post with a sig was marked as spam, then multiple future posts with that same sig could get flagged (via the Bayesian filter). But there's no reason their scores would keep increasing, unless posts with the same sig kept getting flagged -- or other posts containing words from the sig were also regularly flagged. But then, the sig actually *would* be spam, wouldn't it?
In any case, I couldn't replicate the precise scenario of spam scores increasing without anything getting flagged. Would need further info to reopen.
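The reasoning in this comment can be sketched with a toy token-frequency scorer in Python. This is a deliberately simplified stand-in, not the module's Bayesian filter: the `ToyTokenFilter` class, the add-one smoothing, and the averaging of token probabilities are assumptions chosen for illustration, and it assumes that only explicit flagging trains the filter.

```python
from collections import Counter


class ToyTokenFilter:
    """Toy scorer: per-token spam probability learned only from flagged posts."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()

    def train(self, text, is_spam):
        """Record each distinct token of a post that a moderator classified."""
        target = self.spam_counts if is_spam else self.ham_counts
        target.update(set(text.lower().split()))

    def token_probability(self, token):
        """Smoothed estimate; an unseen token scores a neutral 0.5."""
        spam = self.spam_counts[token]
        ham = self.ham_counts[token]
        return (spam + 1) / (spam + ham + 2)

    def score(self, text):
        """Average token probability (a simplification of Bayesian combining)."""
        tokens = set(text.lower().split())
        if not tokens:
            return 0.5
        return sum(self.token_probability(t) for t in tokens) / len(tokens)


filt = ToyTokenFilter()
signature = "visit example-shop"

# One post carrying the signature is flagged as spam.
filt.train("cheap pills " + signature, is_spam=True)

# Scoring new posts does not train the filter, so the score stays put.
before = filt.score("hello there " + signature)
again = filt.score("hello there " + signature)
print(before == again)

# Only further flagging of posts with the signature raises the score.
filt.train("unrelated words " + signature, is_spam=True)
print(filt.score("hello there " + signature) > before)
```

In this model, the score for a post with the signature rises above neutral after one flag but stays constant from post to post; it climbs further only when more posts containing the signature's tokens are flagged.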