Whenever the duplicate content filter is enabled, there is a conflict with users who add a signature to the end of all their posts.

The more these users post, the higher the spam score on each of their posts. Eventually they cannot post at all, because their signature is counted as spam.

Comments

jeremy’s picture

The duplicate content filter checks whether identical content has been posted in multiple places. It does this with a simple hash.
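To illustrate why a shared signature alone should not trip a hash-based check, here is a minimal sketch in Python. It is not the module's code: the `DuplicateFilter` class, the whitespace normalization, the choice of MD5, and the assumption that the hash covers the whole post (signature included) are all hypothetical.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash the full post text (assumption: the signature is part of it)."""
    # Collapse whitespace so trivially reformatted copies hash the same.
    # This normalization step is an assumption made for illustration.
    normalized = " ".join(text.split())
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


class DuplicateFilter:
    """Hypothetical in-memory duplicate detector keyed on a content hash."""

    def __init__(self) -> None:
        self.seen: dict[str, int] = {}  # hash -> number of times seen

    def check(self, text: str) -> int:
        """Record a post and return how many times this exact content was seen."""
        digest = content_hash(text)
        self.seen[digest] = self.seen.get(digest, 0) + 1
        return self.seen[digest]


if __name__ == "__main__":
    signature = "\n--\nVisit http://example.com/"
    dup_filter = DuplicateFilter()
    print(dup_filter.check("First post." + signature))   # different body
    print(dup_filter.check("Second post." + signature))  # different body
    print(dup_filter.check("First post." + signature))   # exact repeat
```

Under these assumptions, two posts with different bodies and the same signature produce different hashes, so only an exact repeat of the whole post raises the count.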

The URL filter checks whether the same URL appears too many times within a single piece of content -- it does not look at other postings and thus does not accumulate a score across posts as you describe. That said, it is the most likely suspect for the problems you are seeing. Or, if you've ever marked one of these users' postings as spam, the URL filter will have learned that the domain in their signature is probably spam and will start blocking it in the future.
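As a rough sketch of the per-post check described above (Python, not the module's implementation), the following counts repeated URLs inside one piece of content only. The regular expression, the `repeated_urls` name, and the `limit` threshold are assumptions for illustration.

```python
import re
from collections import Counter

# Simplified URL pattern, assumed for this example only.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def repeated_urls(text: str, limit: int = 2) -> dict[str, int]:
    """Return URLs that occur more than `limit` times in this one post.

    No state is kept between calls, so earlier posts cannot raise
    the result for a later one.
    """
    counts = Counter(URL_RE.findall(text))
    return {url: n for url, n in counts.items() if n > limit}


if __name__ == "__main__":
    normal_post = "Thanks for the help.\n--\nMy site: http://example.com/"
    link_heavy = "http://example.com/ http://example.com/ http://example.com/"
    print(repeated_urls(normal_post))
    print(repeated_urls(link_heavy))
```

Because the function is stateless, a signature containing a single link yields the same result on the hundredth post as on the first, which is why a cumulative score would have to come from somewhere else.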

Leaving open to debug further.

jeremy’s picture

Version: 5.x-3.0-beta1 » 6.x-1.x-dev
Status: Active » Fixed

On my dev server I added a user with a signature that included a link. I then posted content with that user and confirmed that it did not cause any problems.

The only way this will cause problems is if the URL in their signature has been determined to be spam. In that case, of course, their postings will also be considered spam.

gnassar’s picture

Status: Fixed » Postponed (maintainer needs more info)

Please reclose this if I'm mistaken -- but the key symptom reported by the OP, that the scores of posters with sigs progressively go up over time, doesn't seem to have been tested yet. Granted, it seems unlikely that the duplicate filter would be doing this, but it might merit some debug info from the OP to see exactly which filter is causing it.

jeremy’s picture

Priority: Critical » Normal

I was unable to duplicate this, and there is nothing in the code that would cause this.

gnassar’s picture

Status: Postponed (maintainer needs more info) » Closed (cannot reproduce)

I thought that perhaps if a post with a sig was marked as spam, then multiple future posts with that same sig could get flagged (via the Bayesian filter). But there's no reason their scores would keep increasing, unless the same posts with the sig kept getting flagged -- or other posts with words in the sig were also regularly flagged. But then, the sig actually *would* be spam, wouldn't it?

In any case, I couldn't replicate the precise scenario of spam scores increasing without anything getting flagged. We would need further info to reopen.