We installed 6.x-1.x-dev on a Drupal 6.10 site last night. The documentation was a wee bit terse :), but I enabled all the modules and all the filters and left everything else at the defaults, figuring that would give us a good start. The spam we have been getting is mostly lists of 10-40 URLs to sites of questionable content, and I figured the filter that looks for a bunch of URLs would catch those without our having to do anything fancy. This morning, though, there were 4 new ones. Am I missing something?

Also, we picked the option for putting spam into a special review queue. Where would we look to see what is in that queue? Shouldn't that option leave the content unpublished until it is approved or deleted?

FWIW, 100% of the spam we are getting is anonymous comments, so that is the only content type I have selected. There is a very limited pool of people authorized to create other content, so that isn't a problem.
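To illustrate the expectation in the question above, a filter that scores comments by how many links they contain could look roughly like this. It is a hypothetical Python sketch, not the Spam module's code: `count_urls`, `url_filter_score`, the regex, the `max_urls` limit of 5 and the 1-99 score scale are all assumptions.

```python
import re

# Hypothetical URL-count filter; NOT the Spam module's actual implementation.
# Assumption: scores run from 1 (clearly not spam) to 99 (clearly spam).
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def count_urls(text: str) -> int:
    """Return how many http(s) URLs appear in the text."""
    return len(URL_PATTERN.findall(text))


def url_filter_score(text: str, max_urls: int = 5) -> int:
    """Score a comment by URL count; max_urls or more links scores 99."""
    urls = count_urls(text)
    if urls >= max_urls:
        return 99
    # Scale linearly from 1 (no links) towards 99 for counts below the limit.
    return 1 + (98 * urls) // max_urls


# A comment that is just a list of 20 links, like the spam described above.
comment = " ".join(f"http://example.com/{i}" for i in range(20))
print(count_urls(comment), url_filter_score(comment))  # 20 99
```

A comment with 20 links scores 99 on this filter alone; whether it is actually blocked still depends on how that score is combined with the other enabled filters and on the configured threshold.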

Comments

tamhas

Having lowered the threshold to 70, we seem to be doing much better, but I am still wondering about the "special review queue", since the only record I see of automatically handled spam is in the log. To me, "special review queue" sounds like something I should be able to flip through to verify that everything in it is actually spam.

If one manually marks a post as spam, which unpublishes it, does it ever get deleted automatically, or does it hang around forever?

If a comment is flagged as spam automatically and one has selected the "special review queue" option, does the post even go away automatically?

Antinoo

Same here: I had to lower the threshold to stop comments containing a large number of spam URLs.
There really seems to be a lack of documentation, combined with a not-so-friendly UI, which makes this powerful module not so... cute.

tamhas

Yeah, although I have to say that with the threshold lowered a notch, and with the Spam tab on the Admin Comments page now making cleanup easier, I would be pretty happy if I could get the Not Spam link working. Might try that patch tonight.

naught101

Try setting the spam log level to "debug" and watch a couple of spam posts go through (and then unset it!!). It'll show you how the gain is working.

Your problem is probably that the Bayesian filter is marking spam as "not spam" because it hasn't learned what spam looks like yet. I think this is why the gain for the Bayesian filter is set low by default.

It'll improve considerably after 50-100 spam posts, and then you can probably raise the threshold again.
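The behaviour naught101 describes can be illustrated with a toy model. It assumes that each filter returns a score from 1 to 99, that the overall score is the average of those scores weighted by each filter's gain, and that content scoring at or above the threshold is treated as spam. The filter names, scores, gains and the threshold of 80 are hypothetical, and the Spam module's real formula may differ.

```python
from dataclasses import dataclass


@dataclass
class FilterResult:
    """One filter's verdict (hypothetical structure, not the module's API)."""
    name: str
    score: int  # assumed scale: 1 = certainly not spam, 99 = certainly spam
    gain: int   # assumed weight of this filter in the combined score


def combined_score(results: list[FilterResult]) -> float:
    """Gain-weighted average of the individual filter scores."""
    total_gain = sum(r.gain for r in results)
    if total_gain == 0:
        return 0.0
    return sum(r.score * r.gain for r in results) / total_gain


def is_spam(results: list[FilterResult], threshold: int) -> bool:
    """Treat content as spam when the combined score reaches the threshold."""
    return combined_score(results) >= threshold


# Link-heavy comment: the URL filter is confident, but an untrained Bayesian
# filter has not yet learned what spam looks like and scores it low.
untrained = [
    FilterResult("url_count", score=99, gain=3),
    FilterResult("bayesian", score=10, gain=1),
    FilterResult("duplicate", score=50, gain=1),
]
# The same comment after the Bayesian filter has seen enough spam.
trained = [
    FilterResult("url_count", score=99, gain=3),
    FilterResult("bayesian", score=95, gain=1),
    FilterResult("duplicate", score=50, gain=1),
]

for label, results in (("untrained", untrained), ("trained", trained)):
    for threshold in (80, 70):
        # Prints e.g. "untrained 80 71.4 False"
        print(label, threshold, round(combined_score(results), 1),
              is_spam(results, threshold))
```

In this model the untrained Bayesian filter drags the URL filter's 99 down to a combined 71.4, which slips under a threshold of 80 but is caught at 70. Once the Bayesian filter scores the same comment highly, the combined score rises to 88.4 and the higher threshold catches it again.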

tamhas

With the current threshold, I think fewer than 1% of the comments marked as spam have turned out not to be spam. Now that I have the "mark as not spam" link working, those are fairly easy to clean up, so I'm not sure we need to adjust anything until we see a pattern that suggests a change.

gnassar

Status: Active » Closed (fixed)

Remedied by other patches.