Reset Bayesian filter? [#1040660]

OK, I'll admit that my inexperience with this module has had a role in getting me to the current situation. I set up a forum, and after a while the moderators complained about spam. I installed the Spam filter (for the first time) and things seemed better. The site is using the Duplicate, Bayesian, and URL filters. Most of the time I just went with defaults, trusting that the developers would be using sensible defaults.

I believe the default behaviour for the Spam module is also to prevent the content from being posted. I did inform the site admins and moderators that they needed to check the spam list, but from what I can see this was rarely if ever done.

The result, I believe, is a setup where people may have gotten overly anxious about a slow site and hit the submit button multiple times, triggering the Duplicate filter. These duplicates were left marked as spam, so the Bayesian filter learned their content as spam, and this cycle happened often enough that the site now thinks that all kinds of perfectly valid posts are spam.

I believe I need a way to reset the Bayesian filter. Can I do this without a complete uninstall and reinstall of the module?

Comments

Comment #1

jeremy commented 26 January 2011 at 16:29

The proper way to fix this is to teach the filters what is NOT spam. Currently it sounds like all it knows is a lot of terms that ARE spam. Every time it incorrectly marks content as spam, you need to click 'mark as not spam', which will then cause it to start learning terms that are not spam terms. When you're first getting started with the spam filter, it's also intended that you use existing published non-spam content to teach the filters more about what types of words are normal on your website. This is done by going to admin/content/comment, selecting a bunch of valid comments, and selecting "Teach filters comments are not spam".

Note that this latter feature is apparently broken in the current release candidate (so you'll have to rely on marking individual comments as not-spam one at a time for now):
#932758: "Teach filters selected comments are not spam" does nothing

While you could simply dump everything in the Bayesian filter table, this would be a step backward and would eventually get you back to where you are right now. The key is exposing the Bayesian filter to both spam and not spam.
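
To illustrate why the filter needs both kinds of examples, here is a minimal sketch of a token-based Bayesian scorer in Python. It is not the Spam module's actual code; the class TinyBayes, its methods, the smoothing constants, and the sample posts are all hypothetical and chosen only for illustration. It shows that a filter taught only spam rates ordinary posts as spam, that teaching it valid posts pulls the score back down, and that an emptied filter simply knows nothing.

```python
import re
from collections import Counter


class TinyBayes:
    """Toy token-based Bayesian scorer (illustrative, not the Spam module)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam posts containing it
        self.ham = Counter()   # token -> number of non-spam posts containing it

    @staticmethod
    def tokens(text):
        # Lowercased unique words; a real filter tokenizes more carefully.
        return set(re.findall(r"[a-z']+", text.lower()))

    def teach(self, text, is_spam):
        target = self.spam if is_spam else self.ham
        target.update(self.tokens(text))

    def token_probability(self, token):
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return 0.5  # never seen: neutral
        # Assumed smoothing so a single sighting is not treated as certainty.
        return (s + 0.5) / (s + h + 1.0)

    def score(self, text):
        """Return a spam probability between 0 and 1 for the text."""
        spam_product, ham_product = 1.0, 1.0
        for token in self.tokens(text):
            p = self.token_probability(token)
            spam_product *= p
            ham_product *= 1.0 - p
        return spam_product / (spam_product + ham_product)


bayes = TinyBayes()

# A valid post is submitted three times, caught as a duplicate, and the
# copies are left marked as spam, so the filter learns its words as spam.
for _ in range(3):
    bayes.teach("Thanks for the help with my forum question", is_spam=True)

new_post = "Another question about the forum"
before = bayes.score(new_post)  # high: the filter only knows spam terms

# Teach the filter some valid content ("mark as not spam").
for valid in [
    "A question about the forum theme",
    "The forum search is slow",
    "Question: where is the forum archive",
]:
    bayes.teach(valid, is_spam=False)

after = bayes.score(new_post)  # lower: common words are no longer spam-only

# Emptying the table is equivalent to starting over with no knowledge.
fresh = TinyBayes().score(new_post)

print(f"before={before:.3f} after={after:.3f} fresh={fresh:.3f}")
```

In this sketch the emptied filter scores every post as neutral, and if the same duplicates are again left marked as spam it drifts back to the "before" state, which matches the point above about a reset being only a temporary fix.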