Spam
The spam module provides numerous tools to auto-detect and deal with spam content that is posted to your site. Spam can be automatically unpublished and/or deleted.
The spam module provides four main mechanisms for automatically detecting spam: a trainable Bayesian filter, manually entered custom filters, counting the number of URLs, and detection of content posted from open email relays.
Shameless plug:
The spam module needs your financial support. I have many plans for greatly improving the module, but currently lack the resources to invest into the project. Contact me if you'd like to talk about helping out financially.
Specific plans include:
- Streamlining the core functionality and moving all filters into plug-in modules, so that unneeded filters can be disabled and filter priority can be reordered.
- Adding support for easily sharing spam tables between multiple websites, even if the websites use different database servers.
- Filtering all form content through Drupal's standard filter mechanism, so the module is no longer limited to detecting spam comments and nodes.
- Improving internationalization support.
- Cleaning up the workflow for handling detected spam.
- Improving integration with the existing Drupal workflow.
- Improving the underlying API to optimize performance, simplify development of third-party spam filters, and simplify integration of other modules with the spam module.
Features:
- Written in PHP specifically for Drupal.
- Highly configurable.
- Automatically detects and unpublishes spam comments and other spam content.
- Automatically learns to detect spam in any language using Bayesian logic.
- Automatically learns and blocks spammer URLs.
- Automatically blacklists IPs of learned spammers, preventing them from posting additional spam and wasting database resources.
- Detects repeated postings of identical content.
- Detects content containing too many links, or the same link over and over.
- Supports the creation of custom filters using powerful regular expressions.
- Can notify users that their content was determined to be spam, preventing confusion over why it doesn't show up.
- Can notify the site administrator in an email when spam is detected.
- Provides 'report as spam' links allowing users to easily help detect spam.
- Provides simple administrative interfaces for reviewing spam content.
- Provides comprehensive logging that shows how and why content was or was not determined to be spam.
Overview:
The Bayesian filter performs statistical analysis on content, learning from the spam and non-spam it sees to determine the likelihood that new content is or is not spam. The filter starts out knowing nothing, and has to be trained every time it makes a mistake. This is done by marking spam content on your site as spam when you see it. Each word of the spam content is remembered and assigned a probability. The more often a word shows up in spam content, the higher the probability that future content with the same word is also spam. As most comment spam contains links back to the spammers' websites (e.g., to sell Prozac), the Bayesian filter provides a special option to quickly learn and block content that contains links to known spammer websites.
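The word-probability idea described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the module's actual PHP code: the class name `NaiveBayesSketch`, the tokenizer, and the add-one smoothing are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class NaiveBayesSketch:
    """Hypothetical word-based Bayesian filter (not the module's real code)."""

    def __init__(self):
        self.spam_words = Counter()  # number of spam posts containing each word
        self.ham_words = Counter()   # number of non-spam posts containing each word
        self.spam_docs = 0
        self.ham_docs = 0

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase alphanumeric tokens, each counted once per post.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, is_spam):
        words = self.tokenize(text)
        if is_spam:
            self.spam_words.update(words)
            self.spam_docs += 1
        else:
            self.ham_words.update(words)
            self.ham_docs += 1

    def word_probability(self, word):
        # Add-one smoothing: an unseen word in an untrained filter scores 0.5.
        spam_rate = (self.spam_words[word] + 1) / (self.spam_docs + 2)
        ham_rate = (self.ham_words[word] + 1) / (self.ham_docs + 2)
        return spam_rate / (spam_rate + ham_rate)

    def spam_probability(self, text):
        # Combine per-word probabilities in log space, treating words as independent.
        log_spam = log_ham = 0.0
        for word in self.tokenize(text):
            p = self.word_probability(word)
            log_spam += math.log(p)
            log_ham += math.log(1.0 - p)
        return 1.0 / (1.0 + math.exp(log_ham - log_spam))
```

An untrained filter returns 0.5 for any text, which matches "starts out knowing nothing". After calling `train()` on a few marked posts, words seen mostly in spam push the combined score above 0.5.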
The custom filtering functionality can blacklist, whitelist, or greylist content by matching words, phrases, and regular expressions. For example, a custom filter can be defined to always mark content as spam if it contains the word 'Viagra'. Alternatively, a custom filter can be defined to increase the probability that content is spam if it matches the case-insensitive regular expression /free/i.
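As a rough sketch of how such rules might be evaluated, the Python below applies a list of blacklist, whitelist, and greylist patterns to a probability between 0.0 and 1.0. The rule format, the weight of 0.2, the whitelist pattern, and the "first blacklist or whitelist match wins" ordering are assumptions for illustration; the module's own rule storage and precedence may differ.

```python
import re

# Hypothetical rule table: (compiled pattern, action, greylist weight).
RULES = [
    (re.compile(r"\bviagra\b", re.IGNORECASE), "blacklist", 0.0),
    (re.compile(r"free", re.IGNORECASE), "greylist", 0.2),
    (re.compile(r"\bchangelog\b", re.IGNORECASE), "whitelist", 0.0),
]


def apply_custom_filters(text, probability, rules=RULES):
    """Return an adjusted spam probability for text.

    Assumed semantics: a blacklist match forces 1.0, a whitelist match
    forces 0.0, and each greylist match adds its weight (capped at 1.0).
    """
    for pattern, action, weight in rules:
        if not pattern.search(text):
            continue
        if action == "blacklist":
            return 1.0
        if action == "whitelist":
            return 0.0
        probability = min(1.0, probability + weight)
    return probability
```

For example, `apply_custom_filters("Cheap Viagra here", 0.1)` is forced to 1.0, while a post that only matches /free/i has its starting probability raised by the greylist weight.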
The spam module can also limit the total number of URLs allowed in comments and other content, as well as the number of times the same URL can be repeated within a single piece of content. These limits can be set separately for comments and for other types of content. For example, if the module is set to allow the same exact URL to appear in a comment at most twice and "http://kerneltrap.org/" shows up in the same comment three or more times, the comment will be considered spam.
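The two URL limits can be illustrated with a short Python sketch. The regular expression, the function name `exceeds_url_limits`, and the default limits are assumptions chosen for the example, not the module's actual settings.

```python
import re
from collections import Counter

# Assumption: a simple pattern for http/https URLs is good enough for a sketch.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def exceeds_url_limits(text, max_total=10, max_repeats=2):
    """Return True if text has too many URLs or repeats one URL too often."""
    urls = URL_RE.findall(text)
    if len(urls) > max_total:
        return True
    # Count how many times each exact URL string appears.
    counts = Counter(urls)
    return any(count > max_repeats for count in counts.values())
```

With `max_repeats=2`, a comment containing "http://kerneltrap.org/" twice passes, while a third occurrence of the same URL trips the limit. Separate limits for comments and other content could be modeled by calling the function with different arguments per content type.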
The fourth tool for detecting spam looks up the poster's IP address in the Distributed Server Boycott List (http://dsbl.org/). If the address is listed, it is known to belong to an untrusted email server such as an open relay, and the content is marked as spam. The theory is that most comment spammers are also email spammers.
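Blacklists of this kind are conventionally queried over DNS: the IPv4 octets are reversed, the list's zone is appended, and a successful lookup means the address is listed. The Python sketch below follows that general convention. The zone name `list.dsbl.org` is an assumption, the service may not answer queries, and the module's real lookup code may work differently.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone="list.dsbl.org"):
    """Build the DNS name to query, e.g. 1.2.3.4 -> 4.3.2.1.<zone>."""
    # IPv4Address validates the input and raises ValueError on bad addresses.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone="list.dsbl.org", resolve=socket.gethostbyname):
    """Return True if the blacklist zone resolves a record for ip.

    resolve is injectable so the check can be exercised without network access.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        # socket.gaierror (a subclass of OSError) signals "no such record".
        return False
    return True
```

Passing a stub as `resolve` makes it possible to test the listed and unlisted paths offline before pointing the function at a live zone.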
As a Drupal administrator, you can decide to enable any or all of the above tools as best suited to your needs.
Releases
| Official releases | Date | Size | Links | Status |
|---|---|---|---|---|
| 5.x-3.0-alpha4 | 2008-Apr-30 | 35.13 KB | Download · Release notes | Supported for 5.x |
| 5.x-1.1-2 | 2007-Oct-17 | 39.75 KB | Download · Release notes | Recommended for 5.x |
| 4.7.x-1.0 | 2007-Jan-03 | 38.58 KB | Download · Release notes | Recommended for 4.7.x |

| Development snapshots | Date | Size | Links | Status |
|---|---|---|---|---|
| 5.x-3.x-dev | 2008-May-01 | 35.4 KB | Download · Release notes | Development snapshot |
| 5.x-1.x-dev | 2007-Nov-29 | 52.74 KB | Download · Release notes | Development snapshot |
