Over the past few months, the spam module has evolved from a simple idea to a fully functional collection of tools that can automatically deal with spam comments and other spam content posted to a Drupal-powered website. The module currently provides four methods for detecting spam: a trainable Bayesian filter, support for manually entered custom filters, the ability to count the number of links in content, and detection of content posted from open email relays.

Spam, what and why?

Generally speaking, spam is any unwanted content posted to a website that is unrelated to the subject at hand. A web search for "spam comments" will turn up a large number of discussions on the phenomenon, a growing annoyance on the world wide web. Spam comments usually take the form of advertising and links back to the spammer's own website, often posted with the use of automated tools. The spammer's goal is usually to increase their ranking in Google and other search engines.

Bayesian filter

The spam module implements a Bayesian filter in native PHP that performs statistical analysis on spam content. It counts which words appear more often in spam content and which appear more often in non-spam content, then uses this information to estimate the probability that new content is or is not spam.

Upon initial installation, the spam module naively assumes that all content is not spam. Each new comment and other content posted to the site will be marked as not spam, and it is up to the site administrator to teach it when it makes mistakes. Teaching is as simple as clicking "mark as spam" or "mark as not spam", whichever is appropriate. The module then breaks the posting up into words which are stored in the database for future reference. It operates on the rule that the more a given word shows up in spam content, the higher the probability that future content with that same word is also spam.
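To make the training-and-scoring loop above concrete, here is a minimal sketch in PHP. It is not the module's actual code: the class and method names are hypothetical, in-memory arrays stand in for the database tables, and the way word probabilities are combined (in the style of Paul Graham's "A Plan for Spam", with unseen words scored at 0.4) is an assumption, since the module's exact formula isn't described here.

```php
<?php
// Sketch only: in-memory arrays stand in for the module's database tables.
class NaiveBayesSpamFilter
{
    private $spamCounts = []; // token => number of spam posts containing it
    private $hamCounts = [];  // token => number of non-spam posts containing it
    private $spamPosts = 0;
    private $hamPosts = 0;

    // Assumed tokenizer: lowercase alphanumeric runs, each counted once per post.
    public static function tokenize(string $text): array
    {
        preg_match_all('/[a-z0-9]+/', strtolower($text), $matches);
        return array_values(array_unique($matches[0]));
    }

    // Called for "mark as spam" (true) and "mark as not spam" (false).
    public function train(string $text, bool $isSpam): void
    {
        foreach (self::tokenize($text) as $token) {
            if ($isSpam) {
                $this->spamCounts[$token] = ($this->spamCounts[$token] ?? 0) + 1;
            } else {
                $this->hamCounts[$token] = ($this->hamCounts[$token] ?? 0) + 1;
            }
        }
        if ($isSpam) {
            $this->spamPosts++;
        } else {
            $this->hamPosts++;
        }
    }

    // Estimated probability that a post containing $token is spam.
    private function tokenProbability(string $token): float
    {
        $spam = $this->spamCounts[$token] ?? 0;
        $ham = $this->hamCounts[$token] ?? 0;
        if ($spam + $ham === 0) {
            return 0.4; // unseen words lean slightly towards "not spam"
        }
        $spamFreq = $spam / max($this->spamPosts, 1);
        $hamFreq = $ham / max($this->hamPosts, 1);
        // Clamp so a single word can never make the verdict certain.
        return min(0.99, max(0.01, $spamFreq / ($spamFreq + $hamFreq)));
    }

    // Combine per-word probabilities: P = prod(p) / (prod(p) + prod(1 - p)),
    // computed with logarithms to avoid floating-point underflow on long posts.
    public function score(string $text): float
    {
        $logSpam = 0.0;
        $logHam = 0.0;
        foreach (self::tokenize($text) as $token) {
            $p = $this->tokenProbability($token);
            $logSpam += log($p);
            $logHam += log(1 - $p);
        }
        return 1 / (1 + exp($logHam - $logSpam));
    }
}
```

An untrained filter scores everything below 0.5, which matches the module's initial assumption that all content is not spam; each "mark as spam" or "mark as not spam" click then shifts the word counts.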

As most spam comments are trying to increase their search engine ranking, the most revealing piece of information contained is usually a link back to their website. Some spammers actually cut and paste earlier comments from the same webpage, with the only new content being a link back to their website. Because of this, the filter provides special handling for domain names. Any new comment posted that contains the domain name of a known spam site will itself be marked as spam. Spammer domains are automatically learned from previous spam comments. An administrative page is provided for managing automatically learned spammer domains, allowing you to add additional domains or to edit and delete existing ones.
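The domain-name handling could look roughly like the sketch below. It assumes that a "domain" is whatever follows http://, https:// or www. in the text, with a leading www. stripped; the function names are hypothetical, and a real implementation would keep the learned list in the database table behind the administrative page rather than in a PHP array.

```php
<?php
// Pull host names out of http(s) links and bare "www." references.
function extract_domains(string $text): array
{
    preg_match_all('~\b(?:https?://|www\.)([a-z0-9.-]+\.[a-z]{2,})~i', $text, $m);
    $domains = [];
    foreach ($m[1] as $host) {
        $host = strtolower($host);
        if (strpos($host, 'www.') === 0) {
            $host = substr($host, 4); // treat www.example.com as example.com
        }
        $domains[$host] = true;
    }
    return array_keys($domains);
}

// "Learning": add every domain found in a post marked as spam to the known list.
function learn_spam_domains(string $spamText, array $known): array
{
    return array_values(array_unique(array_merge($known, extract_domains($spamText))));
}

// A new post mentioning any known spammer domain is itself treated as spam.
function contains_spam_domain(string $text, array $known): bool
{
    return count(array_intersect(extract_domains($text), $known)) > 0;
}
```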

In practice, I have been using the plain Bayesian filter on KernelTrap.org for a couple of months, without even the special URL handling. It took 36 spam comments of training before the filter caught its first real spam posting in the wild. Since then I've seen another 20 spam comments, and it has automatically caught nearly half of them. With further training I expect this percentage to keep increasing, ideally to 95% accuracy or better. Realistically, however, a Bayesian filter alone is probably not sufficient, hence the additional spam detection tools in the 4.5 version of the module.

Custom filtering

Custom filters provide site administrators with the ability to blacklist, whitelist, or greylist new posts based on the matching of words, phrases, and regular expressions. The module tracks how often each of your custom filters match against new content, allowing you to determine their effectiveness.
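As an illustration, a custom filter can be modelled as a pattern plus an action and a hit counter. In this hypothetical sketch, patterns wrapped in slashes are treated as PCRE regular expressions and everything else as a case-insensitive phrase; every matching filter has its counter incremented, and the first match in list order decides the outcome. That precedence rule, and what the caller does with a 'greylist' result, are assumptions rather than the module's documented behavior.

```php
<?php
// One hypothetical custom filter: a phrase or /regex/, an action, and a hit count.
class CustomFilter
{
    public $pattern;
    public $action;   // 'blacklist', 'whitelist' or 'greylist'
    public $hits = 0; // how many postings this filter has matched

    public function __construct(string $pattern, string $action)
    {
        $this->pattern = $pattern;
        $this->action = $action;
    }

    public function matches(string $text): bool
    {
        // Patterns starting with "/" are regular expressions, anything else a phrase.
        if (strlen($this->pattern) > 2 && $this->pattern[0] === '/') {
            return preg_match($this->pattern, $text) === 1;
        }
        return stripos($text, $this->pattern) !== false;
    }
}

// Count every matching filter; return the action of the first match, or null.
function apply_custom_filters(array $filters, string $text): ?string
{
    $decision = null;
    foreach ($filters as $filter) {
        if ($filter->matches($text)) {
            $filter->hits++;
            if ($decision === null) {
                $decision = $filter->action;
            }
        }
    }
    return $decision;
}
```

Keeping the hit counters on every match, not just the deciding one, is what lets an administrator judge how effective each individual filter is.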

URL limiting

Spam content often contains an abnormally large number of links, all in an effort to increase the spammer's search engine ranking. The spam module can be configured to count the number of links in each new posting and, if the count exceeds your specified limit, mark the posting as spam. A threshold can be defined for the total number of links, as well as for how many times the same link appears.
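A sketch of both thresholds, assuming that a "link" is any http:// or https:// URL found in the raw text; the function name and the choice to compare URLs case-insensitively are illustrative assumptions.

```php
<?php
// True if $text has more than $maxTotal links, or any one link more than $maxRepeats times.
function exceeds_url_limits(string $text, int $maxTotal, int $maxRepeats): bool
{
    preg_match_all('~https?://[^\s"\'<>]+~i', $text, $m);
    $urls = array_map('strtolower', $m[0]);
    if (count($urls) > $maxTotal) {
        return true;
    }
    // Guard against max() on an empty array when the post has no links.
    $repeats = array_count_values($urls);
    return $repeats !== [] && max($repeats) > $maxRepeats;
}
```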

Distributed server boycott list

Finally, the spam module can be enabled to look up the poster's IP address in the distributed server boycott list. If the IP is found, it is known to be an open relay or otherwise untrusted email server, and thus the comment will be marked as spam. The theory is that email spammers are probably also comment spammers.
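Lists of this kind are usually queried over DNS: the octets of the IPv4 address are reversed, the list's zone is appended, and the resulting name is resolved; an answer in 127.0.0.0/8 means the address is listed. The sketch below follows that convention. The default zone list.dsbl.org is an assumption about where the list was published, so treat it as a parameter; the function names are hypothetical and IPv6 is not handled.

```php
<?php
// Build the DNS name to query, e.g. 192.0.2.10 -> 10.2.0.192.<zone>.
function dnsbl_query_name(string $ip, string $zone): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        return null; // only IPv4 is handled in this sketch
    }
    return implode('.', array_reverse(explode('.', $ip))) . '.' . $zone;
}

// A listed address resolves (conventionally to 127.0.0.x); an unlisted one does not.
function is_listed(string $ip, string $zone = 'list.dsbl.org'): bool
{
    $name = dnsbl_query_name($ip, $zone);
    if ($name === null) {
        return false;
    }
    $answer = gethostbyname($name);
    // gethostbyname() returns its argument unchanged when the lookup fails.
    return $answer !== $name && strpos($answer, '127.') === 0;
}
```

is_listed() performs a live DNS lookup on every call, so caching the result per IP address is worth considering on a busy site.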

Current development

As several new features were recently added to the module, current development is minimal: the goal now is to see how it performs and to fix any new bugs that turn up. Of course, effort will also go into optimizing the logic and generally cleaning up the code. Finally, watchdog logging still needs to be added to the module.

Spam mailing list

A mailing list has been created primarily to discuss the development of the Drupal spam module. However, anyone interested in discussing the spam problem in general and how it affects Drupal, or even in developing an alternative module for dealing with spam, is fully welcome and encouraged to join the mailing list. Full details can be found here.

Future development

When I originally started working on this module, my main goal was to learn how a Bayesian filter works. In doing my research, I learned of Markovian tokenizing, in which phrases are examined instead of just words. While implementing this functionality could result in a more effective Bayesian filter, the overhead it would introduce doesn't seem worth it. As I begin to see spam comments that are cut & paste identical to non-spam comments, I'm more and more convinced that improving the tokenizer to better locate URLs and domain names is a much wiser investment of effort.

I've also considered adding more actions to the module. Currently it can "auto-unpublish", and it can "notify the site administrator", and that's it. Other actions could include blacklisting the IP address (or user, if posted from a user account), preventing the spam from being posted in the first place, or interfacing with comment moderation to push suspicious comments into the moderation queue. None of these ideas are currently being actively pursued.

Finally, it would be wise to review the solutions available for other CMS's, such as Movable Type's MT-Blacklist and WordPress's numerous solutions. The problem is obviously not unique to Drupal, and we can learn a lot from other people's efforts.

Wishlist

The top of my wishlist is to get UI experts involved to help improve the overall usability of the module. For example, the existing "mark as spam" and "mark as not spam" text links add significant clutter to the link section of posts. One thought I've had is to replace them with small icons. Additionally, as more functionality is introduced, more configuration options get introduced, and this can lead to general confusion. Perhaps effort could be made to pick logical defaults and reduce the number of configurable options. It would be interesting to compare the usability of this module with open source solutions for other CMS's.

My second wishlist item would be to merge some spam filtering functionality into the Drupal core. I'm thoroughly convinced that it's just a matter of time until all website owners have to regularly deal with spam, just like all email owners currently have to deal with spam.

Summary

The collection of tools that make up the spam module should prove quite effective in beginning to battle the rising tide of spam comments, but it's certain to be an ongoing effort. If you're interested in getting involved, consider subscribing to the mailing list and joining the discussion.

Comments

Jeremy’s picture

I should have added: the spam module is available here. ;)

Jeremy’s picture

When I wrote this article earlier today, I'd seen fewer than 60 spam comments on my site in the past 3 months. Now, just 12 hours later, I've had well over 400. Looks like the floodgates have opened. Fortunately, URL filtering was the key for this first spam flood: lots of random text, but all ending with a link to the same domain name.

grohk’s picture

Let me guess...Did those URLs happen to be hawking a free online casino? It is funny that you wrote this when you did, since I have been getting hit fairly hard since yesterday. I had never gotten a spam comment until then. But thanks to your module, I was ready. Thanks Jeremy.

Code Orange: Drink Your Juice

bertboerland’s picture

For the past day I've been getting hit really hard. See also this posting I made on my site.
--

groets


bertb

Jeremy’s picture

Did those URLs happen to be hawking a free online casino?

Indeed, that was the precise culprit. I wonder how many people got hit by that same script...

bertboerland’s picture

See this posting, which has some details.
--

groets


bertb

Jeremy’s picture

Interesting...

mike3k’s picture

The spammer seems to be accessing a random low node number and posting a comment to it.

They seem to be doing GET /node/9? followed by POST /comment/reply/9?

Each set of requests comes from a different IP address.

Disabling comments for all nodes more than a few days old eliminated that. I also took the very extreme measure of blocking their particular user agent in my .htaccess after verifying that those are the only hits using this exact agent string:

SetEnvIfNoCase User-Agent "\(compatible; MSIE 5.5; Windows 98; Win 9x 4.90\)" denyThis
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=denyThis
</Limit>

--
Mike Cohen, http://www.mcdevzone.com/

charybdis’s picture

Probably the same tossers. Anyone know of a way of auto-closing old threads? I've just been browsing through the forums for a way, but not come up with anything yet.

mike3k’s picture

There really needs to be a feature that does it. I just did it by going into my database and doing 'update node set comment=1 where nid<1500'. Adjust the node number as appropriate.

If such a feature is added, it should close comments after n days of no activity rather than all threads more than n days old. There are MT and WordPress plugins which do that.
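A minimal sketch of the "no activity" rule, assuming each node's creation time and the time of its newest comment are available. In Drupal these would plausibly come from the node and node_comment_statistics tables, but the function name and array shape here are hypothetical.

```php
<?php
// $nodes: nid => ['created' => timestamp, 'last_comment' => timestamp or null].
// Returns the nids whose last activity is more than $days days before $now.
function nodes_to_close(array $nodes, int $days, int $now): array
{
    $cutoff = $now - $days * 86400;
    $close = [];
    foreach ($nodes as $nid => $node) {
        // Last activity is the newer of the creation time and the newest comment.
        $lastActivity = max($node['created'], $node['last_comment'] ?? 0);
        if ($lastActivity < $cutoff) {
            $close[] = $nid;
        }
    }
    return $close;
}
```

Unlike a pure age cut-off, an old node with an active discussion stays open here, while a quiet one is closed.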

--
Mike Cohen, http://www.mcdevzone.com/

charybdis’s picture

But I'm very, very, very forgetful ;-) I know I'd do it once and then never remember again.

grohk’s picture

I had been anticipating that this day would come ever since our site decided to allow unverified anonymous comments. Thanks to Jeremy's module we are keeping it to a dull roar, but it is annoying that the bot picks old nodes and posts there. It is mucking up our tracker even though we are unpublishing the spam comments.

Until today, I also forgot to patch the core comments module when I upgraded to 4.5.1, so I had to redo that so I can mark comments faster.

Code Orange: Drink Your Juice

charybdis’s picture

And it seems to snap the post back into the archives automatically when the spam comment is unpublished.

grohk’s picture

Mine is still showing new comments, even though the comments are gone. I got nodes from a year ago filling my tracker. What version of spam.module are you running?

Code Orange: Drink Your Juice

charybdis’s picture

But for 4.5. Updated to the newest CVS earlier this evening for the URL limiter, but haven't had any fresh meat for the grinder to test it on. The only part they can even get at is the blog entries and a few vestigial Stories, but my list looks like it's in the right order (http://www.richardcobbett.co.uk)

Jeremy’s picture

Mine is still showing new comments, even though the comments are gone. I got nodes from a year ago filling my tracker.

Yes, this is how it works (and has always worked). The spam module simply marks a comment as unpublished - the node it was attached to remains at the head of the tracker once it has received a spam comment. (I'm still using Drupal 4.4 -- perhaps the 4.5 tracker works differently...?)

Dries’s picture

Strange. That must be a bug in the tracker module's SQL queries or in the way the node_comment_statistics table is updated.

charybdis’s picture

Engineer's luck. I'm sure it's been working since I upgraded, but I just tested it with a self-made spam post, and it stayed glued to the top.

grohk’s picture

And if so, with Tracker or with Comment?

Jeremy’s picture

Or perhaps with the spam module... Am I not updating a field I should be when I unpublish the comment?

hdhd’s picture

The Drupal spam module is showing the errors below. Does anyone know where to find the missing table?

user error: Table 'buzzine_buzzine.spam_tracker' doesn't exist
query: SELECT COUNT(*) FROM trackback_received tr LEFT JOIN spam_tracker s ON tr.trid = s.id WHERE tr.status = 0 in /home/buzzine/public_html/46/includes/database.mysql.inc on line 66.

user error: Table 'buzzine_buzzine.spam_tracker' doesn't exist
query: SELECT tr.*, s.probability FROM trackback_received tr LEFT JOIN spam_tracker s ON tr.trid = s.id WHERE tr.status = 0 ORDER BY created DESC LIMIT 0, 50 in /home/buzzine/public_html/46/includes/database.mysql.inc on line 66.

[]s
hdhd

Prometheus6’s picture

I'd post a module for download, but I can't seem to get the attention of those who give access to CVS.

<?php
/**
 * comment_closer.module
 * Automatically close comments on nodes beyond a configurable age.
 */

/**
 * Implementation of hook_help().
 */
function comment_closer_help($section) {
  switch ($section) {
    case 'admin/modules#description':
      return t('Schedule automatic closing of comments for selected node types based on the age of the node.');
  }
}

/**
 * Build a select-options array (type => type) from a list of node types.
 */
function _comment_closer_nodeoptions($nodetypes) {
  $optionarray = array();
  foreach ($nodetypes as $nodetypename) {
    $optionarray[$nodetypename] = $nodetypename;
  }
  return $optionarray;
}

/**
 * Implementation of hook_settings().
 */
function comment_closer_settings() {
  $nodetypes = node_list();
  $nodetype_count = count($nodetypes);
  // Keys stay untranslated so comment_closer_cron() can compare against them.
  $age_limit_list = array('year' => t('year'), 'month' => t('month'), 'week' => t('week'));
  $cycle_length_list = array('yearly' => t('yearly'), 'monthly' => t('monthly'),
                             'weekly' => t('weekly'), 'daily' => t('daily'));

  // Node types to affect (multiple select, at most 5 rows tall).
  $output = form_select(t('Node types'), 'comment_closer_types',
                        variable_get('comment_closer_types', $nodetypes),
                        _comment_closer_nodeoptions($nodetypes),
                        t('Types of nodes for which comments will be closed'),
                        'size="' . min($nodetype_count, 5) . '"', 1);
  // Age beyond which comments are closed.
  $output .= form_select(t('Older than'), 'comment_closer_age',
                         variable_get('comment_closer_age', 'month'),
                         $age_limit_list, t('Age of nodes for which comments will be closed'));
  // How often the closing job runs.
  $output .= form_select(t('Execute'), 'comment_closer_cycle_period',
                         variable_get('comment_closer_cycle_period', 'daily'),
                         $cycle_length_list, t('Time between comment closings'));
  return $output;
}

/**
 * Build an SQL condition limiting the update to the selected node types.
 * An empty selection means "all node types".
 */
function _comment_closer_node_select($nodetypes) {
  if (empty($nodetypes)) {
    return '';
  }
  $node_condition = array();
  foreach ($nodetypes as $nodetype) {
    $node_condition[] = "type = '" . addslashes($nodetype) . "'";
  }
  return ' AND (' . implode(' OR ', $node_condition) . ')';
}

/**
 * Implementation of hook_cron().
 */
function comment_closer_cron() {
  $now = time();
  if ($now < variable_get('comment_closer_next_date', $now)) {
    return;  // not due yet
  }

  // Work out the cut-off: nodes created before this time get closed.
  $date = getdate($now);
  switch (variable_get('comment_closer_age', 'month')) {
    case 'year':
      $date['year'] -= 1;
      break;
    case 'month':
      $date['mon'] -= 1;
      break;
    case 'week':
      $date['mday'] -= 7;
      break;
  }
  // mktime() normalizes out-of-range values such as month 0 or day -3.
  $oldest_allowed = mktime($date['hours'], $date['minutes'], $date['seconds'], $date['mon'], $date['mday'], $date['year']);

  // comment = 1 is read-only; only touch nodes whose comments are read/write (2).
  db_query('UPDATE {node} SET comment = 1 WHERE comment = 2 AND created < ' . (int) $oldest_allowed .
           _comment_closer_node_select(variable_get('comment_closer_types', array())));

  // Schedule the next run.
  $date = getdate($now);
  switch (variable_get('comment_closer_cycle_period', 'daily')) {
    case 'yearly':
      $date['year'] += 1;
      break;
    case 'monthly':
      $date['mon'] += 1;
      break;
    case 'weekly':
      $date['mday'] += 7;
      break;
    default:  // 'daily'
      $date['mday'] += 1;
      break;
  }
  variable_set('comment_closer_next_date', mktime($date['hours'], $date['minutes'], $date['seconds'], $date['mon'], $date['mday'], $date['year']));
}

media girl’s picture

It seems someone discovered the Drupal sites list or something. I've been hit and took the short-term approach of closing comment posting from anonymous users. Things I would like to see:

- IP blocking. This is a huge one, and a basic feature in many BB packages. Since statistics is already tracking this data, I would hope that the feature could be easily implemented. It could also work to block trolls from re-registering. I know it's not ideal, especially in this world of DNS, but it can be a nice tool to have.

- Streaming into the moderation workflow: placing questionable content (likely, but not certainly, spam) into the moderation queue for routine approval or rejection. This is a most appealing idea for workflow.

I appreciate all the work done on making this module into what it is. I hope a most-capable developer (team) can take this on, as I feel this will only become a bigger and bigger issue, especially as spammers become more savvy.

--
mediagirl.org

charybdis’s picture

A whole lot of cycling going on. Although it would be good anyway, as a troll-busting addition.

grohk’s picture

And a "good" troll knows how to use proxies. You could also end up blocking someone who is not a troll, but just happens to be on the same subnet as a real troll.

I have seen some people filing feature requests on the captcha module (which I use for new user registrations...Quite nice) for captchas on the anonymous comments form.

Issue can be seen here -- http://drupal.org/node/11265

Code Orange: Drink Your Juice

charybdis’s picture

As Jeremy said, it's a better one for the captcha module directly. I don't bother with it for new user registration because I've never had a problem there (I try not to lock things down until I need to - it really annoys me to make visitors jump through hoops just because a couple of assholes are spamflooding things) but I'd definitely like it on anonymous comments. Much as I enjoy paying for the bandwidth to let them feed the spam filter, I'd rather they got turned away at the gate...

Jeremy’s picture

I've thought more about it since you posted your feature request, and I really like this idea (for enhancing the captcha module). I think you'd find a need to enable captcha for both anonymous comments and user registration - I do get spam from registered users.

Once done, it seems there'd be little need for the spam module. And that's a good thing.

(Thus, if no-one's provided a patch for the captcha module by the time I've upgraded KernelTrap to 4.5, then I'll work on it myself. But my upgrade is slow going, so don't hold your breath. ;)

arnabdotorg’s picture

I have investigated post-time captcha tests, but haven't found a clean way to do it. The comment API does not allow addition of form entries, unlike the user form. This is the major stumbling block, and I don't want to use theme_ (presentation layer) functions to include all functional attributes. Hopefully we will have a cleaner design for comments soon.

In case there's something in the API I've overlooked, feel free to point it out, I'd love to add this feature.

Dries’s picture

With Drupal 4.5's new filter system, you could disable the use of <a href="">-tags for certain user roles (most notably anonymous users). That would render a lot of comment spam ineffective.

In Drupal HEAD (development version), we have a new flood detection mechanism. This could be used to put a limit on the number of comments one can post per hour. Useful to block mass-comment spam?
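A flood limit of this kind boils down to a sliding-window counter per poster. The sketch below is not Drupal's flood API; the class, the in-memory storage, and keying on the IP address are assumptions for illustration.

```php
<?php
// Hypothetical sliding-window limiter: at most $limit events per $window seconds per poster.
class FloodControl
{
    private $events = []; // identifier (e.g. IP address) => list of event timestamps
    private $limit;
    private $window;

    public function __construct(int $limit, int $window = 3600)
    {
        $this->limit = $limit;
        $this->window = $window;
    }

    // True if $id may post at time $now; the attempt is recorded only when allowed.
    public function allow(string $id, int $now): bool
    {
        $recent = array_filter($this->events[$id] ?? [], function ($t) use ($now) {
            return $t > $now - $this->window;
        });
        if (count($recent) >= $this->limit) {
            $this->events[$id] = $recent;
            return false;
        }
        $recent[] = $now;
        $this->events[$id] = $recent;
        return true;
    }
}
```

Denied attempts are deliberately not recorded, so a blocked poster regains access as soon as their older comments age out of the window.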

charybdis’s picture

Automatically paying assassins via PayPal to track the culprit down and dunk them in sewage until they apologise. Say, a dollar per spam. Not much, but the cash could be aggregated from every user hit by the idiot until we have enough ;-)

cel4145’s picture

add their domains to the Lycos screensaver ;)

charybdis’s picture

But needless to say, it lives on in virus form.

That screensaver was literally one of the stupidest, most inane ideas I've ever seen on the web.

___

I have a website. It's very blue indeed.

Jeremy’s picture

With Drupal 4.5's new filter system, you could disable the use of <a href="">-tags for certain user roles (most notably anonymous users). That would render a lot of comment spam ineffective.

Ineffective as far as the goal of increasing google ranking. But still incredibly annoying. 300+ spam comments to an online casino even if the links don't work is quite frustrating! ;)

Perhaps combine a href filtering with captcha -- if no links, just post the comment. If links, then first validate the anonymous poster with a captcha.
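The combined policy might reduce to a small decision function like this hypothetical one, which treats an <a href tag, an http(s):// URL, or a bare www. reference as a link and only challenges anonymous posters:

```php
<?php
// Returns 'publish' or 'captcha' for a new comment (hypothetical policy).
function comment_gate(string $body, bool $isAnonymous): string
{
    if (!$isAnonymous) {
        return 'publish'; // registered users are not challenged in this sketch
    }
    $hasLink = preg_match('~<a\s[^>]*href|https?://|www\.~i', $body) === 1;
    return $hasLink ? 'captcha' : 'publish';
}
```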

I'm finally understanding the need for mass-cleanup tools.

bertboerland’s picture

One of the sites is already dead; the other is still in DNS/Whois but not working. Note that the master plan might be to re-activate the domains somewhere else later. In each case the value of the domains has increased (at least from some points of view; I would want to have them for free).
--

groets


bertb

mike3k’s picture

Someone should start a collection to buy all of those deactivated spam domains just to sit on them and prevent anyone else from buying and re-activating them. Make sure they remain dead forever.

--
Mike Cohen, http://www.mcdevzone.com/

Steven Mansour’s picture

I copied the spam folder into the module folder, imported the .mysql file into my mysql drupal DB, and was able to adjust the settings for the spam module in admin -> settings -> spam. However, when I go to a comment and try to mark it as spam, I get a page not found error trying to get ... /spam/comment/288/spam .

I looked through the documentation and searched the forums, but couldn't find anything. I have a default 4.5.1 install with clean urls enabled.

Thanks in advance,

Steven

grohk’s picture

it was a bug I think. Try upgrading to the latest 4.5 branch version.

http://cvs.drupal.org/viewcvs/drupal/contributions/modules/spam/

Currently, it is 1.9.2.16

Code Orange: Drink Your Juice

Artti’s picture

Would it be possible to check that the sender is not a machine by showing an image and asking what it says?

Generated image: GHjgj65
Please enter the digits above: __________

charybdis’s picture

Steven Mansour’s picture

Got it working by using the CVS branch version. No problems.

I did have to add "IF NOT EXISTS" clauses to the branch's spam.mysql, though, since I had already imported the release spam.mysql and there were conflicting tables.

However, I notice that while it unpublishes spam comments, there's no easy way to permanently delete them on the admin -> comments -> spam overview page. You have to click "delete" on each line in order to erase them one by one. Otherwise, they just keep piling up in the database as unpublished comments.

How easy would it be to add a "select all" checkbox and a "delete" button on that page?

Just a thought. :)

Jeremy’s picture

See this feature request. Essentially, until the underlying tokenizing logic is considered stable, I will not provide a mechanism for deleting comments. The problem is, if you delete all spam comments, you'll have to retrain the Bayesian filter each time you upgrade.

That said, if you don't care about that then grab the patch I provided for the comments module. It allows you to mass-delete spam comments.

groovebunny’s picture

Glad to see this all being talked about. My site has been flooded with online casino spam for the past few days. I've basically resorted to turning anon comments to be moderated before posting. I'll definitely give the Bayesian filter a try.

Thanks!

jbrauer’s picture

The spam module rocks. I'd love to have the delete capacity and don't care too much about the Bayesian filter. Most of the spam I've seen so far is easily managed by the URL and custom filters with a couple of regex entries (/holdem/i) etc.

It would be great if there were a way to easily re-run the spam filter to catch already-posted spam that it now recognizes. I get into situations where the score has gone up and the filter now says a comment is spam, but the comment is still published and there is no easy way to unpublish a group of comments. I ended up running UPDATE comments set status=2 where cid > number, which easily changed them all to unpublished, but it would be nice to do it from the interface.
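Re-running the filter is essentially a loop over published comments with a threshold. In this sketch the scorer is passed in as a callable (for example, a trained Bayesian filter's scoring method), and the function only returns the comment IDs to unpublish; how they are then updated in the database is left open, and the 0.9 threshold is an arbitrary assumption.

```php
<?php
// $comments: cid => body of currently published comments.
// $score: any callable returning a spam probability in [0, 1].
function rescan_comments(array $comments, callable $score, float $threshold = 0.9): array
{
    $toUnpublish = [];
    foreach ($comments as $cid => $body) {
        if ($score($body) >= $threshold) {
            $toUnpublish[] = $cid;
        }
    }
    return $toUnpublish;
}
```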

--

grohk’s picture

As stated above, there is a patch in the Optional directory that is included with the spam module that handles mass marking of comments as spam. And if you have the option for unpublishing comments that are spam set it will do what you want.

Code Orange: Drink Your Juice

killes@www.drop.org’s picture

Until now I always thought

If people don't want comment spam, they should disable anonymous comments.

However, as of today I am less optimistic that this is a sufficient approach. There are apparently people around who are so bored that they manually create email accounts with free webmail providers, create an account on drupal.org, and start to spam us with forum topics. I've deleted seven of them today. The person doing this created two free email accounts at sahyog.com, created an account here at Drupal.org and started to spam away. How can I tell that this is a person and not a script? The person needed more than one preview for some of the forum topics.

I am not amused. Blocking the sahyog.com domain is an obvious solution, but there are more free webmail providers and some legitimate users might use them.

The accounts in question are http://drupal.org/user/15550 and http://drupal.org/user/15537. These accounts will probably be disabled later.

Do we need the spam module on drupal.org?
--
If you have troubles with a particular contrib project, please consider filing a support request. Thanks.

tangent’s picture

I've rewritten the captcha module to support comment and content posts. I borrowed heavily from the Drupal4Blog code to do it but I changed the image creation to be more complex. It could be better but it is a work in progress.

The module required changes to the comment module which I would have preferred to avoid but could not. You can get the module code here.
http://drupal.org/node/11265

You can get the comment module patches here.
http://drupal.org/node/14710
http://drupal.org/node/14708

As I mentioned in the captcha issue, there is at least one bug in the module so if you fix it before I do, please send me a patch.

Dries’s picture

If you create a bare-bones form_captcha() function, I might be willing to include that in core. The comment module could then be modified to have built-in support for captcha.

tangent’s picture

I am uncertain if you mean a form function to be included in common.inc or something else. If so, here are 2 variations of it. The first creates 2 form elements (1 for the question/image and 1 for the response) while the second uses the label for the question/image prompt.

/**
 * Format a captcha field.
 *
 * @param $question
 *   The question prompt. Could be a text question or an <img> tag.
 * @param $questionLabel
 *   The label for the question prompt.
 * @param $answerLabel
 *   The label for the answer field.
 * @param $name
 *   The internal name used to refer to the field.
 * @param $size
 *   A measure of the visible size of the field (passed directly to HTML).
 * @param $description
 *   Explanatory text to display after the form item.
 * @return
 *   A themed HTML string representing the field.
 *
 * It is assumed that a captcha field will be required.
 */
function form_captcha($question, $questionLabel = 'Question', $answerLabel = 'Answer', $name = 'captcha', $size = '15', $description = 'This helps prevent automated submissions.') {
  return form_item($questionLabel, $question) . form_textfield($answerLabel, $name, NULL, $size, $size, $description, NULL, TRUE);
}

/**
 * Format a captcha field.
 *
 * @param $question
 *   The question prompt. Could be a text question or an <img> tag.
 * @param $name
 *   The internal name used to refer to the field.
 * @param $size
 *   A measure of the visible size of the field (passed directly to HTML).
 * @param $description
 *   Explanatory text to display after the form item.
 * @return
 *   A themed HTML string representing the field.
 *
 * It is assumed that a captcha field will be required.
 */
function form_captcha($question, $name = 'captcha', $size = '15', $description = 'This helps prevent automated submissions.') {
  return form_textfield($question, $name, NULL, $size, $size, $description, NULL, TRUE);
}

I assume this is just a foundation for the functionality as this does not offer much. Or perhaps I have misunderstood your request.

Dries’s picture

I haven't given this an awful lot of thought, but I want form_captcha() to generate and display an image along with a textfield for the answer. The design goal is to make adding captcha support to a form as simple as possible. Maybe something like this:

/**
 * Format a captcha field.
 *
 * @param $key The string to visualize in the image.
 *
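 * @return A themed HTML string containing the captcha image and the answer field.
 */
function form_captcha($key) {
  // 1. generate an image that shows $key
  // 2. show the generated image
  // 3. show a textfield for the answer
}

/**
 * Return a captcha key.
 *
 * @param $length The number of characters in the key.
 *
 * @return A randomized string of the specified length.  NULL if there is no captcha support possible.
 */
function captcha_get_key($length = 5) {
  // The key is rendered with GD, so without GD there is no image captcha.
  if (!function_exists('imagecreate')) {
    return NULL;
  }
  // Leave out look-alike characters (0/O, 1/I/L) to keep the image readable.
  $chars = 'ABCDEFGHJKMNPQRSTUVWXYZ23456789';
  $key = '';
  for ($i = 0; $i < $length; $i++) {
    $key .= $chars[mt_rand(0, strlen($chars) - 1)];
  }
  return $key;
}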

The internal form name can be hardcoded to 'captcha', and the textfield's size could be set to strlen($key) or a variation thereof.

Both the comment and the contact module could take advantage of such functions, and probably the user and node module as well.

Example usage:

if ($key = captcha_get_key()) {
  $form .= form_captcha($key);
}

bertboerland’s picture

But wouldn't displaying an image shut out blind people as well (leaving them unable to comment on the internet)? I'm no expert here, but isn't there a smarter way to tell bots from humans?

--

groets


bertb

tangent’s picture

I agree. My version of the captcha module uses a text challenge/response instead of an image. The functions above would support either however.

tangent’s picture

Must the captcha be an image? I've implemented a text captcha in my latest version of the module and I have to say that it is cleaner, more efficient, less resource intensive, and more user friendly (especially for visually impaired users). It currently asks a simple math question (what is two plus two) but it could easily be made more complex.
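A text challenge of the "what is two plus two" kind can be sketched as below. The function names are hypothetical, the operands are limited to 0 through 9 and spelled out as words, and the answer must be typed as digits. A real module would keep the expected answer server-side (for example in the session) rather than in the form.

```php
<?php
// Returns [question, expected answer], e.g. ['What is two plus three?', 5].
function text_captcha_generate(): array
{
    $words = ['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'];
    $a = random_int(0, 9);
    $b = random_int(0, 9);
    return ['What is ' . $words[$a] . ' plus ' . $words[$b] . '?', $a + $b];
}

// Accept the answer as digits only, ignoring surrounding whitespace.
function text_captcha_check(string $response, int $expected): bool
{
    $response = trim($response);
    return preg_match('/^\d+$/', $response) === 1 && (int) $response === $expected;
}
```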

I'll work on the function you've requested. Would it still be in the common.inc context?

Tarlbot’s picture

Of course this is written primarily for Movable Type users but this may have some helpful ideas for us:
http://sixapart.com/pronet/comment_spam.html

Written by John Gruber
http://daringfireball.net/

Matt Simpson’s picture

So... a month and a half after the last comment... what is the consensus? There seemed to be a burst of this in mid December, tapering into January.

I had to shut down my wordpress site... had to password protect my wikis (pmwiki engine)... and am considering a move to Drupal. Is there a solution for spam in comments?

I read through the whole thread above. But it seemed to taper off. Did something up there work?

media girl’s picture

tangent’s picture

Here's a blogger who has created another captcha recognition program, aimed at the captchas used against blog spam.

http://www.mperfect.net/aiCaptcha/

kc’s picture

Nowadays captcha use is all over the place. Even Google has added it for accounts such as Gmail. It is good, I believe, but only if it is made user-friendly. On some sites you cannot even read the text as a human being, which causes frustration.

Man, web is becoming a hard to stay clean environment:-)


KC
----

FlemmingLeer’s picture

i-marco has several approaches to combat splog.

Via .htaccess file
http://www.i-marco.nl/weblog/archive/2005/08/29/saving_some_valuable_ban...

and via a javascript hiding the trackback url
http://www.i-marco.nl/weblog/archive/2005/08/24/trackback_spam_eliminated

The last one is also discussed here:
http://drupal.org/node/45905

jetsetter’s picture

I use this version: http://www.kerneltrap.org/jeremy/drupal/spam/

It works so well. Many thanks to anyone who has contributed to creating it.

jaynedarcy’s picture

I installed the latest version on my writing site, but as soon as I go into admin > spam (after activating the module) I get an error message saying that a table doesn't exist. It was then suggested that I take the text from spam.install, go to phpMyAdmin, and run the script in a query window. When I did that, it said I had an error.

This is getting way above me. What's going on? How can I fix this?

This is the error message.

Table 'thirdi_drpl1.spam_log' doesn't exist query: SELECT * FROM spam_log ORDER BY sid DESC LIMIT 0, 50 in /home/thirdi/public_html/includes/database.mysql.inc on line 120.

art@progressiveu.org’s picture

This is a pretty old thread so I'm not sure if people are still monitoring it, but I'll give it a shot. Here's my problem:

I finally got the latest spam modules configured and working well on our site a couple of weeks ago (we're still on 4.6, but using the latest 4.6 version of the spam modules from kerneltrap). I'm satisfied that it stops most comment spam, and I understand how to train it and add more rules as necessary.

However, we have a lot of users who are subscribed to threads and get emails whenever a comment is published. After a day or so of using the spam module I realized that users are getting these emails even for comments that are auto-unpublished. Not good :-(. I had to turn off commenting for unregistered users again, because I didn't want to piss people off with all the emails when there were really no updates.

I realize that this is really a problem with the comment module, because it also happens when the moderation queue is on (users get an email before the comment is approved). But I figure the spam experts will be best able to explain a workaround.

Any ideas?

=======
Art Morgan
http://www.progressiveU.org