I'm sure you've read about it already, but I just wanted to register it here. I've been getting a lot of trackback spam about texas holdem poker recently. Would it be possible to get trackback.module to play with spam.module so that some sort of filtering mechanism can be set up?

Alternately, a procedure that hunts down the spammer and beats him with sticks would be just as satisfying.

--|PW|--

Comments

Jeremy’s picture

FileSize
1.66 KB

The attached patch implements the _spam hook for the trackback module, allowing trackbacks to be run through the spam filter.

A few comments:
- if the spam module is not installed, not enabled, or too old, the trackback module will work normally
- adds a 'filter trackbacks' option to the spam _settings page, enabled by default
- requires latest version of the spam module, as updated today
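
The fallback behaviour in the list above can be sketched roughly as follows. This is a Python illustration, not the patch's PHP: the names `accept_trackback`, `MIN_SPAM_VERSION`, and `demo_check` are hypothetical, and the version numbers are made up for the example.

```python
from typing import Callable, Optional, Tuple

# Hypothetical minimum spam module version that provides the hook.
MIN_SPAM_VERSION = (2, 0)


def accept_trackback(text: str,
                     spam_check: Optional[Callable[[str], bool]] = None,
                     spam_version: Optional[Tuple[int, int]] = None,
                     filter_trackbacks: bool = True) -> bool:
    """Return True if the trackback should be published."""
    filter_available = (
        spam_check is not None
        and spam_version is not None
        and spam_version >= MIN_SPAM_VERSION
    )
    if not (filter_available and filter_trackbacks):
        # No usable spam filter, or filtering switched off:
        # behave exactly as an unpatched trackback module would.
        return True
    return not spam_check(text)


def demo_check(text: str) -> bool:
    """Stand-in spam check used only for this example."""
    return "poker" in text.lower()
```

With no spam check supplied, or with one that is too old, every trackback is accepted; only a present, recent, and enabled filter can reject one.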

Anonymous’s picture

I'm a total PHP ignoramus, so could you tell me how to implement this patch without breaking my blog?

Jeremy’s picture

If you're completely unfamiliar with PHP and applying patches, I recommend that you wait for others to test it, and for it to hopefully be merged.

If you're willing to figure it out, do a search for "patch" on this site to find some discussion on how to apply patches.

freyquency’s picture

Title: Anti-spam capability » Good page on applying patches
Component: Receiving » Documentation
freyquency’s picture

Title: Good page on applying patches » Anti Spam Trackback Patch

Sorry, forgot it would change the title of the whole thread. I forget what the name was exactly but it was something like this.

Anonymous’s picture

Title: Anti Spam Trackback Patch » Anti-spam capability
Component: Documentation » Receiving

Restoring

media girl’s picture

Thank you for getting on top of this. Already many people are giving up on trackback altogether because of the spam. It seems Drupal might be leading the charge in getting proper filtering into trackbacks.

But I have a question:

When you say the latest version of the Spam module available today, do you mean the one on the downloads page, dated 2004? Or do you mean the CVS latest here http://drupal.org/project/cvs/11104 ? I'm sorry, but I'm not up on the workflow here. The changelog indicates updates, but the date of the download file on the downloads page is unchanged, so I'm not sure to which you refer ... or whether the CVS spam head will work on 4.5.

Any clues? Thanks!

media girl’s picture

For reference for others, the latest version of the spam module is NOT the one available from the downloads page (as of this moment), but is rather here: http://drupal.org/files/projects/spam-4.5.0.tar.gz .

Jeremy’s picture

If I remember correctly, tarballs are now built every 6 hours. So, within 6 hours of me uploading changes into CVS, the complete tarball should be up-to-date. To be fully confident you have the latest release you could compare what you downloaded with what you view here (comparing the $Id tag at the top of the files).
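
For anyone who wants to script that comparison, here is a small Python sketch that pulls the CVS `$Id` keyword out of a file's text. The sample header line, its revision number, and its date are invented for illustration; only the general `$Id: file,v revision date time author state $` layout is assumed.

```python
import re
from typing import Optional, Tuple

# Matches the file name, revision, and timestamp inside a CVS $Id keyword.
ID_TAG = re.compile(r"\$Id: (\S+),v (\S+) (\S+ \S+) ")


def cvs_id(source: str) -> Optional[Tuple[str, str, str]]:
    """Return (file name, revision, timestamp) or None if no $Id tag is found."""
    match = ID_TAG.search(source)
    return match.groups() if match else None


# Hypothetical header line; real values will differ.
downloaded = "<?php\n// $Id: spam.module,v 1.30 2005/02/10 18:00:00 jeremy Exp $\n"
in_cvs = "<?php\n// $Id: spam.module,v 1.31 2005/02/12 09:30:00 jeremy Exp $\n"

is_current = cvs_id(downloaded) == cvs_id(in_cvs)
```

If the two tuples differ, the tarball has not caught up with CVS yet.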

To view what changes have been made recently (and when), take a look here.

ankur’s picture

As of the latest few patches for the 4.5 trackback module currently in the CVS (http://cvs.drupal.org/viewcvs/drupal/contributions/modules/trackback/tra...), users can now delete trackbacks as they wish... however, having an automatic spam filter may be a little while in coming. A more quickly implementable solution is having moderated trackbacks... I'm ready to look into it given what the feedback is here...

-Ankur

Junyor’s picture

Ankur: Is there a problem with Jeremy's patch or am I misunderstanding the functionality it adds?

Jeremy’s picture

The purpose of my patch (see #1 above) is to utilize the spam.module's _spam hook and "spam_check" function to provide comprehensive spam filtering for the trackback module. When applied, all new trackbacks will be run through the spam module's Bayesian filter, URL filters, custom filters, etc.

Testers appreciated.

DocSavage’s picture

Thanks Jeremy. I replaced my band-aid with your patch on trackbacks, then subbed in your updated spam module. It's running now, no errors. Will let you know if there are any problems.
-Bill

DocSavage’s picture

Jeremy, when adding some new regex filters, I noticed a small error in the new spam module I downloaded from Drupal Download area. Issue opened and fix given: http://drupal.org/node/17177

I'm new here, so I hope this is the proper protocol so we can keep track of bug fixes. Let me know if I'm supposed to patch things I spot directly. Regards.

Jeremy’s picture

Thanks Doc, I merged your fix (missing %d) into the spam module. Indeed, if you find any additional problems with the spam module open additional spam module issues. A patch is not necessary when it's as simple a change as the last.

Is the spam module now catching spam trackbacks for you?

DocSavage’s picture

Well, I'm still golden... no trackback spams so far and I do see a few spammers coming by in the logs. Looks like it works.

DocSavage’s picture

Got hit with some trackbacks today, which seemed like they should have been picked up by the spam filter. Much of the spam is getting caught. Here's one that made it onto the trackback for one of my book signing events:

Submitted by phentermine (trackback) (not verified) on February 12, 2005 - 02:03. 
You may find it interesting to visit some helpful info about online poker texas holdem phentermine

In my url filters, I have entries for:
phentermine
poker
...

In my custom filters, I have regex filters for:
/phentermine/i (always spam)
/poker/i (usually spam)
...

For the URL filters, does an entry have to match the URL exactly, or is it matched as a pattern? In any case, the custom filters should definitely have blasted it because of "phentermine".
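
As a sanity check that the regexes themselves are fine, here is a Python sketch that applies the two custom filters to the trackback text quoted above. It is not the spam module's code: `re.IGNORECASE` stands in for the `/i` flag, and `matching_effects` is a hypothetical helper.

```python
import re

# The two custom filters listed above, with their configured effects.
CUSTOM_FILTERS = [
    (re.compile(r"phentermine", re.IGNORECASE), "always spam"),
    (re.compile(r"poker", re.IGNORECASE), "usually spam"),
]


def matching_effects(text):
    """Return the effect of every custom filter whose pattern occurs in text."""
    return [effect for pattern, effect in CUSTOM_FILTERS if pattern.search(text)]


trackback = ("You may find it interesting to visit some helpful info about "
             "online poker texas holdem phentermine")

effects = matching_effects(trackback)
```

Both patterns match this text, so if the trackback still got through, the likelier explanation is that its content never reached the filter.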

I'm going to re-add my patch, in addition to yours.

Jeremy’s picture

Can you dump your custom filter table, and attach it here? Also, can you attach the complete un-edited trackback(s) that got through the filter? Either there's a bug, or there's a logical explanation... ;)

Thanks.

BTW: URL filters match on the root. ie, if the url is http://www.sample.org/this/is/the/url, the URL filter sees "www.sample.org". This means that http://mail.sample.org/another/url would be different, as the URL filter would see "mail.sample.org". This was by design, but in retrospect I'm thinking it's more logical that both of these should simply be seen as "sample.org". But that's an unrelated issue. In your case, I only care about the custom filters, to understand why it didn't block your trackback spam.
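
To illustrate the difference, here is a Python sketch (not the spam module's PHP) of what the URL filter keys on. `filter_key` and `naive_base_domain` are hypothetical names, and the second function is a deliberately naive guess at the "sample.org" behaviour: it just keeps the last two labels, which would be wrong for domains such as example.co.uk.

```python
from urllib.parse import urlsplit


def filter_key(url: str) -> str:
    """Host portion of the URL, which is what the URL filter compares today."""
    return urlsplit(url).hostname or ""


def naive_base_domain(host: str) -> str:
    """Last two labels of the host; a rough stand-in for 'sample.org'."""
    return ".".join(host.split(".")[-2:])


first = filter_key("http://www.sample.org/this/is/the/url")
second = filter_key("http://mail.sample.org/another/url")
```

Here `first` and `second` differ, so a filter added for one host does not cover the other, while `naive_base_domain` maps both to the same value.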

Anonymous’s picture

Here's cut & paste on my custom filters:

Custom filters

filter          type   effect        delete    matches  last match           operations
/casino/i       regex  usually spam  disabled  0        Dec 31 1969 - 19:00  edit delete
/phentermine/i  regex  always spam   disabled  17       Feb 12 2005 - 02:03  edit delete
/poker/i        regex  usually spam  disabled  34       Feb 12 2005 - 02:24  edit delete
/psxtreme/i     regex  always spam   disabled  0        Dec 31 1969 - 19:00  edit delete

Let me know if you need it in some other format. Unfortunately, I've already deleted all the trackback entries from my DB, so you'll have to work with the paste of the trackback above. You can see that the filters have nailed at least 34 spam comments, but I'm not sure if it nailed any trackbacks.

I think trackbacks must not be going through the spam filter. I do have spam trackback filtering ON. The "/phentermine/i" regex should have blasted the trackback content since anything with it should be marked spam. Could you do a simple test where you trackback to yourself using some of the spam words matched by a custom regex filter?

I have to revise my URL filters given what you said.

Jeremy’s picture

"I think trackbacks must not be going through the spam filter. I do have spam trackback filtering ON. The "/phentermine/i" regex should have blasted the trackback content since anything with it should be marked spam. Have could do a simple test where you trackback to yourself using some of the spam words matched by a custom regex filter?"

Of course, this is how I tested the patch when I first wrote it. On my devel server it was catching trackback spam perfectly. I'll reinstall and reapply the patch and test again, to be sure I attached the proper patch. You are using the latest version of the trackback module, correct? (My spam patch should have applied without errors)

"I have to revise my URL filters given what you said."

The advantage to URL filters is that they are auto-added. If you're manually adding them, you should probably just be adding them in the form of custom filters instead, as there is more power available with custom filters.

DocSavage’s picture

I figured you tested trackbacks on your devel server; I guess I should have been more specific in asking if you've tried the exact word/regex combo: "/phentermine/i". Shouldn't matter, but the first step is just seeing if we can reproduce the error. I used the most recent trackback module under Drupal downloads, but this was before they updated it in CVS to handle the trackback deletions. By "most recent trackback module", do you mean the very latest in CVS? I thought they modified it after you submitted your patch.

ankur’s picture

Jeremy: I'm sorry. I did not see the patch file attachment the last time I was here. I saw the word spam and spam module and thought "project" for another day. But hey, you already did the project. Thanks a bunch.

I'm ready to commit this patch, but I have to check in with the team at CivicSpace Labs. Trackback is currently being maintained by CivicSpace (http://www.civicspacelabs.org) and any changes to the code go through the lead developer, which I'm not. I just bumped the patch file to him. It should be OK save for the fact that CivicSpace (a distribution of Drupal) does not ship with the spam module.

In any case, stay tuned. I'll keep you posted. But once again, thanks for your efforts. It is much appreciated.

-Ankur

Jeremy’s picture

If no spam module is installed, trackback will function normally even with my patch applied.

media girl’s picture

I was unable to run this patch on the current trackback version (1.14.2.1) -- I got this error:

patch unexpectedly ends in middle of line
Hunk #2 succeeded at 269 with fuzz 2.

I'm in the midst of a trackback spam attack, so I will report if I'm meeting with success anyway.

media girl’s picture

It works. On the tracker, it shows the thread as updated (which may be a tracker issue), but the spam trackback was successfully filtered. Woohoo!

media girl’s picture

Am I out of line to re-open this? As noted above, I ran the patch, but there remains a problem.

Yes, the trackback spam is not showing up under comments, but under the trackbacks tab on the node, the spam remains. The only option at this point is to delete it.

I am not the only one experiencing this, as it was pointed out to me by someone else on their site. Any ideas?

Anonymous’s picture

Let me toss in my experiences at http://www.pennywit.com. I installed the new spam.module a couple weeks ago and patched trackback.module over the weekend. If you want to see Media Girl's narration in action, look here:

http://www.pennywit.com/drupal/node/1787/

Check under "comments," then look under the "trackbacks" tab.

--|PW|--

media girl’s picture

More info:

On all trackbacks, I have this error in the logs:

user error: Unknown column 'autodelete' in 'field list' query: SELECT scid, filter, regex, effect, autodelete FROM spam_custom in /[path]l/mediagirl.org/includes/database.mysql.inc on line 125.

The spam trackbacks are getting unpublished, according to the comments list, but they are still visible via the trackback tabs. For example, the online poker trackback visible on this page is "unpublished" according to the spam list under comments -- yet there it is to see for anyone who clicks on the trackback tab on the node. I've been getting these about once an hour for the past two days now.

If anyone has any ideas, I welcome them. As I noted above, the patch did not fully take on my module, and perhaps there is a line of code I need to go in and hack. Help!

Jeremy’s picture

"On all trackbacks, I have this error in the logs:

"user error: Unknown column 'autodelete' in 'field list' query: SELECT scid, filter, regex, effect, autodelete FROM spam_custom in /[path]l/mediagirl.org/includes/database.mysql.inc on line 125."

Review the spam.module changelog. Note that a recent change requires that you execute the following line to update your database:
ALTER TABLE spam_custom ADD autodelete tinyint(1) unsigned default 0;
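
To show why the query fails until that column exists, here is a self-contained Python sketch using an in-memory SQLite database as a stand-in for MySQL. The column types in the demo table are guesses made for illustration, and the ALTER statement is adapted to SQLite syntax; on a real site, run the MySQL statement above instead.

```python
import sqlite3

# The query from the error message in the logs.
QUERY = "SELECT scid, filter, regex, effect, autodelete FROM spam_custom"

conn = sqlite3.connect(":memory:")
# Old-style table without the autodelete column (column types are assumed).
conn.execute(
    "CREATE TABLE spam_custom ("
    "scid INTEGER PRIMARY KEY, filter TEXT, regex INTEGER, effect INTEGER)"
)
conn.execute(
    "INSERT INTO spam_custom (filter, regex, effect) VALUES ('/poker/i', 1, 1)"
)


def try_query(connection):
    """Run the query; return its rows, or the error text if it fails."""
    try:
        return connection.execute(QUERY).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)


before = try_query(conn)  # fails: the column does not exist yet
conn.execute("ALTER TABLE spam_custom ADD COLUMN autodelete INTEGER DEFAULT 0")
after = try_query(conn)   # succeeds; existing rows pick up the default
```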

"The spam trackbacks are getting unpublished, according to the comments list, but they are still visible via the trackback tabs. For example, the online poker trackback visible on this page is "unpublished" according to the spam list under comments -- yet there it is to see for anyone who clicks on the trackback tab on the node. I'm getting these one an hour for the past two days now."

I don't have time to look at this at the moment, but it sounds like the trackback module is ignoring the comment status and displaying the trackback regardless of whether it's published. That would be a bug, fixed by modifying the db_query that loads the trackbacks.

"If anyone has any ideas, I welcome them. As I noted above, the patch did not fully take on my module, and perhaps there is a line of code I need to go in and hack. Help!"

It does say that it was successfully applied. The offset is probably due to recent changes in the trackback module, but it will explicitly tell you if it was unable to apply the patch. Look for *.rej files -- if there are any, you need to manually apply them. (You can open *.rej "reject" files in a text editor and see line by line what it was trying to change and do so manually)
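
If you would rather not eyeball the directory, a few lines of Python can list any reject files. This is only a convenience sketch: `find_rejects` is a hypothetical helper, and the temporary directory below merely simulates a module directory containing one rejected hunk.

```python
import tempfile
from pathlib import Path


def find_rejects(module_dir):
    """Return the relative paths of all *.rej files under module_dir, sorted."""
    root = Path(module_dir)
    return sorted(str(p.relative_to(root)) for p in root.rglob("*.rej"))


# Simulated module directory, for demonstration only.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "trackback.module").write_text("<?php\n")
    (Path(tmp) / "trackback.module.rej").write_text("*** rejected hunk ***\n")
    demo_rejects = find_rejects(tmp)
```

An empty list means every hunk applied, even if patch reported offsets or fuzz along the way.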

Jeremy’s picture

FileSize
2.71 KB

Here's an updated version of the patch. It's identical to my earlier patch except that it also updates the db_query that loads the trackbacks to only load them if they're published.... (this is untested)
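
The idea behind the query change, sketched in Python with an in-memory SQLite table. The table name `trackback_demo` and its `published` flag are hypothetical and do not reflect the real Drupal schema; the point is only that the loading query gains a status condition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the real module stores trackbacks differently.
conn.execute(
    "CREATE TABLE trackback_demo ("
    "id INTEGER PRIMARY KEY, excerpt TEXT, published INTEGER)"
)
conn.executemany(
    "INSERT INTO trackback_demo (excerpt, published) VALUES (?, ?)",
    [("Nice post", 1), ("online poker texas holdem", 0)],
)


def load_trackbacks(connection, include_unpublished=False):
    """Load trackback excerpts, skipping unpublished ones by default."""
    sql = "SELECT excerpt FROM trackback_demo"
    if not include_unpublished:
        sql += " WHERE published = 1"
    sql += " ORDER BY id"
    return [row[0] for row in connection.execute(sql)]


visible = load_trackbacks(conn)
```

Without the extra condition, the unpublished spam entry would still be returned and shown under the trackbacks tab.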

media girl’s picture

Okay, I ran the db table update on the spam module table as indicated (which is not very well documented, I must say, very easy to miss). Next, applied the new patch. I got the same error in applying the patch, but a spot check in the revised module tells me that the changes were made. At least I think so.

A spot check on the site shows that now the trackback spam will not show under the trackback tab.

Success! Thank you!

drumm’s picture

Please commit this to the HEAD branch. (I have informal commit permission for this module from Kjarten.) I'm fine with either jeremy or ankur doing this.

garym@teledyn.com’s picture

I do applaud the work being done here, and I cheer the rapid response with which the Drupal community has addressed this issue, but I'm still not turning TB back on -- the filter approach does catch spam, but it is too naive to be effective as the only solution.

Let me put this into perspective with this link to full instructions for creating your own TB-spamming module: the spam process is completely automated by 20 lines of PHP code. According to The Register, an average link-spammer can reap more revenue in a month than most of us make in a year, and they react to our setting up filter hurdles by simply upping the attack frequency.

Challenging the spammers with filters must inevitably lead to spammers hitting back with a DoS attack.

So what can we do? Filters are OK for casual spam and loose-cannon flame-bait posters on your site, but those are only minor annoyances compared to the professional link-spammer. I think the general rule is that any given blog only sees trackbacks from a small and nearly-finite list of blogs in that social bubble, so what I'd like to see is an option for gated TB access to my site, a whitelist policy where only approved/federated sources may submit directly to my content pages.

In the simplest form, we match a list of regex to the TB url field and apply Apache-like ACCEPT/DENY rules, spooling denied TB posts for manual review where I can easily whitelist any new entries: If I've seen your URL before and have approved you, your TB is accepted, otherwise it's held for review -- even those accepted from known URLs could then be passed through the filtering stage as a simple protection against spoofing, because it won't take the spammers long to realize they can give any URL they like and still put their trash links into the excerpt field.
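
A rough Python sketch of that gate, under some simplifying assumptions: the rule list and the function name `gate` are hypothetical, the first matching rule wins (which is simpler than Apache's actual ordering semantics), and both denied and unknown URLs are held for manual review rather than dropped.

```python
import re

# Hypothetical rule list: (action, pattern), checked in order.
RULES = [
    ("deny", re.compile(r"casino")),
    ("accept", re.compile(r"^https?://([^/]+\.)?example\.org/")),
]


def gate(url, rules=RULES):
    """Return 'accept' for whitelisted URLs, 'review' for everything else."""
    for action, pattern in rules:
        if pattern.search(url):
            # A deny rule does not discard the trackback; it spools it for review.
            return "accept" if action == "accept" else "review"
    # Never seen this URL before: hold it for manual review.
    return "review"
```

An accepted trackback would still be passed on to the content filters afterwards, since the URL field itself can be spoofed.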

That last issue, spoofing the TB headers, leads to the reason why I call the above spec the 'simple' model: As it sits, TB does not record the true source of the message; the TB model makes a tacit and false assumption that everyone will represent themselves honestly. A more thorough whitelisting system would need to record the REMOTE_HOST originating IP address and hope that we can build reliable blocks for spammers on dialup and those leveraging unaware compromised Windows machines.

It's worth noting here, since this is a general thread on spam strategies, that I have already seen a spammer site with the foresight to run their own Drupal server, thereby gaining unrestricted authenticated member privileges on all other Drupal sites that enable the distributed logins. This means our login system also needs a whitelist system to grant or deny remote authenticating Drupal servers based on hostname patterns. Also in this thread of spam protections, I have had to drop many RSS feeds because those sites fell prey to TB spam (notably the TopicExchange) -- it seems that all content bound for unmoderated auto-inclusion into a Drupal site should at the very least pass through the spam.module list of blacklisted regexes, including comments, TB, privatemsg, and aggregator items.

This problem is not going to go away, and it's only going to get worse. In the eyes of the spammers, we invite their traffic by allowing open public write permissions and they have no ethical qualms about exploiting that opportunity (for $40,000/month), no more than those who plaster public walls with concert and CD-release posters. Best we can do is tighten our notion of the federated website, building drupal to intrinsically expect a real world where not every web user is your friend. This means every write-action is suspect and must be gated, and every module must be made aware of the federated-access publishing model. I also suspect it will soon be necessary to build the ACCESS/DENY test into the Drupal bootstrap code, blocking any and all offending traffic as quickly as possible to conserve our server resources. The Apache server has this built-in (where ISPs allow DENY rules in .htaccess) but it might still be useful to have one standard defense across all Drupal platforms where all incoming requests are compared to an easily maintainable list of IP patterns.
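
As a sketch of what such an early ACCESS/DENY test might look like, here is a Python version using the standard `ipaddress` module. The deny list uses documentation-only address ranges, and `is_denied` is a hypothetical name; nothing here reflects actual Drupal bootstrap code.

```python
import ipaddress

# Hypothetical deny list: one whole network and one single address.
DENY = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_denied(remote_addr, deny=DENY):
    """Return True if the requesting IP address falls inside any denied network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in network for network in deny)
```

Running a check like this before any database work is what keeps the cost of a blocked request low.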

escoles’s picture

Title: Anti-spam capability » I believe there's a simpler solution:

Make trackback sending fully manual, and trackback receipt moderated.

Auto-discovery of trackback was a mistake to begin with -- it should never have been conceived, much less implemented, because it defeats the purpose of trackback. Unless I'm missing something, if trackback is made manual, and limited to stories, it should eliminate trackback spam.

And trackbacks really should never have been implemented as comments. They're qualitatively different from comments, and shouldn't be lumped in with them.

All of the problems with trackback spam source back to automation. Eliminate or constrain the automation, and trackback is no longer a cost-effective target for spammers.

Junyor’s picture

Title: I believe there's a simpler solution: » Anti-spam capability
puregin’s picture

The TrackBack Patch for spam.module works well for me (4.5.2) - thanks! Djun

ankur’s picture

Assigned: Unassigned » ankur
Priority: Normal » Critical

Will post a 4.6 update that answers the spam problem. It won't include whitelists or blacklists to determine which sites can or can't send trackbacks, but that may be something to include down the road. It is already possible to set up blacklists with the spam module, though this means installing the spam module on your site, which is better than spending time coding anti-spam capability directly into trackback. In any case, the update will include a configuration option for moderating trackbacks that does not depend on the spam module, as well as a configuration option for making trackback work with the spam module much the same way comments do.

-Ankur

ankur’s picture

The trackback module in CVS HEAD has been updated for 4.6 and now includes spam module support as well as a non-spam module dependent moderation feature.

The module will be tagged for 4.6 shortly, pending some testing.

It doesn't address every request made in this issue, but offers trackback moderation minus the spam module's functionality as well as the option of using the spam module to make Drupal learn about and (configurably) unpublish trackbacks automatically.

Also, with the new spam hook (meaning the spam module must be installed as well), it is possible to create a blacklist of IPs and domain names from which trackbacks will automatically be unpublished (or just marked as spam depending on the configuration).

Anyone interested in testing this should do so on a non-production site for now. Preliminary testing has been done and it appears to work well for 4.6.

Marking this issue as fixed. If there is a bug in the current solution, submit it to this issue and change the issue from "fixed" to "active". If you would like to see an additional moderation feature or some other alternative scheme, please file those as separate issues marked as "feature requests".

-Ankur Rishi

debwire’s picture

Version: » 4.6.x-1.x-dev
Category: feature » support

I have now disabled this module because I have a ton of trackback spam with no means to mass delete. Are these trackback spam messages somewhere in the MySQL database? If so, where are they, and how can I remove them directly from the database?

FlemmingLeer’s picture

I posted this one, but it seems that it got overlooked.

Stop trackback spam from occurring in the first place with a bit of JavaScript.
http://drupal.org/node/29792

It seems that the programs sending trackback spam cannot run this JavaScript to get at the trackback URL in the first place, so the server is not overloaded.

Is it a viable option?