I'm sure you've read about it already, but I just wanted to register it here. I've been getting a lot of trackback spam about texas holdem poker recently. Would it be possible to get trackback.module to play with spam.module so that some sort of filtering mechanism can be set up?
Alternately, a procedure that hunts down the spammer and beats him with sticks would be just as satisfying.
--|PW|--
Comment | File | Size | Author |
---|---|---|---|
#31 | trackback.module_1.patch | 2.71 KB | Jeremy |
#1 | trackback.module_0.patch | 1.66 KB | Jeremy |
Comments
Comment #1
Jeremy commented:
The attached patch implements the _spam hook for the trackback module, allowing trackbacks to be run through the spam filter.
A few comments:
- if the spam module is not installed, not enabled, or too old, the trackback module will work normally
- adds 'filter trackbacks' option to spam _settings page, enabled by default
- requires latest version of the spam module, as updated today
Comment #2
(not verified) commented:
I'm a total PHP ignoramus, so could you tell me how to implement this patch without breaking my blog?
Comment #3
Jeremy commented:
If you're completely unfamiliar with PHP and applying patches, I recommend that you wait for others to test it, and for it to hopefully be merged.
If you're willing to figure it out, do a search for "patch" on this site to find some discussion on how to apply patches.
Comment #4
freyquency commented:
http://drupal.org/node/14231
Comment #5
freyquency commented:
Sorry, forgot it would change the title of the whole thread. I forget what the name was exactly but it was something like this.
Comment #6
(not verified) commented:
Restoring
Comment #7
media girl commented:
Thank you for getting on top of this. Already many people are giving up on trackback altogether because of the spam. It seems Drupal might be leading the charge in getting proper filtering into trackbacks.
But I have a question:
When you say the latest version of the Spam module available today, do you mean the one on the downloads page, dated 2004? Or do you mean the CVS latest here http://drupal.org/project/cvs/11104 ? I'm sorry, but I'm not up on the workflow here. The changelog indicates updates, but the date of the download file on the downloads page is unchanged, so I'm not sure to which you refer ... or whether the CVS spam head will work on 4.5.
Any clues? Thanks!
Comment #8
media girl commented:
For reference for others, the latest version of the spam module is NOT the one available from the downloads page (as of this moment), but is rather here: http://drupal.org/files/projects/spam-4.5.0.tar.gz .
Comment #9
Jeremy commented:
If I remember correctly, tarballs are now built every 6 hours. So, within 6 hours of me uploading changes into CVS, the complete tarball should be up-to-date. To be fully confident you have the latest release you could compare what you downloaded with what you view here (comparing the $Id tag at the top of the files).
To view what changes have been made recently (and when), take a look here.
Comment #10
ankur commented:
As of the latest few patches for the 4.5 trackback module currently in the CVS (http://cvs.drupal.org/viewcvs/drupal/contributions/modules/trackback/tra...), users can now delete trackbacks as they wish... however, having an automatic spam filter may be a little while in coming. A more quickly implementable solution is having moderated trackbacks... I'm ready to look into it given what the feedback is here...
-Ankur
Comment #11
Junyor commented:
Ankur: Is there a problem with Jeremy's patch or am I misunderstanding the functionality it adds?
Comment #12
Jeremy commented:
The purpose of my patch (see #1 above) is to utilize the spam.module's _spam hook and "spam_check" function to provide comprehensive spam filtering for the trackback module. When applied, all new trackbacks will be run through the spam module's Bayesian filter, URL filters, custom filters, etc...
Testers appreciated.
Comment #13
DocSavage commented:
Thanks Jeremy. I replaced my band-aid with your patch on trackbacks, then subbed in your updated spam module. It's running now, no errors. Will let you know if there are any problems.
-Bill
Comment #14
DocSavage commented:
Jeremy, when adding some new regex filters, I noticed a small error in the new spam module I downloaded from Drupal Download area. Issue opened and fix given: http://drupal.org/node/17177
I'm new here, so I hope this is the proper protocol so we can keep track of bug fixes. Let me know if I'm supposed to patch things I spot directly. Regards.
Comment #15
Jeremy commented:
Thanks Doc, I merged your fix (missing %d) into the spam module. Indeed, if you find any additional problems with the spam module, open additional spam module issues. A patch is not necessary when the change is as simple as the last one.
Is the spam module now catching spam trackbacks for you?
Comment #16
DocSavage commented:
Well, I'm still golden... no trackback spams so far and I do see a few spammers coming by in the logs. Looks like it works.
Comment #17
DocSavage commented:
Got hit with some trackbacks today, which seemed like they should have been picked up by the spam filter. Much of the spam is getting caught. Here's one that made it onto the trackback for one of my book signing events:
In my url filters, I have entries for:
phentermine
poker
...
In my custom filters, I have regex filters for:
/phentermine/i (always spam)
/poker/i (usually spam)
...
For the URL filters, does it have to be exact URL match or is this a matching thing? In any case, the custom filters should have definitely blasted it because of "phentermine"
I'm going to readd my patch, in addition to yours.
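To make the expectation in this comment concrete, here is a minimal sketch of how case-insensitive custom filters of this kind might be evaluated. It is written in Python purely for illustration and is not the spam module's code: the `CUSTOM_FILTERS` structure and both function names are hypothetical, and only the two patterns and their effects are taken from the comment above.

```python
import re

# Hypothetical in-memory form of the custom filters listed in the comment:
# (compiled regex, effect). Only the patterns and effects come from the thread.
CUSTOM_FILTERS = [
    (re.compile(r"phentermine", re.IGNORECASE), "always spam"),
    (re.compile(r"poker", re.IGNORECASE), "usually spam"),
]


def matching_effects(text):
    """Return the effect of every custom filter whose regex matches the text."""
    return [effect for pattern, effect in CUSTOM_FILTERS if pattern.search(text)]


def is_definitely_spam(text):
    """True when at least one 'always spam' filter matches the text."""
    return "always spam" in matching_effects(text)
```

Under this reading, any trackback whose content contains "phentermine" in any letter case would be flagged, which is why a trackback slipping through suggests the content never reached the filter at all.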
Comment #18
Jeremy commented:
Can you dump your custom filter table, and attach it here? Also, can you attach the complete un-edited trackback(s) that got through the filter? Either there's a bug, or there's a logical explanation... ;)
Thanks.
BTW: URL filters match on the root. ie, if the url is http://www.sample.org/this/is/the/url, the URL filter sees "www.sample.org". This means that http://mail.sample.org/another/url would be different, as the URL filter would see "mail.sample.org". This was by design, but in retrospect I'm thinking it's more logical that both of these should simply be seen as "sample.org". But that's an unrelated issue. In your case, I only care about the custom filters, to understand why it didn't block your trackback spam.
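The host matching described above can be illustrated with a short Python sketch. This is not the spam module's implementation: both function names are hypothetical, and the "base domain" reduction is a deliberately naive assumption (keep the last two labels) that would be wrong for suffixes such as .co.uk.

```python
from urllib.parse import urlsplit


def filter_host(url):
    """Return the host part of a URL, which is what the URL filter is
    described as seeing (e.g. 'www.sample.org')."""
    return urlsplit(url).hostname


def naive_base_domain(url):
    """Reduce a URL to its last two host labels (e.g. 'sample.org').

    Naive on purpose: multi-part public suffixes like .co.uk are not handled.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])
```

With the two URLs from the comment, `filter_host` yields different values ("www.sample.org" and "mail.sample.org"), while `naive_base_domain` yields "sample.org" for both, which is the behaviour suggested as more logical.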
Comment #19
(not verified) commented:
Here's cut & paste on my custom filters:
Custom filters
filter | type | effect | delete | matches | last match | operations |
---|---|---|---|---|---|---|
/casino/i | regex | usually spam | disabled | 0 | Dec 31 1969 - 19:00 | edit delete |
/phentermine/i | regex | always spam | disabled | 17 | Feb 12 2005 - 02:03 | edit delete |
/poker/i | regex | usually spam | disabled | 34 | Feb 12 2005 - 02:24 | edit delete |
/psxtreme/i | regex | always spam | disabled | 0 | Dec 31 1969 - 19:00 | edit delete |
Let me know if you need it in some other format. Unfortunately, I've already deleted all the trackback entries from my DB, so you'll have to work with the paste of the trackback above. You can see that the filters have nailed at least 34 spam comments, but I'm not sure if it nailed any trackbacks.
I think trackbacks must not be going through the spam filter. I do have spam trackback filtering ON. The "/phentermine/i" regex should have blasted the trackback content since anything with it should be marked spam. Could you do a simple test where you trackback to yourself using some of the spam words matched by a custom regex filter?
I have to revise my URL filters given what you said.
Comment #21
Jeremy commented:
Of course, this is how I tested the patch when I first wrote it. On my devel server it was catching trackback spam perfectly. I'll reinstall and reapply the patch and test again, to be sure I attached the proper patch. You are using the latest version of the trackback module, correct? (My spam patch should have applied without errors)
The advantage to URL filters is that they are auto-added. If you're manually adding them, you should probably just be adding them in the form of custom filters instead, as there is more power available with custom filters.
Comment #22
DocSavage commented:
I figured you tested trackbacks on your devel server; I guess I should have been more specific in asking if you've tried the exact word/regex combo: "/phentermine/i". Shouldn't matter, but the first step is just seeing if we can reproduce the error. I used the most recent trackback module under Drupal downloads, but this was before they updated it in CVS to handle the trackback deletions. By most recent trackback module, are you talking about the very latest? I thought they modified it after you submitted your patch.
Comment #23
ankur commented:
Jeremy: I'm sorry. I did not see the patch file attachment the last time I was here. I saw the word spam and spam module and thought "project" for another day. But hey, you already did the project. Thanks a bunch.
I'm ready to commit this patch, but I have to check in with the team at CivicSpace Labs. Trackback is currently being maintained by CivicSpace (http://www.civicspacelabs.org) and any changes to the code go through the lead developer, which I'm not. I just bumped the patch file to him. It should be OK save for the fact that CivicSpace (a distribution of Drupal) does not ship with the spam module.
In any case, stay tuned. I'll keep you posted. But once again, thanks for your efforts. It is much appreciated.
-Ankur
Comment #24
Jeremy commented:
If no spam module is installed, trackback will function normally even with my patch applied.
Comment #25
media girl commented:
I was unable to run this patch on the current trackback version (1.14.2.1) -- I got this error:
I'm in the midst of a trackback spam attack, so I will report if I'm meeting with success anyway.
Comment #26
media girl commented:
It works. On the tracker, it shows the thread as updated (which may be a tracker issue), but the spam trackback was successfully filtered. Woohoo!
Comment #27
media girl commented:
Am I out of line to re-open this? As noted above, I ran the patch, but there remains a problem.
Yes, the trackback spam is not showing up under comments, but under the trackbacks tab on the node, the spam remains. The only option at this point is to delete it.
I am not the only one experiencing this, as it was pointed out to me by someone else on their site. Any ideas?
Comment #28
(not verified) commented:
Let me toss in my experiences at http://www.pennywit.com. I installed the new spam.module a couple weeks ago and patched trackback.module over the weekend. If you want to see Media Girl's narration in action, look here:
http://www.pennywit.com/drupal/node/1787/
Check under "comments," then look under the "trackbacks" tab.
--|PW|--
Comment #29
media girl commented:
More info:
On all trackbacks, I have this error in the logs:
user error: Unknown column 'autodelete' in 'field list' query: SELECT scid, filter, regex, effect, autodelete FROM spam_custom in /[path]l/mediagirl.org/includes/database.mysql.inc on line 125.
The spam trackbacks are getting unpublished, according to the comments list, but they are still visible via the trackback tabs. For example, the online poker trackback visible on this page is "unpublished" according to the spam list under comments -- yet there it is to see for anyone who clicks on the trackback tab on the node. I'm getting these one an hour for the past two days now.
If anyone has any ideas, I welcome them. As I noted above, the patch did not fully take on my module, and perhaps there is a line of code I need to go in and hack. Help!
Comment #30
Jeremy commented:
Review the spam.module changelog. Note that a recent change requires that you execute the following line to update your database:
ALTER TABLE spam_custom ADD autodelete tinyint(1) unsigned default 0;
I don't have time to look at this at the moment, but it sounds like the trackback module is ignoring the comment status, displaying it regardless of if it's published or not. That would be a bug, fixed by modifying the db_query that loads the trackbacks.
It does say that it was successfully applied. The offset is probably due to recent changes in the trackback module, but it will explicitly tell you if it was unable to apply the patch. Look for *.rej files -- if there are any, you need to manually apply them. (You can open *.rej "reject" files in a text editor and see line by line what it was trying to change and do so manually)
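As a small aid for the check described above, the following Python sketch lists any reject files left behind after applying a patch. The function name and the idea of scanning the module directory are assumptions for illustration; `patch` itself reports failed hunks, and this only helps locate the leftover files.

```python
from pathlib import Path


def find_rejects(module_dir):
    """Return a sorted list of *.rej files found anywhere under module_dir.

    An empty list suggests every hunk of the patch applied.
    """
    return sorted(Path(module_dir).rglob("*.rej"))
```

If the list is non-empty, each file can be opened in a text editor to see which hunks still need to be applied by hand.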
Comment #31
Jeremy commented:
Here's an updated version of the patch. It's identical to my earlier patch except that it also updates the db_query that loads the trackbacks to only load them if they're published.... (this is untested)
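The kind of change described here, restricting the loading query to published entries, can be sketched as follows. This is an illustration in Python with SQLite, not the patch itself: the table name `trackbacks`, its columns, and the status values are all hypothetical, and the real module's schema and status convention may differ.

```python
import sqlite3

# Assumed status values for this sketch only.
PUBLISHED = 1
UNPUBLISHED = 0


def make_demo_db():
    """Create an in-memory database with one published and one unpublished trackback."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trackbacks (nid INTEGER, url TEXT, status INTEGER)")
    conn.executemany(
        "INSERT INTO trackbacks VALUES (?, ?, ?)",
        [
            (1787, "http://example.org/real-post", PUBLISHED),
            (1787, "http://example.com/online-poker", UNPUBLISHED),
        ],
    )
    return conn


def load_trackbacks(conn, nid):
    """Load only the published trackback URLs for a node."""
    rows = conn.execute(
        "SELECT url FROM trackbacks WHERE nid = ? AND status = ? ORDER BY url",
        (nid, PUBLISHED),
    )
    return [url for (url,) in rows]
```

Without the `status` condition in the WHERE clause, the unpublished spam entry would still be returned, which matches the symptom reported earlier: spam hidden from the comments list but visible under the trackbacks tab.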
Comment #32
media girl commented:
Okay, I ran the db table update on the spam module table as indicated (which is not very well documented, I must say, very easy to miss). Next, applied the new patch. I got the same error in applying the patch, but a spot check in the revised module tells me that the changes were made. At least I think so.
A spot check on the site shows that now the trackback spam will not show under the trackback tab.
Success! Thank you!
Comment #33
drumm commented:
Please commit this to the HEAD branch. (I have informal commit permission for this module from Kjarten.) I'm fine with either jeremy or ankur doing this.
Comment #34
garym@teledyn.com commented:
I do applaud the work being done here, and I cheer the rapid response with which the Drupal community has addressed this issue, but I'm still not turning TB back on -- the filter approach does catch spam, but it is too naive to be effective as the only solution.
Let me put this into perspective with this link to full instructions for creating your own TB-spamming module: the spam process is completely automated by 20 lines of PHP code -- according to The Register, an average link-spammer can reap more revenue in a month than most of us make in a year, and they react to our setting up filtre hurdles by simply upping the attack frequency.
So what can we do? Filtres are ok for casual spam and loose-cannon flame-bait posters on your site, but those are only minor annoyances compared to the professional link-spammer. I think the general rule is that any given blog only sees trackbacks from a small and nearly-finite list of blogs in that social bubble, so what I'd like to see is an option for gated TB access to my site, a whitelist policy where only approved/federated sources may submit directly to my content pages.
In the simplest form, we match a list of regex to the TB url field and apply Apache-like ACCEPT/DENY rules, spooling denied TB posts for manual review where I can easily whitelist any new entries: If I've seen your URL before and have approved you, your TB is accepted, otherwise it's held for review -- even those accepted from known URLs could then be passed through the filtering stage as a simple protection against spoofing, because it won't take the spammers long to realize they can give any URL they like and still put their trash links into the excerpt field.
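A minimal sketch of the gate proposed above might look like this in Python. Everything named here is hypothetical (the accept pattern, the spam pattern, the function name and the review queue); it only shows the order of the checks: unknown URLs are held for manual review, and known URLs still pass through content filtering as a guard against spoofing.

```python
import re

# Hypothetical whitelist of approved trackback sources (ACCEPT rules).
ACCEPT_PATTERNS = [
    re.compile(r"^https?://([a-z0-9-]+\.)*friendly-blog\.example/", re.IGNORECASE),
]

# Hypothetical content filters applied even to whitelisted sources.
SPAM_PATTERNS = [re.compile(r"poker", re.IGNORECASE)]


def gate_trackback(url, excerpt, review_queue):
    """Return 'accepted', 'held' or 'spam' for an incoming trackback.

    Trackbacks from URLs that match no ACCEPT rule are appended to
    review_queue so the site owner can whitelist them later.
    """
    if not any(p.match(url) for p in ACCEPT_PATTERNS):
        review_queue.append((url, excerpt))
        return "held"
    if any(p.search(excerpt) for p in SPAM_PATTERNS):
        return "spam"
    return "accepted"
```

Because the URL field is supplied by the sender, a match against the whitelist proves nothing about the true origin, so the second check on the excerpt is what limits the damage from a spoofed URL.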
That last issue, spoofing the TB headers, leads to the reason why I call the above spec the 'simple' model: As it sits, TB does not record the true source of the message; the TB model makes a tacit and false assumption that everyone will represent themselves honestly. A more thorough whitelisting system would need to record the REMOTE_HOST originating IP address and hope that we can build reliable blocks for spammers on dialup and those leveraging unaware compromised windows machines.
It's worth noting here, since this is a general thread on Spam strategies, that I have already seen a spammer site with the foresight to run their own Drupal server thereby gaining unrestricted authenticated member privileges on all other Drupal sites that enable the distributed logins. This means our login system also needs a whitelist system to grant or deny remote authenticating drupal servers based on hostname patterns. Also in this thread of spam protections, I have had to drop many RSS feeds because those sites fell prey to TB spam (notably the TopicExchange) -- it seems that all content bound for unmoderated auto-inclusion into a Drupal should at the very least pass through the spam.module list of blacklisted regexp, including comments, TB, privatemsg, and aggregator items.
This problem is not going to go away, and it's only going to get worse. In the eyes of the spammers, we invite their traffic by allowing open public write permissions and they have no ethical qualms about exploiting that opportunity (for $40,000/month), no more than those who plaster public walls with concert and CD-release posters. Best we can do is tighten our notion of the federated website, building drupal to intrinsically expect a real world where not every web user is your friend. This means every write-action is suspect and must be gated, and every module must be made aware of the federated-access publishing model. I also suspect it will soon be necessary to build the ACCESS/DENY test into the Drupal bootstrap code, blocking any and all offending traffic as quickly as possible to conserve our server resources. The Apache server has this built-in (where ISPs allow DENY rules in .htaccess) but it might still be useful to have one standard defense across all Drupal platforms where all incoming requests are compared to an easily maintainable list of IP patterns.
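The early ACCESS/DENY test suggested above could be sketched along these lines. This is a Python illustration only, not Drupal bootstrap code: the deny list uses documentation address ranges as placeholders, and the function name is hypothetical.

```python
import ipaddress

# Placeholder deny list; these are reserved documentation ranges, not real offenders.
DENY_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_denied(remote_addr):
    """Return True if the request's source address falls in any denied network."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in DENY_NETWORKS)
```

Running a check like this before any other work is done is what would conserve server resources; a request from a denied address could be rejected without loading modules or touching the database.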
Comment #35
escoles commented:
Make trackback sending fully manual, and trackback receipt moderated.
Auto-discovery of trackback was a mistake to begin with -- it should never have been conceived, much less implemented, because it defeats the purpose of trackback. Unless I'm missing something, if trackback is made manual, and limited to stories, it should eliminate trackback spam.
And trackbacks really should never have been implemented as comments. They're qualitatively different from comments, and shouldn't be lumped in with them.
All of the problems with trackback spam source back to automation. Eliminate or constrain the automation, and trackback is no longer a cost-effective target for spammers.
Comment #36
Junyor commented:
Comment #37
puregin commented:
The TrackBack Patch for spam.module works well for me (4.5.2) - thanks! Djun
Comment #38
ankur commented:
Will post a 4.6 update that answers the spam problem. It won't include whitelists or blacklists to determine which sites can or can't send trackbacks, but that may be something to include down the road. It is already possible to set up blacklists with the spam module, though this would mean installing the spam module on your site, which is better than spending time coding anti-spam capability directly into trackback.... In any case, the update will include a configuration option for moderating trackbacks that does not depend on the spam module, as well as a configuration option for making trackback work with the spam module much the same way comments work with the spam module.
-Ankur
Comment #39
ankur commented:
The trackback module in CVS HEAD has been updated for 4.6 and now includes spam module support as well as a non-spam module dependent moderation feature.
The module will be tagged, shortly -- pending some testing, for 4.6.
It doesn't address every request made in this issue, but offers trackback moderation minus the spam module's functionality as well as the option of using the spam module to make Drupal learn about and (configurably) unpublish trackbacks automatically.
Also, with the new spam hook (meaning the spam module must be installed as well), it is possible to create a blacklist of IPs and domain names from which trackbacks will automatically be unpublished (or just marked as spam depending on the configuration).
Anyone interested in testing this should do so on a non-production site for now. Preliminary testing has been done and it appears to work well for 4.6.
Marking this issue as fixed. If there is a bug in the current solution, submit it to this issue and change the issue from "fixed" to "active". If you would like to see an additional moderation feature or some other alternative scheme, please file those as separate issues marked as "feature requests".
-Ankur Rishi
Comment #40
ankur commented:
Comment #41
(not verified) commented:
Comment #42
debwire commented:
I have now disabled this module because I have a ton of trackback spam with no means to mass delete. Are these trackback spam messages somewhere in the MySQL database? If so, where and how may I go about directly removing them from the database?
Comment #43
FlemmingLeer commented:
I posted this one, but it seems that it got overlooked.
Stop trackback spam from occurring in the first place via JavaScript.
http://drupal.org/node/29792
It seems that the programs making the trackback spam cannot use this JavaScript to get to the trackback URL in the first place, and thus the server is not overloaded.
Is it a viable option?