This is another means of spam defense. What I'm noticing in trackback spam is that the trackback request often comes from a page that does not link to us, e.g. the trackback points to some casino home page or whatnot.
This is actually very simple to implement. In "trackback_receive", retrieve the page named in $_REQUEST["url"] and check whether it links to the node being tracked back. If the check passes, allow the trackback; otherwise return an error.
Here's the code I wrote. It sits in "trackback_receive", inside the "if" that establishes that $trackback->url is a valid URL. The extra watchdog calls are there for debugging and tracing. The whole thing is also wrapped in "if (true) {}" so I can disable it easily.
```php
$error = 0;
if (true) {  // change to false to disable the link-back check
  $response = drupal_http_request($trackback->url);
  // $response->error is set only when the request failed; otherwise
  // look in $response->data for a link to this node.
  if (!empty($response->error)) {
    $error = 1;
    $message = t('Could not retrieve requesting page');
    watchdog('trackback', t('Could not retrieve requesting page %url', array('%url' => $trackback->url)),
      WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
  }
  else {
    $node_url = url('node/'. $node->nid, NULL, NULL, TRUE);
    if (stristr($response->data, $node_url) === false) {
      $error = 1;
      // Pass the URL as a placeholder rather than concatenating it into t().
      $message = t('Page requesting trackback does not refer to %nodeurl', array('%nodeurl' => $node_url));
      watchdog('trackback', t('Page requesting trackback does not refer to %nodeurl, source is %url',
        array('%url' => $trackback->url, '%nodeurl' => $node_url)),
        WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
    }
    else {
      // Debug/tracing only: the trackback will be recorded.
      watchdog('trackback', t('Page requesting trackback DOES refer to %nodeurl, source is %url -- WILL RECORD TRACKBACK',
        array('%url' => $trackback->url, '%nodeurl' => $node_url)),
        WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
    }
  }
}
```
The other change is to wrap the existing code that creates the trackback record and saves it in the database, like this:
```php
if ($error == 0) {
  $trackback->trid = db_next_id('{trackback_received}_trid');
  $trackback->nid = $node->nid;
  ...
}
```
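The link test itself does not need Drupal, so it can be pulled out and tried on its own. The sketch below is not code from this issue. It assumes the caller passes a hypothetical list of candidate URLs for the node (for example the absolute node/NID URL and its path alias, since the remote page may use either form), and it accepts a match only where the URL actually ends, because a plain stristr() check for .../node/4 is also satisfied by a link to .../node/42.

```php
<?php
// Sketch: does $html contain a link to any of $candidates?
// $candidates is a hypothetical list of URLs identifying the node,
// e.g. the absolute node/NID URL plus its path alias.
function trackback_links_back($html, array $candidates)
{
  // Characters that can legitimately follow a URL in HTML markup.
  $terminators = "\"'<># ?/\t\r\n";
  $len = strlen($html);
  foreach ($candidates as $url) {
    if ($url === '') {
      continue; // an empty needle would match everywhere
    }
    $offset = 0;
    // Examine every case-insensitive occurrence, not only the first.
    while (($pos = stripos($html, $url, $offset)) !== false) {
      $end = $pos + strlen($url);
      // Accept only if the URL ends here, so ".../node/4" is not
      // satisfied by a link to ".../node/42".
      if ($end === $len || strpos($terminators, $html[$end]) !== false) {
        return true;
      }
      $offset = $pos + 1;
    }
  }
  return false;
}
```

Inside trackback_receive this would stand in for the stristr() call, e.g. trackback_links_back($response->data, array($node_url)), plus whatever extra candidate URLs the site uses.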
| Comment | File | Size | Author |
|---|---|---|---|
| #7 | diff_29 | 5.09 KB | reikiman |
| #6 | trackback_2.module | 50.68 KB | reikiman |
| #4 | trackback_1.module | 50.68 KB | reikiman |
Comments
Comment #1
reikiman commented: See this forum thread: http://drupal.org/node/45905
Comment #2
reikiman commented
Comment #3
nathandigriz commented: I would like to test this, but there is no actual patch posted. Did you forget? Where in the function are you placing your code?
Comment #4
reikiman commented: Sorry, it took me a while to get to this.
This patch is actually against trackback.module for 4.7, and it's the same function I used in 4.6.
Comment #5
beginner commented: The idea is interesting.
Can you supply a proper patch?
http://drupal.org/patch
Comment #6
reikiman commented: Okaaaaay... Note that the CVS version numbers in my CVS repository differ from those in the standard one.
And, for the heck of it, a straight copy of my trackback.module.
Comment #7
reikiman commented: Well, shoot, Drupal sure barfed on that patch. So let's attach it rather than include it inline.
Comment #8
beginner commented: Has the function changed that much?
Can you remove everything between /*OLD and END OLD */?
Also, can you provide a patch against CVS? No feature is ever added to a stable release.
Comment #9
beginner commented: And maybe a patch against the root of Drupal's CVS repository, instead of yours, would be appreciated.
Comment #10
reikiman commented: This is beginning to get on my nerves, with all these requirements being tossed at me one at a time. First it's to use this strange -u diff format (-c format is immensely better), and now you want me to set up a new version of Drupal??? Okay... breathe... breathe...
I do see that trackback.module at CVS HEAD is rather different, especially the trackback_receive function.
Yes, everything between OLD and END OLD can be removed. Can't you just do that as you edit the module? Er... breathe... breathe...
Yes, the change is really very minor. It does a drupal_http_request on the requestor's URL, checks the result status, and then conditionally executes the existing code if the status is acceptable. The initial report on this RFE describes the testing process and the change.
But I do not have Drupal set up for CVS HEAD... I only have it set up for 4.7.
Here is an attempt to edit my patch into the function I see in CVS HEAD. I am not willing to set up a Drupal instance of CVS HEAD just to test this out. This is the third version of this patch I've provided, and each time I've hit a brick wall.
Comment #11
reikiman commented: FWIW, I just did a diff between the CVS HEAD version of trackback.module and the one I have. There are no changes other than the function I am proposing here.
I also noticed a bug in the function I'm proposing: I forgot to copy the first line of the original function into mine:
This should, of course, be the first line of the function.
You should be able to take the trackback.module that I've already attached to this RFE and use it directly with these two changes: 1) add the above line to the beginning of trackback_receive; 2) as you noted, remove the part between OLD and END OLD.
To verify this, you can run the diff yourself and see that the only change is the replacement of the trackback_receive function.
Comment #12
dkruglyak commented: +1. The idea is quite obvious, and I am surprised the patch did not get in as a module option... Being bombarded with spam trackbacks that do not link back is a huge problem.
This is needed in both 4.7.x and 5.x
Comment #13
dkruglyak commented: I made a fix that is more compact and up-to-date than the one posted above.
First, a new function (using the curl library) fetches a web page and checks whether it contains base_url.
Then trackback_receive is updated to include another check that silently drops trackbacks that do not link back.
This worked wonders on my site that came under siege of spammers flooding me with thousands of trackbacks a day. Ready to commit. The only possible enhancement is to make the extra check optional, based on user setting.
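The function from comment #13 was attached rather than posted inline, so what follows is only a sketch of a curl-based fetch in the same spirit, with the hard time limit that comment #15 argues for. The function name trackback_fetch_page, the 5-second default and the rule that only 2xx responses count as success are assumptions. Restricting curl to HTTP and HTTPS is worth doing because the URL comes from an untrusted trackback request.

```php
<?php
// Sketch only: fetch a trackback source page with curl under hard time
// limits. The function name, defaults and 2xx rule are assumptions.
function trackback_fetch_page($url, $timeout = 5)
{
  if (!function_exists('curl_init')) {
    return false; // curl extension not installed
  }
  $ch = curl_init($url);
  if ($ch === false) {
    return false;
  }
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body as a string
  // The URL is attacker-supplied: allow only HTTP(S), also after redirects.
  curl_setopt($ch, CURLOPT_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
  curl_setopt($ch, CURLOPT_REDIR_PROTOCOLS, CURLPROTO_HTTP | CURLPROTO_HTTPS);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
  curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // limit the connect phase
  curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);        // limit the whole transfer
  $body = curl_exec($ch);
  $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
  // The handle is released when $ch goes out of scope.
  // Transport failures and non-2xx answers count as "could not retrieve".
  if ($body === false || $status < 200 || $status >= 300) {
    return false;
  }
  return $body;
}
```

A caller would then drop the trackback when this returns false, or when the returned page does not contain the site's base URL, which is the check comment #13 describes.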
Comment #14
dordal commented: I just tried this patch and can confirm that it works as advertised. Thanks, dkruglyak!
Comment #15
reikiman commented: This looks like a good improvement. In my case I've turned off trackback on my sites, even with the patch I submitted at the top of this page, because it sometimes seemed to make my site go into a long wait on some pages. Having a timeout is important for that reason.
One tweak I'd like to see is (optional) logging of trackback failures. This way we could use those failures to turn on filters at a higher level such as the troll module or in server firewall rules. e.g. if some server is repeatedly making spammy trackback attempts I'd rather just block them from getting to the site in the first place. As it is, because they're free to make as many trackback requests as they want, they are free to tie up my server resources.
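One way to realize the logging asked for above is to count rejections per client and report the repeat offenders. This is a minimal sketch, assuming the caller records the requesting client's address (for example $_SERVER['REMOTE_ADDR']) whenever a trackback is rejected. The class name TrackbackFailureLog and the threshold are hypothetical, and the counts live in memory only, so a real module would need to persist them (e.g. in a database table) for them to survive across requests.

```php
<?php
// Sketch: count rejected trackbacks per client so that repeat offenders
// can be handed to a higher-level block (firewall rule, deny list).
// In-memory only; the class name and threshold are hypothetical.
class TrackbackFailureLog
{
  private $counts = array();
  private $threshold;

  public function __construct($threshold = 5)
  {
    $this->threshold = $threshold;
  }

  // Record one rejected trackback from $client (e.g. REMOTE_ADDR) and
  // return how many rejections that client has accumulated.
  public function record($client)
  {
    if (!isset($this->counts[$client])) {
      $this->counts[$client] = 0;
    }
    return ++$this->counts[$client];
  }

  // Clients whose rejection count has reached the threshold.
  public function offenders()
  {
    $out = array();
    foreach ($this->counts as $client => $n) {
      if ($n >= $this->threshold) {
        $out[] = $client;
      }
    }
    return $out;
  }
}
```

The offenders() list is what would be exported to something like the troll module or a firewall rule, so that repeat senders are blocked before they reach trackback_receive at all.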
Comment #16
gnassar commented: Going with Drupal's HTTP request code probably beats adding a dependency on curl in the long run. And of course, there's still no patch file uploaded. Demoting from RTBC.
As an alternative, Stefan, the creator of the Multiping module, has a nicely rolled-up separate module that handles this check, along with some other "sanity checks":
http://stefan.ploing.de/linux/drupal
Sadly, it still requires a small patch to the trackback module to work. But happily, the .patch file is included in the roll-up.
Long-term, adding a place within trackback.module for other modules to hook into would probably be the best solution; then others can supply custom spam-blocking modules to their heart's delight.
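A sketch of what such a hook point could look like, written in plain PHP so it runs outside Drupal. In a Drupal module this would more naturally be a hook fired with module_invoke_all(); the TrackbackValidators class, the validator signature (return null to accept, a string reason to reject) and the example rule are all hypothetical.

```php
<?php
// Sketch of a pluggable check point: every registered validator is asked
// about an incoming trackback, and the first objection rejects it.
// All names here are hypothetical stand-ins for a Drupal hook.
class TrackbackValidators
{
  private $validators = array();

  // A validator takes the trackback (an array here) and returns null to
  // accept it or a string explaining why it is rejected.
  public function register(callable $validator)
  {
    $this->validators[] = $validator;
  }

  // Returns null if every validator accepts, else the first reason given.
  public function check(array $trackback)
  {
    foreach ($this->validators as $validator) {
      $reason = $validator($trackback);
      if ($reason !== null) {
        return $reason;
      }
    }
    return null;
  }
}

$validators = new TrackbackValidators();
// Hypothetical example rule: reject trackbacks with an empty excerpt.
$validators->register(function (array $tb) {
  return (!isset($tb['excerpt']) || trim($tb['excerpt']) === '') ? 'empty excerpt' : null;
});
```

trackback_receive would call check() before saving and return the reason as its error message; the link-back test from the original report could then be registered as one validator among several.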
Comment #17
zorac commented: New release 1.3 includes this feature as the "Reject one-way trackbacks" option.
Comment #18
(not verified) commented