This is another mode of SPAM defense. What I'm noticing is that trackback SPAM often comes from a page that does not link to us, e.g. the trackback points to some casino home page or whatnot.

This is actually very simple to implement. In "trackback_receive", retrieve the page in $_REQUEST["url"] and perform some checks. If the checks pass, then allow the trackback, otherwise return an error.

Here's the code I wrote. It sits in "trackback_receive", inside the "if" that confirms $trackback->url is a valid URL. I have extra watchdog calls for debug/tracing purposes. It is also wrapped in "if (true) {}" so I can disable the code easily.

    $error = 0;

    if (true) {
      $response = drupal_http_request($trackback->url);

      // Check $response->error to see if the request happened right
      // Look in $response->data for our URL

      if ($response->error) {
        $error = 1;
        $message = t('Could not retrieve requesting page');
        watchdog('trackback',
          t('Could not retrieve requesting page %url', array('%url' => $trackback->url)),
          WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
      } else {
        $node_url = url('node/'. $node->nid, NULL, NULL, TRUE);
        $found = stristr($response->data, $node_url);
        if ($found === false) {
          $error = 1;
          $message = t('Page requesting trackback does not refer to '.$node_url);
          watchdog('trackback',
            t('Page requesting trackback does not refer to %nodeurl, source is %url',
              array('%url' => $trackback->url, '%nodeurl' => $node_url
            )),
            WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
        } else {
          watchdog('trackback',
            t('Page requesting trackback DOES refer to %nodeurl, source is %url -- WILL RECORD TRACKBACK',
              array('%url' => $trackback->url, '%nodeurl' => $node_url
            )),
            WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid)
          );
        }
      }
    }

The other change is to wrap the existing code that creates the trackback record and saves it in the database in a check on $error, like this:

    if ($error == 0) {
      $trackback->trid = db_next_id('{trackback_received}_trid');
      $trackback->nid = $node->nid;
      ...
    }
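
A substring test like the stristr() call above also passes when the page merely mentions the URL in plain text, and "node/5" is found inside "node/55". Here is a minimal sketch of a stricter comparison, assuming a hypothetical helper named example_page_links_to() (it is not part of trackback.module) that only looks at the href attributes of <a> tags:

```php
<?php
// Hypothetical helper, not part of trackback.module. Returns TRUE when $html
// contains an <a href="..."> whose target equals $node_url, compared
// case-insensitively and ignoring a trailing slash and any #fragment.
function example_page_links_to($html, $node_url) {
  $wanted = strtolower(rtrim($node_url, '/'));

  // Collect the value of every href attribute (single- or double-quoted).
  if (!preg_match_all('/<a\s[^>]*href\s*=\s*(["\'])(.*?)\1/is', $html, $matches)) {
    return FALSE;
  }

  foreach ($matches[2] as $href) {
    // Decode entities such as &amp; and drop the fragment before comparing.
    $href = html_entity_decode($href);
    $href = preg_replace('/#.*$/', '', $href);
    if (strtolower(rtrim($href, '/')) === $wanted) {
      return TRUE;
    }
  }
  return FALSE;
}
```

In the code above it would stand in for the stristr() call, e.g. example_page_links_to($response->data, $node_url). It is regex-based, so treat it as a rough filter rather than a real HTML parser.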

Comment  File                Size      Author
#7       diff_29             5.09 KB   reikiman
#6       trackback_2.module  50.68 KB  reikiman
#4       trackback_1.module  50.68 KB  reikiman

Comments

reikiman:

See this forum thread: http://drupal.org/node/45905

reikiman:

Status: Active » Needs review

nathandigriz:

I would like to test this but there is no actual patch posted. Did you forget? Where in the function are you placing your code?

reikiman:

Version: 4.6.x-1.x-dev » 4.7.x-1.x-dev
Attached file: new, 50.68 KB

Sorry, it took me a while to get to this.

This patch is actually against the trackback.module for 4.7, and it's the same function I used in 4.6.

***************
*** 24,30 ****
--- 24,115 ----
    }
  }
  
+ function trackback_receive(&$node) {
+   // Process TrackBack post data.
+   $trackback->url = check_url($_REQUEST['url']);
+   if ($trackback->url && valid_url($_REQUEST['url'], TRUE)) {
+ 
+     $error = 0;
+ 
+     if (true) {
+       $response = drupal_http_request($trackback->url);
+ 
+       // Check $response->error to see if the request happened right
+       // Look in $response->data for our URL
+ 
+       if ($response->error) {
+         $error = 1;
+         $message = t('Could not retrieve requesting page');
+         watchdog('trackback', t('Could not retrieve requesting page %url', array('%url' => $trackback->url)),            WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
+       } else {
+         $node_url = url('node/'. $node->nid, NULL, NULL, TRUE);
+         $found = stristr($response->data, $node_url);
+         if ($found === false) {
+           $error = 1;
+           $message = t('Page requesting trackback does not refer to '.$node_url);
+           watchdog('trackback',
+             t('Page requesting trackback does not refer to %nodeurl, source is %url',
+               array('%url' => $trackback->url, '%nodeurl' => $node_url
+             )),
+             WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
+         } else {
+           watchdog('trackback',
+             t('Page requesting trackback DOES refer to %nodeurl, source is %url -- WILL RECORD TRACKBACK',
+               array('%url' => $trackback->url, '%nodeurl' => $node_url
+             )),
+             WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid)
+           );
+         }
+       }
+     }
+ 
+     if ($error == 0) {
+       $trackback->trid = db_next_id('{trackback_received}_trid');
+       $trackback->nid = $node->nid;
+       $trackback->created = time();
+       $trackback->site = $_SERVER['REMOTE_ADDR'];
+       $trackback->name = strip_tags(($_REQUEST['blog_name']) ? $_REQUEST['blog_name'] : $trackback->url);
+       $trackback->subject = strip_tags(($_REQUEST['title']) ? $_REQUEST['title'] : $trackback->url);
+       // $trackback->url already set above.  Though I might say something here since I'm setting the fields in the exact same order that they are created in the table's create statement (with this exception).
+       $trackback->excerpt = (strlen($_REQUEST['excerpt']) > 255) ? truncate_utf8($_REQUEST['excerpt'], 252) .'...' : $_REQUEST['excerpt'];
+       $trackback->status = (variable_get('trackback_moderation', 0) == 0) ? 1 : 0;
+       //watchdog('trackback', t('trackback: added \'%subject\'', array('%subject' => $trackback->subject)), l(t('view trackback'), 'node/'. $node->nid, NULL, NULL, 'trackback-'. $trackback->trid));
+ 
+       watchdog('trackback', t('trackback: added \'%subject\'', array('%subject' => $trackback->subject)), WATCHDOG_NOTICE, l(t('view trackback'), 'node/'. $node->nid .'#trackback-'. $trackback->trid));
+ 
+       db_query("INSERT INTO {trackback_received} (trid, nid, created, site, name, subject, url, excerpt, status) VALUES (%d, %d, %d, '%s', '%s', '%s', '%s', '%s', %d)", $trackback->trid, $trackback->nid, $trackback->created, $trackback->site, $trackback->name, $trackback->subject, $trackback->url, $trackback->excerpt, $trackback->status);
+ 
+       if (function_exists('spam_check') && variable_get('trackback_spam_filter', 1)) {
+         // trid, subject and body are used by spam_check()
+         // Put everything we want to check for spam in the trackback body and subject
+         // Since everything has already been inserted into the database and since the value
+         // of the $trackback variable's attributes will not be used again in this function,
+         // we can modify the attributes as we want before passing it on to the spam check.
+         $trackback->subject = $trackback->subject .' '. $trackback->url;
+         spam_check($trackback, 'excerpt', 'subject', 'trackback_spam_actions', 'insert');
+       }
+     }
+   }
+   else {
+     $error = 1;
+     $message = t('Missing TrackBack url.');
+   }
+ 
+   // Generate response
+   $output = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
+   $output .= "<response>\n";
+   $output .= '<error>'. $error ."</error>\n";
+   $message and $output .= '<message>'. $message ."</message>\n";
+   $output .= "</response>\n";
+ 
+   return $output;
+ }
+ 
+ 
+ 
+ 
  
+ /*OLD
  function trackback_receive(&$node) {
    $trackback = new stdClass();
    // Process TrackBack post data.
***************
*** 71,76 ****
--- 156,162 ----
  
    return $output;
  }
+ END OLD */
  
  function theme_trackbacks($trackbacks) {
    $output = '<div id="trackbacks">'."\n";

beginner:

Status: Needs review » Needs work

The idea is interesting.

Can you supply a proper patch?
http://drupal.org/patch

reikiman:

Attached file: new, 50.68 KB

Okaaaaay.... Note, the CVS version numbers are different in my CVS repository than in the standard one.

And, for the heck of it, a straight copy of my trackback.module.

[tippy:drupal/modules/trackback] davidher% cvs diff -r1.2 -r1.3 -u -F^f trackback.module
Index: trackback.module
===================================================================
RCS file: /var/www/vhosts/davidherron.com/cvsroot/drupal/modules/trackback/trackback.module,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -F^f -r1.2 -r1.3
--- trackback.module    15 Jun 2006 04:48:28 -0000      1.2
+++ trackback.module    20 Jun 2006 03:38:40 -0000      1.3
@@ -1,5 +1,5 @@
 <?php
-// $Id: trackback.module,v 1.2 2006/06/15 04:48:28 david Exp $
+// $Id: trackback.module,v 1.3 2006/06/20 03:38:40 david Exp $
 
 /**
  * Implementation of hook_help().
@@ -24,7 +24,92 @@ function trackback_help($section) {
   }
 }
 
+function trackback_receive(&$node) {
+  // Process TrackBack post data.
+  $trackback->url = check_url($_REQUEST['url']);
+  if ($trackback->url && valid_url($_REQUEST['url'], TRUE)) {
+
+    $error = 0;
+
+    if (true) {
+      $response = drupal_http_request($trackback->url);
+
+      // Check $response->error to see if the request happened right
+      // Look in $response->data for our URL
+
+      if ($response->error) {
+        $error = 1;
+        $message = t('Could not retrieve requesting page');
+        watchdog('trackback', t('Could not retrieve requesting page %url', array('%url' => $trackback->url)),            WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
+      } else {
+        $node_url = url('node/'. $node->nid, NULL, NULL, TRUE);
+        $found = stristr($response->data, $node_url);
+        if ($found === false) {
+          $error = 1;
+          $message = t('Page requesting trackback does not refer to '.$node_url);
+          watchdog('trackback',
+            t('Page requesting trackback does not refer to %nodeurl, source is %url',
+              array('%url' => $trackback->url, '%nodeurl' => $node_url
+            )),
+            WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
+        } else {
+          watchdog('trackback',
+            t('Page requesting trackback DOES refer to %nodeurl, source is %url -- WILL RECORD TRACKBACK',
+              array('%url' => $trackback->url, '%nodeurl' => $node_url
+            )),
+            WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid)
+          );
+        }
+      }
+    }
+
+    if ($error == 0) {
+      $trackback->trid = db_next_id('{trackback_received}_trid');
+      $trackback->nid = $node->nid;
+      $trackback->created = time();
+      $trackback->site = $_SERVER['REMOTE_ADDR'];
+      $trackback->name = strip_tags(($_REQUEST['blog_name']) ? $_REQUEST['blog_name'] : $trackback->url);
+      $trackback->subject = strip_tags(($_REQUEST['title']) ? $_REQUEST['title'] : $trackback->url);
+      // $trackback->url already set above.  Though I might say something here since I'm setting the fields in the exact same order that they are created in the table's create statement (with this exception).
+      $trackback->excerpt = (strlen($_REQUEST['excerpt']) > 255) ? truncate_utf8($_REQUEST['excerpt'], 252) .'...' : $_REQUEST['excerpt'];
+      $trackback->status = (variable_get('trackback_moderation', 0) == 0) ? 1 : 0;
+      //watchdog('trackback', t('trackback: added \'%subject\'', array('%subject' => $trackback->subject)), l(t('view trackback'), 'node/'. $node->nid, NULL, NULL, 'trackback-'. $trackback->trid));
+
+      watchdog('trackback', t('trackback: added \'%subject\'', array('%subject' => $trackback->subject)), WATCHDOG_NOTICE, l(t('view trackback'), 'node/'. $node->nid .'#trackback-'. $trackback->trid));
+
+      db_query("INSERT INTO {trackback_received} (trid, nid, created, site, name, subject, url, excerpt, status) VALUES (%d, %d, %d, '%s', '%s', '%s', '%s', '%s', %d)", $trackback->trid, $trackback->nid, $trackback->created, $trackback->site, $trackback->name, $trackback->subject, $trackback->url, $trackback->excerpt, $trackback->status);
+
+      if (function_exists('spam_check') && variable_get('trackback_spam_filter', 1)) {
+        // trid, subject and body are used by spam_check()
+        // Put everything we want to check for spam in the trackback body and subject
+        // Since everything has already been inserted into the database and since the value
+        // of the $trackback variable's attributes will not be used again in this function,
+        // we can modify the attributes as we want before passing it on to the spam check.
+        $trackback->subject = $trackback->subject .' '. $trackback->url;
+        spam_check($trackback, 'excerpt', 'subject', 'trackback_spam_actions', 'insert');
+      }
+    }
+  }
+  else {
+    $error = 1;
+    $message = t('Missing TrackBack url.');
+  }
+
+  // Generate response
+  $output = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
+  $output .= "<response>\n";
+  $output .= '<error>'. $error ."</error>\n";
+  $message and $output .= '<message>'. $message ."</message>\n";
+  $output .= "</response>\n";
+
+  return $output;
+}
+
+
+
+
 
+/*OLD
 function trackback_receive(&$node) {
   $trackback = new stdClass();
   // Process TrackBack post data.
@@ -71,6 +156,7 @@ function trackback_receive(&$node) {
 
   return $output;
 }
+END OLD */
 
 function theme_trackbacks($trackbacks) {
   $output = '<div id="trackbacks">'."\n";

reikiman:

Attached file: new, 5.09 KB

Well, shoot, Drupal sure barfed on that patch. So let's attach it rather than include it inline.

beginner:

Version: 4.7.x-1.x-dev » master

Has the function changed that much?
Can you remove everything between /*OLD and END OLD */?

Also, can you provide a patch for cvs? No feature is ever added on a stable release.

beginner:

And maybe a patch against the root of Drupal's CVS repository, instead of yours, would be appreciated.

reikiman:

This is beginning to get on my nerves, with all these requirements being tossed at me one at a time. First it's to use this strange -u diff format (-c format is immensely better) and now you want me to set up a new version of Drupal??? Okay... breathe... breathe...

I do see the trackback.module at the CVS head is rather different, especially the trackback_receive function.

Yes, everything between the OLD and END OLD can be removed. Can't you just do that as you edit the module? Er... breathe... breathe...

Yes, the change is really very minor. It does a drupal_http_request on the requester's URL, checks the result status, and then executes the existing code only if the status indicates appropriate conditions. The initial report on this RFE describes the testing process and the change.

But at the same time I do not have drupal set up for the CVS head ... I only have drupal set up for 4.7.

Here is an attempt to edit my patch into the function I see in the CVS head. I am not willing to set up a drupal instance of the CVS Head just to test this out. This is the third version of this patch I've provided where each time I'm meeting a brick wall.


function trackback_receive(&$node) {
  $trackback = new stdClass();
  // Process TrackBack post data.
  $trackback->url = check_url($_REQUEST['url']);
  if ($trackback->url && valid_url($_REQUEST['url'], TRUE)) {

    $error = 0;

    $response = drupal_http_request($trackback->url);
    // Check $response->error to see if the request happened right
    // Look in $response->data for our URL
    if ($response->error) {
      $error = 1;
      $message = t('Could not retrieve requesting page');
      watchdog('trackback', t('Could not retrieve requesting page %url', array('%url' => $trackback->url)), WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
    } else {
      $node_url = url('node/'. $node->nid, NULL, NULL, TRUE);
      $found = stristr($response->data, $node_url);
      if ($found === false) {
        $error = 1;
        $message = t('Page requesting trackback does not refer to '.$node_url);
        watchdog('trackback',
          t('Page requesting trackback does not refer to %nodeurl, source is %url',
            array('%url' => $trackback->url, '%nodeurl' => $node_url
          )),
          WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid));
      } else {
        watchdog('trackback',
          t('Page requesting trackback DOES refer to %nodeurl, source is %url -- WILL RECORD TRACKBACK',
            array('%url' => $trackback->url, '%nodeurl' => $node_url
          )),
          WATCHDOG_NOTICE, l(t('view node'), 'node/'. $node->nid)
        );
      }
    }

    // Only record the trackback if the link-back check passed.
    if ($error == 0) {
      $trackback->trid = db_next_id('{trackback_received}_trid');
      $trackback->nid = $node->nid;
      $trackback->created = time();
      $trackback->site = $_SERVER['REMOTE_ADDR'];
      $trackback->name = strip_tags(($_REQUEST['blog_name']) ? $_REQUEST['blog_name'] : $trackback->url);
      $trackback->subject = strip_tags(($_REQUEST['title']) ? $_REQUEST['title'] : $trackback->url);
      // $trackback->url already set above.  Though I might say something here since I'm setting the fields
      // in the exact same order that they are created in the table's create statement (with this exception).
      // Note: the closing parenthesis of strlen() belongs before the comparison.
      $trackback->excerpt = (strlen($_REQUEST['excerpt']) > 255) ? truncate_utf8($_REQUEST['excerpt'], 252) .'...' : $_REQUEST['excerpt'];
      $trackback->status = (variable_get('trackback_moderation', 0) == 0) ? 1 : 0;

      // drop silently if this is from a known spammer IP address
      if (function_exists('spam_ip_filter') && variable_get('trackback_spam_filter', 1)) {
        module_invoke('spam', 'ip_filter', 'trackback', $trackback->trid);
      }

      watchdog('trackback', t('trackback: added \'%subject\'', array('%subject' => $trackback->subject)), WATCHDOG_NOTICE, l(t('view trackback'), 'node/'. $node->nid .'#trackback-'. $trackback->trid));

      db_query("INSERT INTO {trackback_received} (trid, nid, created, site, name, subject, url, excerpt, status) VALUES (%d, %d, %d, '%s', '%s', '%s', '%s', '%s', %d)", $trackback->trid, $trackback->nid, $trackback->created, $trackback->site, $trackback->name, $trackback->subject, $trackback->url, $trackback->excerpt, $trackback->status);

      if (function_exists('spam_content_filter') && variable_get('trackback_spam_filter', 1)) {
        // invoke spam.module's spam filter
        $subject = "$trackback->subject $trackback->url";
        module_invoke('spam', 'content_filter', 'trackback', $trackback->trid, $subject, $trackback->excerpt);
      }
    }

  }
  else {
    $error = 1;
    $message = t('Missing TrackBack url.');
  }

  // Generate response
  $output = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
  $output .= "<response>\n";
  $output .= '<error>'. $error ."</error>\n";
  $message and $output .= '<message>'. $message ."</message>\n";
  $output .= "</response>\n";

  return $output;
}



reikiman:

FWIW, I just did a diff between the CVS HEAD version of trackback.module and the one I have. There are no changes other than the function I am proposing here.

I also noticed what ought to be a bug in the function I'm proposing: I forgot to copy the first line of the original function into my function:

  $trackback = new stdClass();

This should, of course, be the first line of the function.

You should be able to take the trackback.module that I've already attached to this RFE and use it directly with these two changes: 1) add the above line to the beginning of trackback_receive, 2) as you noted, remove the part between OLD and END OLD.

To verify what I'm saying you should be able to diff yourself and see that the only change is to replace the trackback_receive function.

dkruglyak:

Title: Refuse trackback when trackbacking page does not link to the receiving page » SPAM Fighting: Refuse trackback when trackbacking page does not link to the receiving page
Version: master » 4.7.x-1.x-dev

+1. The idea is quite obvious and I am surprised the patch did not get in as a module option... Being bombarded with spam trackbacks that do not link back is a huge problem.

This is needed in both 4.7.x and 5.x.

dkruglyak:

Status: Needs work » Reviewed & tested by the community

I made a fix that is more compact and up-to-date than the one posted above.

First, this new function (using the curl library) fetches a web page and checks whether it contains base_url:

/**
 * Sanity-checks a trackback by making sure the page at $url contains a link back.
 */
function trackback_sanity_check_link($url, $timeout = 30) {
  global $base_url;

  // Fetch trackback link page (using timeout)
  $c = curl_init();
  curl_setopt($c, CURLOPT_URL, $url);
  curl_setopt($c, CURLOPT_HEADER, 0);
  curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($c, CURLOPT_TIMEOUT, $timeout);
  $data = curl_exec($c);
  curl_close($c);

  // curl_exec() returns FALSE on failure; treat that as "no link back"
  if ($data === FALSE) {
    return FALSE;
  }

  // If page contains base_url return TRUE
  return stripos($data, $base_url) !== FALSE;
}

Then trackback_receive is updated to include another check that silently drops trackbacks that do not link back:

    if (!trackback_sanity_check_link($trackback->url)) {
      if (!module_invoke('throttle', 'status')) {
        sleep(variable_get('spam_ip_filter_sleep', 30));
      }
      return;
    }

This worked wonders on my site that came under siege of spammers flooding me with thousands of trackbacks a day. Ready to commit. The only possible enhancement is to make the extra check optional, based on user setting.
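
The "optional, based on user setting" idea could look roughly like the sketch below. The setting name 'trackback_reject_oneway' and the wrapper example_trackback_passes_check() are assumptions made up for illustration, and the variable_get() stand-in is only there so the snippet runs outside Drupal:

```php
<?php
// Stand-in for Drupal's variable_get() so the sketch runs on its own.
// Inside Drupal the real function already exists and this is skipped.
if (!function_exists('variable_get')) {
  function variable_get($name, $default) {
    global $conf;
    return isset($conf[$name]) ? $conf[$name] : $default;
  }
}

// Hypothetical wrapper: runs the link-back check only when the (assumed)
// 'trackback_reject_oneway' setting is on. $checker is the name of a
// function that takes a URL and returns TRUE when the page links back.
function example_trackback_passes_check($url, $checker) {
  if (!variable_get('trackback_reject_oneway', 1)) {
    // Check disabled by the site administrator: accept everything.
    return TRUE;
  }
  return (bool) call_user_func($checker, $url);
}
```

In trackback_receive the call would then be something like example_trackback_passes_check($trackback->url, 'trackback_sanity_check_link'), with the setting exposed on the module's settings page.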

dordal:

I just tried this patch and can confirm that it works as advertised. Thanks, dkruglyak!

reikiman:

This looks like a good improvement. In my case I've turned off trackback on my sites, even with the patch I submitted at the top of this page, because it sometimes seemed to make my site go into a long wait on some pages. Having a timeout is important for that reason.

One tweak I'd like to see is (optional) logging of trackback failures. This way we could use those failures to turn on filters at a higher level such as the troll module or in server firewall rules. e.g. if some server is repeatedly making spammy trackback attempts I'd rather just block them from getting to the site in the first place. As it is, because they're free to make as many trackback requests as they want, they are free to tie up my server resources.
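
As a rough illustration of how logged failures could feed a block list, here is a small sketch. The function example_repeat_offenders() and the record layout (an array with an 'ip' key per rejected trackback) are assumptions for this example, not anything the module provides:

```php
<?php
// Hypothetical helper: given a list of rejected-trackback records, each an
// array with an 'ip' key, return the IPs that failed at least $threshold
// times. The result is ordered by first appearance in the log.
function example_repeat_offenders($failures, $threshold = 3) {
  // Count failures per sender IP.
  $counts = array();
  foreach ($failures as $failure) {
    $ip = $failure['ip'];
    $counts[$ip] = isset($counts[$ip]) ? $counts[$ip] + 1 : 1;
  }

  // Keep only the senders at or above the threshold.
  $offenders = array();
  foreach ($counts as $ip => $count) {
    if ($count >= $threshold) {
      $offenders[] = $ip;
    }
  }
  return $offenders;
}
```

The returned list could then be handed to whatever does the blocking, whether that is the troll module or a firewall rule generator.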

gnassar:

Status: Reviewed & tested by the community » Active

Going with Drupal's http request code probably beats adding a dependency to curl in the long run. And of course, there's still no patch file uploaded. Demoting from RTBC.

As an alternative, Stefan, the creator of the Multiping module, has a pretty nicely rolled-up separate module to handle this check, along with some other "sanity checks:"

http://stefan.ploing.de/linux/drupal

Sadly, it still requires a small patch to the trackback module to work. But happily, the .patch file is included in the roll-up.

Long-term, adding a place within trackback.module for other modules to hook into would probably be the best solution for this; then others can supply custom spam-blocking modules to their heart's delight.
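
A minimal sketch of what such a hook point might look like follows. The hook name "trackback_validate", the dispatcher example_invoke_validators() and the sample module "examplespam" are all hypothetical; inside Drupal the dispatch would normally go through module_invoke_all() rather than a hand-rolled loop:

```php
<?php
// Hypothetical dispatcher: asks each listed module whether it objects to the
// trackback. A module objects by returning a non-empty reason string from
// its <module>_trackback_validate() function; the first objection wins.
function example_invoke_validators($modules, $trackback) {
  foreach ($modules as $module) {
    $function = $module .'_trackback_validate';
    if (function_exists($function)) {
      $reason = $function($trackback);
      if ($reason) {
        return $reason;
      }
    }
  }
  // No module objected.
  return NULL;
}

// Example implementation that a third-party spam-blocking module might supply.
function examplespam_trackback_validate($trackback) {
  return stripos($trackback->url, 'casino') !== FALSE ? 'blocked keyword in URL' : NULL;
}
```

With something like this in place, trackback_receive would only need one call to the dispatcher, and checks such as the link-back test could live in separate modules.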

zorac:

Status: Active » Fixed

New release 1.3 includes this feature as the "Reject one-way trackbacks" option.

Anonymous:

Status: Fixed » Closed (fixed)