We're getting a fair number of RPX error messages on our site that look like this:

Type	page not found
Date	Monday, May 3, 2010 - 17:15
User	Anonymous
Location	http://www.nysenate.gov/rpx/end_point?destination=senator%2Fjoseph-griffo%2Fissues%2F0.8
Referrer	http://www.nysenate.gov/senator/joseph-griffo/issues/0.8
Message	rpx/end_point
Severity	warning
Hostname	132.174.2.184

We're getting 5-10 of these errors per minute.

Currently we're running version 6.x-1.x-dev of the RPX module, which was updated on 2010-Mar-29.

Comments

nrambeck

Assigned: Unassigned » nrambeck

This is very strange indeed. I've never seen errors like this on any sites running rpx.

Since the referrer page doesn't include any actual login links, it must be either bots or some automatically executed javascript that is causing these log errors (see the RPXNOW.token_url variable in the footer of the source of each page). Another strange thing is that if these pages are requested outside the RPX login process, they should return Access Denied pages, not Page Not Found.

The way the module is set up currently, every page includes the RPX javascript in the footer, even though it is not required on pages that do not have RPX login links. While I don't know the root cause of your specific error log problem, it would be greatly diminished if the RPX javascript were only placed on pages where an RPX login link is available.

Selectively adding the RPX javascript to the page is simple to do, but it can cause issues if the RPX login widget is cached (as with block-level caching) while the whole page is not. Since the login links should only be shown to users who are not logged in, this would be a rare setup.

I'll experiment with ways to remove the RPX javascript when it is not needed, since this will be a front-end performance boost as well.

Sheldon Rampton

We don't use the login block on our site. Users click on the login link and go to the login page if they want to log in. Based on your comments above, therefore, I've hacked the RPX module for now by adding the following code to the top of the rpx_footer() function:

  if (arg(0) != 'user') {
    return;
  }

This seems to have eliminated most of the error messages, although we're still getting some from page user/register.

Sheldon Rampton

The problem is coming from the following bit of Javascript that the RPX module was placing at the bottom of each page on our site:

    <script type="text/javascript" src="https://rpxnow.com/openid/v2/widget"></script>
    <script type="text/javascript">
      <!-- Begin RPX Sign In from JanRain. Visit http://www.rpxnow.com/ -->
      RPXNOW.token_url = "http://www.nysenate.gov/rpx/end_point?destination=user%2F1088"
      RPXNOW.realm = "login.nysenate.gov";
      RPXNOW.overlay = true;
      RPXNOW.language_preference = "en";
      RPXNOW.flags = "delay_domain_check";
      RPXNOW.ssl = true;
      RPXNOW.init({appId: "chihdgldhpfifkjfkflp",xdReceiver: '/sites/all/modules/contrib/rpx/rpx_xdcomm.html'});
      <!-- End RPX Sign In -->
    </script>

The "destination=user%2F1088" portion is supposed to redirect people back to the page they were on after login. Since we don't have a login block on every page, I hacked the RPX module so it only puts the Javascript block on user login pages, which vastly reduces the number of error messages we're getting, although it doesn't eliminate them entirely.
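As a rough illustration of how a destination path ends up percent-encoded in that URL, here is a minimal PHP sketch. The helper name rpx_example_token_url() is made up for this example and is not taken from the RPX module, whose actual code may build the URL differently.

```php
<?php
// Hypothetical helper for illustration only; not part of the RPX module.
function rpx_example_token_url($base_url, $destination) {
  // rawurlencode() turns "/" into "%2F", which matches the logged URLs.
  return $base_url . '/rpx/end_point?destination=' . rawurlencode($destination);
}

echo rpx_example_token_url('http://www.nysenate.gov', 'user/1088'), "\n";
// Prints: http://www.nysenate.gov/rpx/end_point?destination=user%2F1088
```

The same encoding applied to "senator/joseph-griffo/issues/0.8" gives the destination value seen in the log entry at the top of this issue.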

One thing I don't understand is why this bit of Javascript is generating "page not found" errors rather than "access denied." If I simply enter http://www.nysenate.gov/rpx/end_point?destination=user%2F1088 in my web browser, I get "access denied."

nrambeck

Title: RPX-related error messages » RPX javascript triggering page not found and access denied pages

I'm making some code changes to remove RPX javascript from pages that don't need it.

Changing the title to be more descriptive.

nrambeck

Status: Active » Needs work

I've committed changes that remove the RPX javascript from pages that don't need it. That will reduce these log errors and improve front-end performance as well. I've asked developers at JanRain to look at this Access Denied (and Page Not Found) type of error.

dmuth

Version: 6.x-1.x-dev » 6.x-1.3

I'm afraid that I am seeing this as well on sites that I run. This is with 6.x-1.3 (2010-Jun-16).

I'm seeing this error when Google tries to access pages like these:

http://www.saveardmorecoalition.org/rpx/end_point?destination=node%3Fpag...

I don't want to block Google, since then our site wouldn't get indexed. So I wrote some PHP code to fix the problem:

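// The crawler IP to turn away.
$bad = "66.249.71.186";
// Assumption: the client IP is taken from REMOTE_ADDR.
$ip = isset($_SERVER["REMOTE_ADDR"]) ? $_SERVER["REMOTE_ADDR"] : "";
$q = isset($_REQUEST["q"]) ? $_REQUEST["q"] : "";
if ($ip == $bad && $q == "rpx/end_point") {
   header("HTTP/1.1 403 Forbidden");
   print "Hey Google, please honor the 403 error I'm giving you!";
   exit();
}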

This code can go right in index.php before the bootstrap is loaded, to help performance.

Consider this code licensed under whatever license this module is under. Feel free to use it in the module accordingly.

Phil Wolstenholme

Wouldn't the right thing to do be to add rel="nofollow" to all the RPX links (http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=96569)?

Would work for all decent search engine bots, not just Google, and do the same as a 403 (I think!)
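A minimal sketch of what such a link could look like when generated from PHP. The helper name rpx_example_nofollow_link() and the /user/login path are assumptions made for this example; they are not the RPX module's code, and whether crawlers reach rpx/end_point through links at all has not been established in this thread.

```php
<?php
// Hypothetical helper for illustration only; not part of the RPX module or Drupal.
function rpx_example_nofollow_link($text, $href) {
  // Escape both values so they are safe to place inside HTML.
  return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '" rel="nofollow">'
    . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo rpx_example_nofollow_link('Sign in', '/user/login'), "\n";
// Prints: <a href="/user/login" rel="nofollow">Sign in</a>
```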

irandream

Yes, it works with Yahoo, AltaVista, and other search engines as well. Thank you.

mlncn

[edit, tried to mark a forum post as duplicate, didn't work] This issue was also reported at http://drupal.org/node/1158448