There appeared to be a hiccup in the Mollom service this morning, and when it happened, all the sites in our multisite environment (over 100 Drupal 5 and Drupal 6 sites) stopped communicating with Mollom. The problem is that we had to visit the Mollom admin page of every single site to reestablish communication. Why can't this be checked on cron, so that an issue like this is resolved within a reasonable amount of time?

Robbie

Comments

dugh

Same here: about 30 spam comments were posted on our site this morning because of this.

I would like the option to fall back to allowing only registered users to comment when Mollom is down.
I can't take the site down completely, because people depend on it and need to keep using it even when Mollom is unavailable.

dries

Whenever the server list is empty, we try to retrieve a new list of servers at the top of the mollom() function. See:

  if ($servers == NULL) {
    // Retrieve a new list of servers:
    $servers = _mollom_retrieve_server_list();

    // Store the list of servers in the database:
    variable_set('mollom_servers', $servers);
  }

Also, whenever all servers are unreachable, mollom() calls variable_del('mollom_servers'). That is, we delete the list of servers when everything fails. This pattern exists in both the Drupal 5 and the Drupal 6 modules.

In other words, retrieving a new server list should not take manual work, unless the code in mollom() is somehow buggy and never reaches variable_del('mollom_servers'). I looked at it pretty closely and can't spot anything broken.
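A minimal sketch of the recovery cycle described above, as standalone PHP rather than the module's actual mollom(). An array stands in for the variables table, and two callables stand in for _mollom_retrieve_server_list() and the XML-RPC call. The sketch_mollom_call() name and all data are hypothetical, and the real function also handles error codes such as MOLLOM_REFRESH, which are omitted here.

```php
<?php
// Hypothetical sketch of the "fetch if empty, delete if all fail" cycle.
// $store stands in for the variables table; $fetchList and $callServer stand
// in for _mollom_retrieve_server_list() and the XML-RPC call (assumptions).
function sketch_mollom_call(array &$store, callable $fetchList, callable $callServer): bool {
  // Top of mollom(): rebuild the server list when it is missing.
  if (empty($store['mollom_servers'])) {
    $store['mollom_servers'] = $fetchList();
  }

  // Try each server in turn; the first one that answers wins.
  foreach ($store['mollom_servers'] as $server) {
    if ($callServer($server)) {
      return TRUE;
    }
  }

  // Bottom of mollom(): every server failed, so drop the list.
  // The next call starts over at the top and fetches a fresh list.
  unset($store['mollom_servers']);
  return FALSE;
}

// Usage: the first list fetched is bad, the second one is good.
$lists = [['bad-1', 'bad-2'], ['http://good']];
$fetches = 0;
$fetchList = function () use (&$lists, &$fetches): array {
  $fetches++;
  return array_shift($lists);
};
// Pretend only servers with an http:// prefix answer.
$callServer = fn(string $server): bool => str_starts_with($server, 'http://');

$store = [];
$first = sketch_mollom_call($store, $fetchList, $callServer);   // fails, list deleted
$second = sketch_mollom_call($store, $fetchList, $callServer);  // refetches, succeeds
```

With a bad first list, the first call fails and deletes the list; the second call fetches a fresh list and succeeds. This only works if the all-servers-failed branch is actually reached, which is what the later comments in this thread call into question.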

Do you still have access to the log entries that were generated during the downtime? They would be helpful for debugging the problem.

robbiethegeek

Hey Dries,

If this happens again, I'll grab some logs to help with debugging.

Robbie

dries

I tracked down a bug in the Drupal 5 version of the Mollom module: the server list wasn't properly reset when a problem occurred. The latest version of the Drupal 6 module does not have this problem.

I've committed a fix to the HEAD of the DRUPAL-5 branch and will create a new release later today.

As an interim fix, you can force your server list to be reset by visiting the Mollom settings page at ?q=admin/settings/mollom.

mygumbo

I had the same problem with version 6: communication was lost, and the log errors looked like this:

All Mollom servers were unavailable: Array ( [0] => 174.37.205.152 [1] => 88.151.243.81 [2] => 88.151.243.145 ) , last error: -

Revisiting the settings page kick-started it again.
Thanks,
AD

jbrauer

Version: 6.x-1.7 » 6.x-1.9

I've also seen this on several sites, starting in most cases around 9/11 and continuing until the settings page is resubmitted.

All Mollom servers were unavailable: Array ( [0] => 174.37.205.152 [1] => 88.151.243.81 [2] => 88.151.243.145 ) , last error: -

japerry

Ditto. It happened on both D5 and D6: the failure occurred immediately after Drupal refreshed the server list on 9/11. However, once I visited the settings page today and saved it (as discussed above), Mollom started catching spam again.

Here is a dump of the Mollom logs from a production Drupal 5 site:

mollom 2009-09-14 20:21 Spam: I am sorry, that I interfere, but you could ... Visitor
mollom 2009-09-14 20:21 Spam: I am sorry, that I interfere, but you could ... Visitor
mollom 2009-09-14 20:20 The list of available Mollom servers was refreshed: Array ( [0] => http://174.37.205.152 [1] => http://88.151.243.81 [2] => http://88.151.243.145 ) .... admin
error mollom 2009-09-14 18:45 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-12 22:01 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-12 21:12 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-12 21:12 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-11 12:27 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-11 12:27 Mollom was unavailable (error: - ) Visitor
mollom 2009-09-11 12:27 The list of available Mollom servers was refreshed: Array ( [0] => 174.37.205.152 [1] => 88.151.243.81 [2] => 88.151.243.145 ) . ... Visitor

jbrauer

Here's what I think is going on.

From what I've been able to discern (thanks @japerry for the logs), on 9/11 the Mollom servers sent out the server list as naked IP addresses. (This is a guess, but it would explain why so many people see the same error.)

The problem boils down to this: if you pass xmlrpc() a naked IP instead of a properly formed URL, xmlrpc_error() never records an error, so it has nothing to return.

In _xmlrpc(), the following code runs in this case:

  if ($result->code != 200) {
    xmlrpc_error($result->code, $result->error);
    return FALSE;
  }

However, $result->code is undefined in this case, so xmlrpc_error() is called with a NULL $code. It never creates an error object and, unless an earlier call already recorded one, returns NULL:

function xmlrpc_error($code = NULL, $message = NULL, $reset = FALSE) {
  static $xmlrpc_error;
  if (isset($code)) {
    $xmlrpc_error = new stdClass();
    $xmlrpc_error->is_error = TRUE;
    $xmlrpc_error->code = $code;
    $xmlrpc_error->message = $message;
  }
  elseif ($reset) {
    $xmlrpc_error = NULL;
  }
  return $xmlrpc_error;
}

This finally leads to the following check in mollom():

      if ($result === FALSE && ($error = xmlrpc_error())) {

Because xmlrpc_error() returns NULL, this condition is false, so we end up in the else branch, which logs the watchdog message but never triggers a rebuild of the server list.
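A standalone sketch that reproduces this path outside Drupal, assuming PHP 8. sketch_xmlrpc_error() copies the xmlrpc_error() body quoted above; sketch_xmlrpc_request() mirrors only the status check in _xmlrpc(). The hand-built $result objects stand in for drupal_http_request() output, and their error messages are illustrative, not Drupal's.

```php
<?php
// Copy of the xmlrpc_error() body quoted above, renamed for the sketch.
function sketch_xmlrpc_error($code = NULL, $message = NULL, $reset = FALSE) {
  static $xmlrpc_error;
  if (isset($code)) {
    $xmlrpc_error = new stdClass();
    $xmlrpc_error->is_error = TRUE;
    $xmlrpc_error->code = $code;
    $xmlrpc_error->message = $message;
  }
  elseif ($reset) {
    $xmlrpc_error = NULL;
  }
  return $xmlrpc_error;
}

// Mirrors only the status check in _xmlrpc(). "?? NULL" avoids PHP 8's
// undefined-property warning; older PHP also read the missing code as NULL.
function sketch_xmlrpc_request(stdClass $result): bool {
  $code = $result->code ?? NULL;
  if ($code != 200) {
    sketch_xmlrpc_error($code, $result->error ?? NULL);
    return FALSE;
  }
  return TRUE;
}

// Naked-IP case: the request fails before any HTTP status code exists.
$noCode = new stdClass();
$noCode->error = 'request failed before a status code (illustrative)';

$return = sketch_xmlrpc_request($noCode);
// Same guard as in mollom(): FALSE alone is not enough, an error object is required.
$enteredErrorBranch = ($return === FALSE && ($error = sketch_xmlrpc_error()));

// For contrast, an ordinary HTTP failure does record an error.
$http500 = new stdClass();
$http500->code = 500;
$http500->error = 'Internal Server Error';
sketch_xmlrpc_request($http500);
$error500 = sketch_xmlrpc_error();
```

The missing-code request returns FALSE, yet no error object exists, so the guard is false and the server-list handling in the error branch is skipped; the HTTP 500 request, by contrast, leaves an error object with code 500.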

In the meantime going to the settings page and resubmitting is the quickest fix.

dave reid

That appears to be the Mollom server misconfiguration issue we've been discussing. Now that the server list is configured with properly formed Mollom server URLs, things work as expected. There's not much the D6 version of the module can do about it.

jbrauer

Status: Active » Needs review
New attachment: 858 bytes

The attached patch handles the case where xmlrpc_error() isn't set because the connection could not be made (in the recent case, because of a bad server list). In that case, it treats the situation the same as receiving an explicit MOLLOM_REFRESH code. Issue #352642, "Fix mollom() error handling", also discusses this situation.

Tested successfully against the bad-server-address case.

dries

Status: Needs review » Closed (won't fix)

The problem was introduced by a manual error on the server side: we set a list of Mollom servers without specifying the http:// scheme. This uncovered a bug in the Drupal 5 module, which has now been fixed. The latest version of the Drupal 6 module was not affected. The solution is to upgrade to the latest version of the Mollom module for Drupal 5 or Drupal 6, specifically mollom-5.x-1.8, mollom-6.x-1.9, or higher.

We provided some additional details at http://mollom.com/blog/important-notice-for-drupal-5-mollom-users.

While jbrauer's analysis was correct, I believe this patch is unnecessary, because these versions of the module automatically retrieve a new server list on failure: they delete the server list after all servers have failed (see the bottom of mollom()), so the next request fetches a new server list (see the top of mollom()).

If you think the patch is necessary, any chance you could create a test case that illustrates the need?

jbrauer

I just sent a more detailed email. Basically, in this failure case (bad servers in the variables table), the logic prevents the code that rebuilds the server list from ever being called.

jbrauer

Version: 6.x-1.9 » 6.x-1.10

Picked the wrong version earlier.

dries

Status: Closed (won't fix) » Needs review
New attachment: 2.43 KB

I did some additional investigation, and I think you're right.

I created a new test case that isolates the problem and illustrates that a server list lacking the 'http://' scheme requires special error handling.

I've attached my new tests for review.

dries

Slightly better version of my test case. You should see three passes and two fails -- see screenshot. The three passes suggest that the server list recovery mechanism works when "http://" was used. The two fails suggest that the server list recovery mechanism fails when no "http://" was specified.

dries

I committed the patch in #15 to DRUPAL-6--1 HEAD of the Mollom module. Next up: working on a fix for this problem. The patch in #10 looks like a good start, but I'm wondering if this needs to be fixed in core.

dries

The problem with #10 is that valid XML-RPC calls to mollom.com can return FALSE. For example, mollom.verifyCaptcha returns FALSE when the CAPTCHA result was invalid. Looking at your patch, it seems a new server list would be retrieved every time Mollom returns FALSE. That is unwanted behavior: our servers would be hammered with unnecessary calls to mollom.getServerList, and users would see unwanted delays. Or am I wrong?

Personally, I think this is best fixed in xmlrpc_error(). Other modules have this problem too, and it really looks like an XML-RPC API snafu to me: xmlrpc_error() should be reliable.
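One way to read "xmlrpc_error() should be reliable", sketched as standalone PHP: record an error whenever the transport reports a message, even without a code, so a caller can tell a transport failure from a legitimate FALSE such as a rejected CAPTCHA. This is a hypothetical illustration, not the fix that was eventually committed to core; the sketch_* names and the -1 sentinel code are made up.

```php
<?php
// Hypothetical sentinel used when the transport gives a message but no code.
const SKETCH_NO_CODE = -1;

// Like xmlrpc_error(), but a message alone is enough to record an error.
function sketch_reliable_error($code = NULL, $message = NULL, $reset = FALSE) {
  static $error = NULL;
  if ($reset) {
    $error = NULL;
  }
  elseif (isset($code) || isset($message)) {
    $error = new stdClass();
    $error->is_error = TRUE;
    $error->code = $code ?? SKETCH_NO_CODE;
    $error->message = $message;
  }
  return $error;
}

// Decide what a caller should make of a call's return value.
function sketch_classify($return): string {
  if ($return !== FALSE) {
    return 'ok';
  }
  // FALSE with an error object: transport failure, the server list may be stale.
  // FALSE without one: a legitimate answer, e.g. a failed CAPTCHA check.
  return sketch_reliable_error() ? 'transport-failure' : 'legitimate-false';
}

$ok = sketch_classify(TRUE);

sketch_reliable_error(NULL, NULL, TRUE);         // reset before the "request"
$captcha = sketch_classify(FALSE);               // no error was recorded

sketch_reliable_error(NULL, 'missing scheme (illustrative)');  // message, no code
$transport = sketch_classify(FALSE);
$code = sketch_reliable_error()->code;
```

The caller has to reset the recorder before each request; otherwise an old error would make a later legitimate FALSE look like a transport failure and trigger the unnecessary mollom.getServerList calls described above.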

jbrauer

Status: Needs review » Needs work

@dries, you're right on in #17. The patch won't cut it, and this really is an issue with xmlrpc_error() not returning a meaningful result when no error code is passed to it. In the missing-scheme case, drupal_http_request() fails before it sets an error code and returns only a message.
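To see why a naked IP never gets as far as an HTTP status code, compare what PHP's parse_url() returns for the two server lists in the logs above. The sketch_has_scheme() helper is hypothetical; it only shows that the 9/11 entries carry no scheme or host at all.

```php
<?php
// Hypothetical check: a request URL needs both a scheme and a host.
function sketch_has_scheme(string $url): bool {
  $parts = parse_url($url);
  return is_array($parts) && isset($parts['scheme'], $parts['host']);
}

// Server lists as they appear in the watchdog entries above.
$listFrom911 = ['174.37.205.152', '88.151.243.81', '88.151.243.145'];
$listFrom914 = ['http://174.37.205.152', 'http://88.151.243.81', 'http://88.151.243.145'];

// A naked IP parses as a bare path: no scheme, no host.
$naked = parse_url('174.37.205.152');
$proper = parse_url('http://174.37.205.152');

$usable911 = array_filter($listFrom911, 'sketch_has_scheme');
$usable914 = array_filter($listFrom914, 'sketch_has_scheme');
```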

Setting to needs work: the test looks good, but the patch itself isn't usable.

dries

Status: Needs work » Fixed

Issue #578470, "XML-RPC error handling sometimes fails silently", has been fixed. All we need to do next is create new releases of Drupal 5 core and Drupal 6 core. I'm marking this 'fixed' because it will be fixed as soon as those releases are made. There's not much left to do in this issue.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.