There appeared to be a hiccup in the Mollom service this morning, and when it happened, all the sites in our multisite environment (Drupal 5 and Drupal 6, over 100 sites) stopped communicating with Mollom. The problem is that we had to visit the Mollom admin page of every single site to reestablish communications. Why can't this be checked on cron, so that if an issue like this happens it is resolved within a reasonable amount of time?
Robbie
| Comment | File | Size | Author |
|---|---|---|---|
| #15 | mollom-server-list-recovery-tests.patch | 2.3 KB | dries |
| #15 | server-list-recovery.jpg | 302.62 KB | dries |
| #14 | mollom-server-list-recovery-tests.patch | 2.43 KB | dries |
| #10 | 574940-10.mollom_error_handling.patch | 858 bytes | jbrauer |
Comments
Comment #1
dugh commented
Same with us: about 30 spam comments were posted on our site this morning because of this.
I would like to have the option of falling back to only allowing registered users to comment if Mollom is down.
I can't take things completely down because we have people who depend on the site and need to keep using it even if Mollom is down.
Comment #2
dries commented
Whenever the server list is empty, we try to retrieve a new list of servers at the top of the mollom() function.
Also, whenever all servers are unreachable, we call variable_del('mollom_servers'); in mollom(). That is, we remove the list of servers in case everything fails. That pattern exists in both the Drupal 5 and the Drupal 6 module.
In other words, it should not take manual work to retrieve a new server list, unless, of course, the code in mollom() is somehow buggy and we never reach variable_del('mollom_servers');. I looked at it pretty closely and I can't spot anything broken.
Do you still have access to the log entries that were generated during the down period? These would be helpful to debug the problem.
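The recovery pattern described in this comment (refetch when the cached list is empty, delete the list when every server fails) can be sketched roughly as follows. This is a simplified model in Python, not the module's PHP: `ServerListStore`, `call_with_recovery`, `fetch_server_list` and `call_server` are hypothetical names standing in for the variable table, mollom(), the server-list request and the XML-RPC call.

```python
class ServerListStore:
    """Hypothetical stand-in for the stored 'mollom_servers' variable."""

    def __init__(self):
        self.servers = None


def call_with_recovery(store, fetch_server_list, call_server, method):
    """Try each cached server in turn; reset the cache if all of them fail."""
    # Top of the function: an empty or missing list triggers a refetch.
    if not store.servers:
        store.servers = fetch_server_list()

    for server in store.servers:
        ok, result = call_server(server, method)
        if ok:
            return result

    # Bottom of the function: every server failed, so drop the cached list.
    # The next call starts with an empty list and fetches a fresh one.
    store.servers = None
    return None
```

Under this model a total failure costs one request, and the following request recovers on its own without anyone visiting a settings page.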
Comment #3
robbiethegeek commented
Hey Dries,
If this happens again, I will grab some logs to help with debugging.
Robbie
Comment #4
dries commented
I tracked down a bug in the Drupal 5 module of Mollom: the server list wasn't properly reset when a problem occurred. The latest version of the Drupal 6 module of Mollom does not have this problem.
I've committed a fix to the HEAD of the DRUPAL-5 branch and will create a new release later today.
As an interim fix, you can force your server list to be reset by visiting the Mollom settings page at ?q=admin/settings/mollom.
Comment #5
mygumbo commented
I had the same problem with v6. Loss of communications, and log errors looked like this:
All Mollom servers were unavailable: Array ( [0] => 174.37.205.152 [1] => 88.151.243.81 [2] => 88.151.243.145 ) , last error: -
Re-visiting the settings page kickstarted it again.
Thanks,
AD
Comment #6
jbrauer commented
I've also seen this on several sites, starting in most cases about 9/11 and continuing until the settings page is re-submitted.
All Mollom servers were unavailable: Array ( [0] => 174.37.205.152 [1] => 88.151.243.81 [2] => 88.151.243.145 ) , last error: -
Comment #7
japerry commented
Ditto. It happened on both D5 and D6. The failure occurred immediately after Drupal refreshed the servers on 9/11; however, once I visited the settings page today and saved (as discussed above), it started capturing again.
Dump of the Mollom Logs from a production Drupal 5 site:
mollom 2009-09-14 20:21 Spam: I am sorry, that I interfere, but you could ... Visitor
mollom 2009-09-14 20:21 Spam: I am sorry, that I interfere, but you could ... Visitor
mollom 2009-09-14 20:20 The list of available Mollom servers was refreshed: Array ( [0] => http://174.37.205.152 [1] => http://88.151.243.81 [2] => http://88.151.243.145 ) .... admin
error mollom 2009-09-14 18:45 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-12 22:01 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-12 21:12 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-12 21:12 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-11 12:27 Mollom was unavailable (error: - ) Visitor
error mollom 2009-09-11 12:27 Mollom was unavailable (error: - ) Visitor
mollom 2009-09-11 12:27 The list of available Mollom servers was refreshed: Array ( [0] => 174.37.205.152 [1] => 88.151.243.81 [2] => 88.151.243.145 ) . ... Visitor
Comment #8
jbrauer commented
So here's what I think is going on...
What I've been able to discern (thanks @japerry for the logs) is that on 9/11 the Mollom servers sent out the list of servers as naked IP addresses. (This is a guess, but it would explain why so many people see the same error.)
The problem boils down to this: if you pass a naked IP instead of a properly formed URL to xmlrpc(), the xmlrpc_error() function has no value to return.
In _xmlrpc(), the error-reporting code is called in this instance.
However, $result->code is undefined in this case.
As a consequence, the return value of xmlrpc_error() is also undefined,
which finally leads to the mollom() function.
So we end up in the else case, which prints the watchdog message but does not cause the list of servers to be rebuilt.
In the meantime, going to the settings page and resubmitting is the quickest fix.
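The failure path described in this comment can be modelled in a few lines. This is a deliberately simplified Python sketch, not the Drupal or Mollom code: `ErrorRecorder`, `fake_http_request` and `handle_failure` are hypothetical names, and the assumption (taken from the analysis above) is that the error holder only records something when it is given an error code.

```python
class ErrorRecorder:
    """Simplified model of an error holder that needs a code to record anything."""

    def __init__(self):
        self.last = None

    def record(self, code=None, message=None):
        # Without a code nothing is stored, so callers later see "no error".
        if code is not None:
            self.last = (code, message)
        return self.last


def fake_http_request(url):
    """Hypothetical transport: returns (code, message); no code when the scheme is missing."""
    if not url.startswith("http://"):
        return None, "invalid URL: missing scheme"
    return 503, "Service unavailable"


def handle_failure(recorder, code, message):
    """Decide what the caller can do with a failed request."""
    error = recorder.record(code, message)
    if error is None:
        # Nothing to branch on: the caller can only log, and the bad list stays.
        return "logged only"
    return "list reset"
```

With a naked IP such as `174.37.205.152` the sketch ends in the "logged only" branch, which matches the symptom in the logs: an "unavailable" message with an empty error, and no refresh of the server list.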
Comment #9
dave reid commented
That would appear to be the Mollom server misconfiguration issue that we've been talking about. Now that the server list is properly configured with the proper Mollom server IP URLs, things are working as expected. There's not much the D6 version of the module can do.
Comment #10
jbrauer commented
The attached patch handles the case where xmlrpc_error() isn't set because the connection could not be made (in the recent case, because of a bad server list). In this case it treats the situation the same as getting an explicit MOLLOM_REFRESH code. #352642: Fix mollom() error handling also has some discussion of this situation.
Successfully tested against the case of the bad server addresses.
Comment #11
dries commented
The problem was introduced by a manual error on the server side. We set a list of Mollom servers without specifying the http:// scheme. This uncovered a bug in the Drupal 5 module, which has now been fixed. The latest version of the Drupal 6 module was not affected. The solution is to upgrade to the latest version of the Drupal 5 or Drupal 6 module for Mollom. Specifically, mollom-5.x-1.8 or mollom-6.x-1.9 or higher.
We provided some additional details at http://mollom.com/blog/important-notice-for-drupal-5-mollom-users.
While jbrauer's analysis was correct, I believe this patch is unnecessary because these versions of the module will automatically retrieve a new server list on failure: they delete the server list after all servers have failed (see the bottom of mollom()), which will result in the module requesting a new server list in the next request (see the top of mollom()).
If you think the patch is necessary, any chance you could create a test case that illustrates the need?
Comment #12
jbrauer commented
Just sent a more detailed email. Basically, in the case of this failure (bad servers in the variable table), the logic prevents the code for rebuilding the server list from ever being called.
Comment #13
jbrauer commented
Picked the wrong version earlier...
Comment #14
dries commented
I did some additional investigation and I think you're right.
I created a new test case that isolates the problem and illustrates that the lack of an 'http://' scheme in the server list requires special error handling.
I've attached my new tests for review.
Comment #15
dries commented
Slightly better version of my test case. You should see three passes and two fails (see screenshot). The three passes suggest that the server list recovery mechanism works when "http://" was used. The two fails suggest that the server list recovery mechanism fails when no "http://" was specified.
Comment #16
dries commented
I committed the patch in #15 to DRUPAL-6--1 HEAD of the Mollom module. Next up: working on a fix for this problem. The patch in #10 looks like a good start, but I'm wondering if this needs to be fixed in core.
Comment #17
dries commented
The problem with #10 is that valid XML-RPC calls to mollom.com can return FALSE. For example, mollom.verifyCaptcha returns FALSE when the CAPTCHA result was invalid. Looking at your patch, it looks as if a new server list would be retrieved every time Mollom returns FALSE. That is unwanted behavior because our servers would be hammered with unnecessary calls to mollom.getServerList, and it would cause unwanted delays for users. Or am I wrong?
Personally, I think this would be best fixed in xmlrpc_error() ... other modules have this problem too, and it really looks like an XML-RPC API snafu to me. xmlrpc_error() should be reliable.
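The distinction raised in this comment, between a call that legitimately returns FALSE and a call that never reached a server, can be illustrated with a small Python sketch. It is not the module's code: `RpcOutcome`, `naive_should_refresh` and `should_refresh` are hypothetical names, and the only assumption is that the transport layer reports failure separately from the returned value.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class RpcOutcome:
    """Hypothetical result type that keeps transport status apart from the value."""

    ok: bool                     # True when the server answered at all
    value: Any = None            # the decoded return value, which may be False
    error: Optional[str] = None  # transport-level error message, if any


def naive_should_refresh(value):
    # Treats every False as a failure, so an invalid CAPTCHA answer
    # would trigger a server-list refresh on each attempt.
    return value is False


def should_refresh(outcome):
    # Refresh only when the transport itself failed.
    return not outcome.ok


invalid_captcha = RpcOutcome(ok=True, value=False)
unreachable = RpcOutcome(ok=False, error="invalid URL: missing scheme")
```

Here `naive_should_refresh(invalid_captcha.value)` is true while `should_refresh(invalid_captcha)` is false, which is the difference between refetching the server list on every wrong CAPTCHA and refetching only on real connection failures.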
Comment #18
jbrauer commented
@dries you're right on in #17.... The patch won't cut it, and this really is an issue with xmlrpc_error() not returning a meaningful result if it gets no error code passed to it. In the missing-scheme case, drupal_http_request() fails before it generates an error code and returns only a message.
Setting to needs work as the test looks good but the patch for the issue isn't usable.
Comment #19
dries commented
I created a bug report for core at #578470: XML-RPC error handling sometimes fails silently.
Comment #20
dries commented
#578470: XML-RPC error handling sometimes fails silently has been fixed. All we need to do next is create new releases of Drupal 5 core and Drupal 6 core. I'm marking this 'fixed' because it will be fixed as soon as these releases are made. Not much left to do in this issue.