Occasionally, users can't get to the Review order page. When the shipping quote fails on the way from the Checkout page to the Review page, the user gets a mostly white page with the message:

Skip to Main Content Area
The website encountered an unexpected error. Please try again later.

If you then refresh, or go back and try again, it usually goes through to the Review page. Every time this happens, a PHP error shows up in the log:

Exception: String could not be parsed as XML in SimpleXMLElement->__construct() (line 394 of /....../ubercart/shipping/uc_usps/uc_usps.module)

I wrapped the offending line in some debug statements and found that, in those cases, the $result object returned from the USPS request had no 'data' property, because it contained an error instead:

stdClass Object ( [code] => 0 [error] => php_network_getaddresses: getaddrinfo failed: Name or service not known )
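A minimal sketch of the missing check, written in Python only to illustrate the logic (the module itself is PHP). It assumes the response object has the shape printed above: a `data` attribute on success, `code` and `error` on failure. `SimpleNamespace` stands in for PHP's stdClass and `ElementTree` for SimpleXMLElement; `parse_quote_response` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace


def parse_quote_response(result):
    """Return the parsed XML root, or None if there is nothing valid to parse.

    `result` mimics the object seen in the log: on success it has a `data`
    attribute; on a network failure it only has `code` and `error`.
    """
    data = getattr(result, "data", None)
    if not data:
        # e.g. code=0, error="php_network_getaddresses: getaddrinfo failed ..."
        return None
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        # A body was returned, but it is not well-formed XML.
        return None


ok = SimpleNamespace(code=200, data="<Response/>")
failed = SimpleNamespace(
    code=0,
    error="php_network_getaddresses: getaddrinfo failed: Name or service not known",
)
print(parse_quote_response(ok).tag)   # Response
print(parse_quote_response(failed))   # None
```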

I tried to find out why that response was returned by comparing the DNS records reported by the local server with those from another public DNS server, and found that DNS wasn't the problem. My only guess is that something on the USPS side is failing intermittently.
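Since the logged error comes from getaddrinfo(), one way to look for intermittent failures in the local resolver is to call it repeatedly from the web server and count failures. The Python sketch below assumes the system resolver is the one PHP uses (which the error text suggests); `can_resolve`, `count_failures`, and the example host are hypothetical. It does not query a specific public DNS server; it only exercises the local resolver.

```python
import socket
import time


def can_resolve(host, port=80):
    """Resolve `host` via getaddrinfo(), the call named in the PHP error.

    Returns (True, None) on success or (False, message) on failure.
    """
    try:
        socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
        return True, None
    except socket.gaierror as exc:
        return False, str(exc)


def count_failures(host, attempts=20, delay=1.0):
    """Resolve `host` repeatedly and collect the error message of each failure."""
    failures = []
    for _ in range(attempts):
        ok, err = can_resolve(host)
        if not ok:
            failures.append(err)
        time.sleep(delay)
    return failures


# Example (hypothetical host; substitute the one uc_usps is configured to call):
# print(count_failures("shipping-api.example.com", attempts=50))
```

A non-empty result for a host that normally resolves would point at the server's resolver rather than at the USPS service.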

I did two things to handle the problem. First, following a recommendation I found, I set a longer default_socket_timeout, which seemed to reduce the errors significantly. Second, before creating the SimpleXMLElement, I added a check that the response actually contains data; if it doesn't, the request is retried, up to 10 times. Each additional try is logged along with the current response, and if all 10 tries fail, a real error is logged as well. See the attached patch.
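The retry logic described above can be sketched in Python as a bounded loop. This is an illustration of the approach, not the attached patch: `fetch` is a hypothetical stand-in for the drupal_http_request() call, and, as described, there is no delay between attempts.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger("uc_usps_sketch")

MAX_ATTEMPTS = 10  # the patch described above allows up to 10 tries


def fetch_with_retry(fetch, max_attempts=MAX_ATTEMPTS):
    """Call `fetch()` until it returns an object with non-empty `data`.

    Each failed attempt is logged with the response received; if every
    attempt fails, an error is logged and None is returned.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if getattr(result, "data", None):
            return result
        log.warning("USPS quote attempt %d/%d failed: %r", attempt, max_attempts, result)
    log.error("USPS quote failed after %d attempts", max_attempts)
    return None


# Usage: a fake fetch that fails twice, then succeeds.
responses = iter([
    SimpleNamespace(code=0, error="getaddrinfo failed"),
    SimpleNamespace(code=0, error="getaddrinfo failed"),
    SimpleNamespace(code=200, data="<Response/>"),
])
result = fetch_with_retry(lambda: next(responses))
print(result.code)  # 200, after two logged failures
```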

With that fix in place, I have so far seen it need at most 2 tries before succeeding, and the user never sees the problem. Before the fix, I also noticed that the quote sometimes failed on the Checkout page itself, but that was less of an issue because the user can click the AJAX button to try again.

Attachment: usps_ship_quote_fail.patch (1.75 KB), uploaded by trrroy

Comments

longwave

Version: 6.x-2.12 » 6.x-2.x-dev
Status: Active » Needs review
longwave

Closed #1716956 (UPS: check response data before using it) as a duplicate, since at least this issue has a patch; we should probably deal with UPS and USPS at the same time.

TR

The getaddrinfo failed error indicates a problem on your server, not on the USPS side. Perhaps you're running out of sockets and need to change your server configuration. Or perhaps your load is too high, or your bandwidth too low, firewall too slow, etc., for the tasks you're performing.

But it's quite possible this error arises from the internals of drupal_http_request(), which is what we use to send the quote request to the USPS server. drupal_http_request() has always been prone to these sorts of errors, and it's a constant frustration how this important core Drupal function has been neglected and ignored over the years (just search the issue queue: it has fundamental problems that were reported 8 years ago or more and are still languishing. Heck, it sends requests using HTTP/1.0, even though browsers and servers have supported HTTP/1.1 since 1996. Yes, that's not a typo: 1996. Some servers even refuse to accept HTTP/1.0 anymore ...)

Regardless, yes, I think we should check the $result object to see whether it's valid before using it, but if it's bad we should abort rather than retry. This is the way shipping quotes usually work: if they can't get a quote for some reason, they return nothing. A watchdog message would be appropriate if the reason was network related, so that the admin receives this information. A customer-facing error message is of course not what we want.
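The abort-and-log policy above might look like this in Python. It is only a sketch: Python's `logging` stands in for Drupal's watchdog, the function name is hypothetical, and the `<Quotes><Quote service=... rate=.../></Quotes>` format is invented for illustration, not the real USPS response schema.

```python
import logging
import xml.etree.ElementTree as ET
from types import SimpleNamespace

log = logging.getLogger("uc_usps_sketch")


def get_quotes(result):
    """Return a list of (service, rate) pairs, or [] when no quote is available."""
    data = getattr(result, "data", None)
    if not data:
        # Network-level failure: record it for the admin, show nothing to the customer.
        log.error("USPS quote request failed: %s", getattr(result, "error", "no data"))
        return []
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        log.error("USPS response is not valid XML: %s", exc)
        return []
    quotes = []
    # Hypothetical format: <Quotes><Quote service="..." rate="..."/></Quotes>
    for q in root.iter("Quote"):
        service, rate = q.get("service"), q.get("rate")
        if service and rate:
            quotes.append((service, float(rate)))
    return quotes


print(get_quotes(SimpleNamespace(code=0, error="getaddrinfo failed")))  # []
print(get_quotes(SimpleNamespace(
    code=200, data='<Quotes><Quote service="Priority" rate="7.50"/></Quotes>')))
# [('Priority', 7.5)]
```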

I think increasing the socket timeout to 20 minutes (!!) is not the right way to go at all. If you don't get a response in the default 60 seconds, the customer has probably already moved on anyway. And if your problem is due to resource starvation, holding more sockets open for a longer period of time is going to make it worse, not help. Additionally, the brute-force strategy of repeatedly trying until you get a result is like repeatedly hitting refresh in your browser if the page isn't loading - it's only making the server load worse and not addressing the root cause of the problem. And it won't work at all if the problem is that the USPS server is down, or there is some connectivity problem elsewhere between your server and the USPS server. For that I always recommend a fallback flat-rate method (which is necessary for other reasons as well, for instance if an order is too heavy for USPS).
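The value of a fallback method can be shown with a small Python sketch. It assumes that each shipping method is a callable returning (label, rate) pairs and that a failing method is skipped rather than breaking checkout. `collect_quotes`, `usps_down`, `flat_rate`, and the 9.95 amount are all hypothetical.

```python
import logging
from typing import Callable, Dict, List, Tuple

log = logging.getLogger("uc_quote_sketch")

Quote = Tuple[str, float]  # (label, rate)


def collect_quotes(methods: Dict[str, Callable[[], List[Quote]]]) -> List[Quote]:
    """Ask every enabled shipping method for quotes and keep whatever succeeds.

    A method that raises is logged and skipped, so a USPS outage still
    leaves the flat-rate option available instead of breaking checkout.
    """
    quotes: List[Quote] = []
    for name, method in methods.items():
        try:
            quotes.extend(method())
        except Exception as exc:  # network errors, parse errors, etc.
            log.warning("Shipping method %s failed; skipping it: %s", name, exc)
    return quotes


def usps_down() -> List[Quote]:
    raise ConnectionError("getaddrinfo failed: Name or service not known")


def flat_rate() -> List[Quote]:
    return [("Flat rate", 9.95)]  # hypothetical flat-rate amount


print(collect_quotes({"usps": usps_down, "flat_rate": flat_rate}))
# [('Flat rate', 9.95)]
```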

If anything is to be done here, other than handling the error message properly, I would advocate for using cURL instead of drupal_http_request(). I've tried to be a good citizen over the years by endorsing the use of drupal_http_request() everywhere, but it's such a dysfunctional piece of code and causes so many problems that I really can't recommend it anymore. I've been using cURL or even straight sockets in my own code for a while now because of all the problems with drupal_http_request(). That's solved some old networking problems for me, and I haven't seen any new ones arise, so I'm comfortable with that decision.
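cURL is specific to the PHP side. As a rough Python analogue (not the module's code), the helper below uses the standard library's `urllib.request`, which sends HTTP/1.1 requests and takes an explicit per-request timeout. To keep the calling code unchanged, it returns an object shaped like the one in the log above: `code` plus `data` on success, `code` plus `error` on failure. `http_get` is a hypothetical name.

```python
import urllib.error
import urllib.request
from types import SimpleNamespace


def http_get(url, timeout=10.0):
    """Fetch `url` and return SimpleNamespace(code, data) or SimpleNamespace(code, error)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return SimpleNamespace(code=resp.status, data=body)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a non-2xx status.
        return SimpleNamespace(code=exc.code, error=str(exc))
    except (urllib.error.URLError, OSError) as exc:
        # DNS (getaddrinfo) failures, refused connections, timeouts, ...
        return SimpleNamespace(code=0, error=str(getattr(exc, "reason", exc)))
```

A short timeout plus a distinct error result lets the caller log the failure and return no quote, instead of holding a socket open for a long time.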

TR

Version: 6.x-2.x-dev » 8.x-4.x-dev
Category: Bug report » Feature request