drupal_http_request() does not handle 'chunked' responses

njivy - January 3, 2007 - 03:54
Project: Drupal
Version: 6.x-dev
Component: base system
Category: bug report
Priority: normal
Assigned: Unassigned
Status: patch (code needs work)
Description

I encountered this problem when using aggregator.module to load web feeds. Sometimes the process would return with a message about a "syntax error on line 1." As it turns out, chunked HTTP/1.1 responses carry extra chunk-length lines inside the message body.

drupal_http_request() passes the extra encodings along as part of the message body. This confuses the XML parser.
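To illustrate the symptom, here is a minimal sketch in Python (used only for illustration; it is not Drupal code). The feed content and the variable names are made up. It frames a small XML document the way a chunked response does and hands the undecoded body to an XML parser, which fails on the first line because the body starts with a chunk-size line rather than markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed body; any XML document would do.
xml_doc = b"<rss><channel><title>Example</title></channel></rss>"
first, second = xml_doc[:20], xml_doc[20:]

# Chunked framing: <size in hex> CRLF <data> CRLF, repeated,
# then a zero-size chunk and a blank line to end the body.
raw_body = (
    b"%x\r\n" % len(first) + first + b"\r\n"
    + b"%x\r\n" % len(second) + second + b"\r\n"
    + b"0\r\n\r\n"
)

# Passing the undecoded body to the parser fails, because the first
# thing the parser sees is the chunk-size line, not "<rss>".
try:
    ET.fromstring(raw_body)
    parsed = True
except ET.ParseError as err:
    parsed = False
    error_line = err.position[0]  # line number reported by the parser
```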

#1

njivy - January 3, 2007 - 04:06

The drupal_http_request() behavior hasn't changed since Drupal 4.6, or maybe longer.

If I understand the protocol correctly, we can look for the Transfer-Encoding header and then strip away the extra lines. The problem is that the chunk-length lines sometimes appear in the middle of the message body, not just at the beginning and end.
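A rough sketch of that idea, in Python for illustration only: check the Transfer-Encoding header, then walk the body chunk by chunk so that size lines in the middle of the body are handled the same way as the first one. The function names `is_chunked` and `decode_chunked` are hypothetical, the headers are assumed to be a plain dict, and trailer headers after the last chunk are simply ignored here.

```python
def is_chunked(headers):
    """Return True if a (hypothetical) header dict announces chunked encoding."""
    for name, value in headers.items():
        if name.lower() == "transfer-encoding":
            codings = [c.strip().lower() for c in value.split(",")]
            return "chunked" in codings
    return False


def decode_chunked(body):
    """Strip chunk-size lines from a complete chunked body (bytes)."""
    decoded = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            raise ValueError("missing chunk-size line")
        # A chunk-size line may carry extensions after a semicolon.
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; anything after it is trailer data
        if pos + size > len(body):
            raise ValueError("truncated chunk")
        decoded += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(decoded)


headers = {"Transfer-Encoding": "chunked"}
raw = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
body = decode_chunked(raw) if is_chunked(headers) else raw
```

Walking the body by announced sizes, rather than searching for lines that look like hex numbers, avoids stripping payload lines that merely resemble a chunk size.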

#2

njivy - January 3, 2007 - 04:11

#3

ChrisKennedy - January 3, 2007 - 07:28

Many thanks for the bug report. Drupal actually just switched from HTTP/1.0 to HTTP/1.1 for drupal_http_request() (see http://drupal.org/node/104693), although I think this change was after rc1. This might be helpful in figuring out why gmap times out.

Do you still have this problem when using the latest CVS version of Drupal?

#4

njivy - January 3, 2007 - 13:01

Oops, I was wrong about drupal_http_request() not changing. I didn't spot the HTTP/1.0 to HTTP/1.1 difference.

The problem disappears when I revert to HTTP/1.0. Thanks for the tip.

#5

njivy - January 3, 2007 - 13:39
Version: 5.0-rc1 » 5.x-dev
Status: active » patch (code needs review)

This patch attempts to handle "chunked" transfer encodings. My tests indicate it works.

Attachment       Size
chunked.patch    1.25 KB

#6

njivy - January 3, 2007 - 13:40

Hrm. This patch uses standard filenames and paths. Sorry 'bout that.

Attachment         Size
chunked_0.patch    1.26 KB

#7

ChrisKennedy - January 3, 2007 - 13:51
Status: patch (code needs review) » patch (code needs work)

It needs minor work to conform to the Drupal coding standards (see http://drupal.org/node/318): operator spacing and comment styles both need to be tweaked.

It might be best to hold off until 6.x to add handling for chunked responses.

#8

njivy - January 3, 2007 - 17:01
Status: patch (code needs work) » patch (code needs review)

I had ported some code from php.net, hence the coding style. This patch cleans it up.

Attachment         Size
chunked_1.patch    1.06 KB

#9

Dries - January 4, 2007 - 09:48
Version: 5.x-dev » 6.x-dev

Chris: any chance you can test this with your gmap setup?

I think we need to postpone this to Drupal 6.

What's the advantage of the chunked protocol? I'd document that in the patch.

#10

fgm - May 5, 2007 - 16:25
Status: patch (code needs review) » patch (code needs work)

One advantage of the chunked transfer encoding is a theoretical ability to handle long responses efficiently. Because each chunk announces its size, reads can be sized to the announced chunk instead of the default block size (see the "fetch response" part of drupal_http_request()). This syncs the HTTP-level blocks with the TCP-level reads, effectively adding minor synchronization points to TCP, so that, if the underlying network supports it, the data is processed at the optimal speed.

In addition, since the length of each chunk is known in advance, the end of the socket read needn't be detected using feof($fp), which relies on the protocol closing sequence and therefore introduces at least one latency iteration. The read can instead stop as soon as the announced amount of data has been received. See http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.6.1 for details.
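As a sketch of reading by announced chunk size rather than until end-of-file, the following Python snippet (illustration only, not the Drupal implementation) reads from a file-like object. `io.BytesIO` stands in for the socket, and the helper names `read_exactly` and `read_chunked` are hypothetical. The extra bytes after the terminating chunk show that the reader stops at the end of the response without waiting for the stream to close.

```python
import io


def read_exactly(stream, n):
    """Read exactly n bytes, looping because a read may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        piece = stream.read(n - len(buf))
        if not piece:
            raise EOFError("stream ended inside a chunk")
        buf += piece
    return bytes(buf)


def read_chunked(stream):
    """Read one chunked body from a binary stream and return the payload."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise EOFError("stream ended before the last chunk")
        # Ignore optional chunk extensions after a semicolon.
        size = int(size_line.split(b";", 1)[0].strip().decode("ascii"), 16)
        if size == 0:
            break
        body += read_exactly(stream, size)  # read the announced size only
        read_exactly(stream, 2)             # CRLF that follows the data
    # Skip any trailer lines up to the blank line that ends the message.
    while stream.readline() not in (b"\r\n", b"\n", b""):
        pass
    return bytes(body)


# BytesIO stands in for the socket; "NEXT" is unrelated data that follows.
stream = io.BytesIO(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\nNEXT")
payload = read_chunked(stream)
leftover = stream.read()
```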

The patch does not seem to take into account the optional trailing headers that may follow the data in a chunked response, as defined by the trailer BNF production for chunked encoding.

See http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.4.6 for a recommended way of processing such encodings.
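The procedure recommended in that section can be sketched roughly as follows, in Python for illustration only. The function name `dechunk`, the dict-based headers with exact-case keys, and the sample `X-Checksum` trailer are all assumptions made for this example. The steps follow the RFC: collect the chunks, merge any trailer headers into the existing headers, set Content-Length to the decoded length, and remove "chunked" from Transfer-Encoding.

```python
def dechunk(headers, raw):
    """Decode a chunked body and fold trailer headers into the header dict."""
    headers = dict(headers)  # work on a copy
    body = bytearray()
    pos = 0

    def read_line():
        nonlocal pos
        end = raw.index(b"\r\n", pos)  # raises ValueError if CRLF is missing
        line = raw[pos:end]
        pos = end + 2
        return line

    def read_size():
        # Drop optional chunk extensions, then parse the hexadecimal size.
        return int(read_line().split(b";", 1)[0].strip().decode("ascii"), 16)

    size = read_size()
    while size > 0:
        body += raw[pos:pos + size]
        pos += size + 2  # chunk data plus its trailing CRLF
        size = read_size()

    # Trailer: zero or more header lines, ended by an empty line.
    line = read_line()
    while line:
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip()] = value.strip()
        line = read_line()

    headers["Content-Length"] = str(len(body))
    remaining = [c.strip() for c in headers.get("Transfer-Encoding", "").split(",")
                 if c.strip() and c.strip().lower() != "chunked"]
    if remaining:
        headers["Transfer-Encoding"] = ", ".join(remaining)
    else:
        headers.pop("Transfer-Encoding", None)
    return headers, bytes(body)


new_headers, decoded = dechunk(
    {"Transfer-Encoding": "chunked"},
    b"4\r\nWiki\r\n5\r\npedia\r\n0\r\nX-Checksum: abc\r\n\r\n",
)
```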

FWIW, I just encountered the problem today, where a WordPress blog would return Transfer-Encoding: chunked to drupal_http_request() in D5 and D4.7, although the request is sent under HTTP/1.0 and such encodings are not allowed for HTTP/1.0 agents (as per section 3.6: "A server MUST NOT send transfer-codings to an HTTP/1.0 client"). Maybe someone familiar with WP should create an issue on their site too.

#11

Pancho - January 9, 2008 - 14:16

Any progress here?

 
 
