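When working with web services, it is useful to route requests through an HTTP proxy (such as http://mitmproxy.org/) to help with debugging. However, a response that passes through such a proxy can carry two header blocks, the proxy's own followed by the real server's, like this: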

HTTP/1.1 200 Connection established
Proxy-agent: mitmproxy 0.10
<empty line>
HTTP/1.1 200 OK
Date: Fri, 27 Dec 2013 13:58:25 GMT
Server: Apache/2.2.22 (Ubuntu)
. . . . . . . . . .

This format is required by the standards and is not a mistake. However, the http_client module does not recognize it and gets completely confused: it cannot tell where the headers end and where the body starts. It would be useful to fix this.
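
To make the failure concrete, here is a minimal Python sketch (not the module's actual PHP code) of a parser that assumes a single header block and splits at the first blank line. The function name `naive_split` and the body `hello` are hypothetical; the header lines are taken from the example above.

```python
from typing import Tuple

# Raw bytes as the client sees them when the proxy's reply is included.
# The body b"hello" is a made-up placeholder for illustration.
RAW = (
    b"HTTP/1.1 200 Connection established\r\n"
    b"Proxy-agent: mitmproxy 0.10\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Fri, 27 Dec 2013 13:58:25 GMT\r\n"
    b"Server: Apache/2.2.22 (Ubuntu)\r\n"
    b"\r\n"
    b"hello"
)


def naive_split(raw: bytes) -> Tuple[bytes, bytes]:
    """Split at the first blank line, assuming only one header block."""
    headers, _, body = raw.partition(b"\r\n\r\n")
    return headers, body


headers, body = naive_split(RAW)
# Only the proxy's block is treated as headers; the real status line
# and headers are left at the start of what the parser calls the body.
print(headers)
print(body)
```

With this input, the "body" begins with `HTTP/1.1 200 OK`, which is the confusion described above.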


Comments

dashohoxha’s picture

Assigned: Unassigned » dashohoxha
Issue summary: View changes
dashohoxha’s picture

Status: Active » Closed (fixed)
jsst’s picture

Assigned: dashohoxha » jsst
Status: Closed (fixed) » Needs work

This problem needs a better fix: the proposed solution only works for proxies that add the (non-standard) proxy-agent header. The headers of the real response will still end up in the response body when the proxy does not add any headers or adds headers other than proxy-agent.

See: https://curl.haxx.se/mail/lib-2005-10/0023.html

jsst’s picture

It is not possible to reliably split the response headers from the response body using CURLOPT_HEADER=1, because we don't know how many header sections the response contains (i.e. whether we went through a proxy). The only component that knows this is libcurl, and I know of two ways to use that information:

  • using CURLOPT_HEADERFUNCTION: this works but is cumbersome, because the callback receives individual header lines, which we would have to keep track of and glue together after the response has been parsed
  • using CURLINFO_HEADER_SIZE: this is the approach I took. We let curl tell us exactly where the actual response body begins, and then split the headers from the body at that offset
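
The second approach can be sketched in Python as follows. This is an illustration of the idea only, not the patch itself: `split_by_header_size` is a hypothetical helper, the body `hello` is made up, and `header_size` is computed by hand here, whereas in real use it would be the value libcurl reports for CURLINFO_HEADER_SIZE (the total size of all header blocks received).

```python
from typing import Tuple

# Two header blocks (proxy + real server) followed by a placeholder body.
PROXY_BLOCK = (
    b"HTTP/1.1 200 Connection established\r\n"
    b"Proxy-agent: mitmproxy 0.10\r\n"
    b"\r\n"
)
SERVER_BLOCK = (
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.2.22 (Ubuntu)\r\n"
    b"\r\n"
)
RAW = PROXY_BLOCK + SERVER_BLOCK + b"hello"


def split_by_header_size(raw: bytes, header_size: int) -> Tuple[bytes, bytes]:
    """Cut the raw response at the byte offset where the body begins."""
    return raw[:header_size], raw[header_size:]


# Stand-in for the number libcurl would report; assumed, not measured.
header_size = len(PROXY_BLOCK) + len(SERVER_BLOCK)
header_content, body = split_by_header_size(RAW, header_size)
print(body)
```

Because the offset covers every header block, the split stays correct whether or not a proxy added a block of its own, and regardless of which headers the proxy sent.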

I've attached a new patch, which can be applied on top of the original patch in this thread or straight onto the -dev version of this module. The change is backwards-compatible: it introduces a new optional parameter 'header_content'. When that parameter is omitted (for example because someone implemented a custom delegate), the old behaviour applies: the headers are split from the body on the assumption that the request didn't go through a proxy server.

jsst’s picture

Status: Needs work » Needs review
jsst’s picture

I've merged patches #2 and #5 into a single patch, and fixed a bug where HTTP 100 Continue responses were not handled properly.
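
For readers wondering why 100 Continue matters here: it adds yet another header block in front of the final response. One way to cope, sketched below in Python under the assumption that the header section has already been separated from the body, is to keep only the last block. The helpers `last_header_block` and `parse_block` are hypothetical and do not reflect how the patch is actually written.

```python
from typing import Dict, Tuple

# Header section containing a proxy reply, an interim 100 Continue,
# and the final response. Assumed sample data for illustration.
HEADER_CONTENT = (
    b"HTTP/1.1 200 Connection established\r\n"
    b"\r\n"
    b"HTTP/1.1 100 Continue\r\n"
    b"\r\n"
    b"HTTP/1.1 200 OK\r\n"
    b"Server: Apache/2.2.22 (Ubuntu)\r\n"
    b"\r\n"
)


def last_header_block(header_content: bytes) -> bytes:
    """Return the final non-empty header block in the header section."""
    blocks = [b for b in header_content.split(b"\r\n\r\n") if b.strip()]
    return blocks[-1]


def parse_block(block: bytes) -> Tuple[str, Dict[str, str]]:
    """Parse one block into its status line and a lower-cased header map."""
    lines = block.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


status_line, headers = parse_block(last_header_block(HEADER_CONTENT))
print(status_line, headers)
```

Taking the last block means the proxy's reply and any interim responses are skipped without having to recognize them individually.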