If an XML-RPC response contains a multibyte UTF-8 character, the Content-Length header that is sent doesn't match the actual length of the content.
This causes truncated or corrupted XML on the client side (at least with some client classes).
Solution:
In xmlrpc_server_output() in /includes/xmlrpcs.inc, replace header('Content-Length: '. strlen($xml)); with header('Content-Length: '. drupal_strlen($xml)+0);
A little question: the patch doesn't work without +0, does anybody know why?
Greetings,
Pieter
Comments
Comment #1
c960657 commented
According to the HTTP specification, Content-Length is the length of the body in octets (i.e. bytes), not in characters:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13
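To make the octets-versus-characters distinction concrete, here is a minimal shell sketch. It is not part of Drupal, and byte_len is a hypothetical helper name chosen for illustration. It counts the octets of '€' and 'ê', the two characters used in the test message mentioned later in this thread, assuming they are encoded as UTF-8.

```shell
# Count octets (bytes) in a string: this is the unit Content-Length uses.
byte_len() {
  printf '%s' "$1" | wc -c | tr -d '[:space:]'
}

# '€' is U+20AC (3 octets in UTF-8), 'ê' is U+00EA (2 octets in UTF-8).
# Octal escapes keep the script itself pure ASCII.
euro=$(printf '\342\202\254')
e_circ=$(printf '\303\252')

echo "euro:   $(byte_len "$euro") octets"
echo "e_circ: $(byte_len "$e_circ") octets"
echo "both:   $(byte_len "$euro$e_circ") octets for 2 characters"
```

A character-counting function would report 2 for the last string, which is why a character count is the wrong value for Content-Length.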
Comment #2
pieterdc
This is relatively new to me, so I searched a little on the internet.
Since UTF-8 encodes each Unicode character as a variable number of octets, from 1 to 4 (see: http://www.utf-8.com/), we can't just count the number of characters in a string with the mbstring functions or drupal_strlen and assume that to be the size in bytes.
What should we use then?
Comments (#77040, #47309) on the PHP mbstring documentation page tell us to use:
So I guess we'd have to create a function like drupal_strlen, but called drupal_strsize. Let's give it a try.
Comment #3
c960657 commented
strlen() counts octets and should be fine, unless mbstring.func_overload is enabled, which Drupal does not support. Did you perhaps enable zlib.output_compression?
Comment #4
pieterdc
Thanks, c960657, for your quick response!
When I look at mysite /admin/reports/status/php
mbstring.func_overload is set to 0
zlib.output_compression is set to Off
So, I guess both are disabled.
Any further tips on how I could debug this?
My test (xml) message contains a '€' and a 'ê' ...
Comment #5
pieterdc
To answer a question from my original post:
header('Content-Length: '. drupal_strlen($xml)+0);
is apparently the same as
header(0);
which leaves the 'Content-Length' unset...
Source: "Strings may be concatenated using the '.' (dot) operator. Note that the '+' (addition) operator will not work for this." (http://php.net/manual/en/language.types.string.php)
I have double-checked this by removing header('Content-Length... from xmlrpcs.inc and I noticed it still worked (as with my hack).
But I know this isn't recommended and I'd really like to find a (decent) solution.
Comment #6
c960657 commented
There is an online XML-RPC debugger at http://gggeek.raprap.it/debugger/ that you can use to inspect the server response.
Comment #7
pieterdc
It seems as if my client application can't handle the UTF-8 BOM...
My client application runs under character encoding ISO-8859-1, but changing that with a PHP header() call doesn't magically solve the problem.
Updating to the latest version of Zend Framework doesn't either.
It must be something (else) specific to that application, because there are other client webapps that work fine with the same web services.
Setting
header('Content-Length: '. (strlen($xml)+3));
in xmlrpcs.inc, however, does "fix" it :s
PS: a UTF-8 BOM is 3 bytes/octets long...
Comment #8
c960657 commented
Where does the BOM come from? Drupal doesn't send one AFAIK. Did you by accident add one to one of your source files? That would explain why Content-Length doesn't match the actual length of the output.
As a quick-fix you can try adding ob_clean() to the top of xmlrpc_server_output().
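The mismatch described here can be reproduced outside Drupal. The following is only a sketch under stated assumptions: the body string is a hypothetical stand-in for what xmlrpc_server_output() would send, and the dd call imitates a client that reads exactly as many octets as Content-Length announces.

```shell
# Hypothetical response body (an assumption, not real Drupal output).
body='<?xml version="1.0"?><methodResponse></methodResponse>'

# The three octets of a UTF-8 BOM (EF BB BF), written as octal escapes.
bom=$(printf '\357\273\277')

# What the server would announce: the length of the body alone.
declared=$(printf '%s' "$body" | wc -c | tr -d '[:space:]')

# What would actually be sent if a stray BOM had been output first.
actual=$(printf '%s%s' "$bom" "$body" | wc -c | tr -d '[:space:]')

echo "declared=$declared actual=$actual surplus=$((actual - declared))"

# A client that trusts Content-Length stops after $declared octets,
# so the BOM pushes the last three octets of the XML out of reach.
received=$(printf '%s%s' "$bom" "$body" | dd bs=1 count="$declared" 2>/dev/null)
echo "client sees: $received"
```

Under these assumptions the surplus is exactly the 3 octets of the BOM, and the received document loses its final characters, which is consistent with the truncated XML reported at the top of the thread.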
Comment #9
pieterdc
Your proposed quick fix does work, c960657.
Next question is indeed: where does the BOM come from?
Comment #10
c960657 commented
If you have shell access to the server, try this (found here):
grep -rl $'\xEF\xBB\xBF' .
Comment #11
pieterdc
I'm a little unsure about setting this issue to 'fixed', but yeah, c960657, if I meet you in Paris next September, I'll buy you a drink ;-)
I do have shell access to the server. I followed your link and searched a little further, as I was getting too many files with a BOM to choose from. I then came across this one, adjusted it a little and tried:
fgrep -i $'\xEF\xBB\xBF' `find . -iname '*.module' -print`
which pointed me to one single module file of a custom module (written by a colleague).
I followed the other steps, explained by your link, on how to remove the BOM using vi.
And bingo!!
Now I have a patchless xmlrpcs.inc file.
Drupal-greetings from Belgium to Denmark.
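For readers who prefer a scripted alternative to the vi steps mentioned above, here is a minimal POSIX shell sketch. It is an assumption-laden illustration, not the procedure from the linked page: has_bom and strip_bom are hypothetical helper names, and the demo file is a throwaway created with mktemp.

```shell
# Succeeds if the file starts with the UTF-8 BOM octets EF BB BF.
has_bom() {
  [ "$(dd if="$1" bs=1 count=3 2>/dev/null | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# Writes the file to stdout without its first three octets.
strip_bom() {
  tail -c +4 "$1"
}

# Demo on a throwaway file with hypothetical module content.
demo=$(mktemp)
printf '\357\273\277<?php // example.module\n' > "$demo"

if has_bom "$demo"; then
  strip_bom "$demo" > "$demo.clean"
  echo "BOM removed"
fi

rm -f "$demo" "$demo.clean"
```

Unlike the grep commands in the thread, which match the BOM octets anywhere in a file, has_bom only checks the first three octets, which is where a BOM causes output before the XML starts. Keep a backup before overwriting any real source file.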