If an XML-RPC response contains a multibyte UTF-8 character, the Content-Length header that is sent doesn't match the actual length of the content.
This causes shortened/corrupted XML files, client-side (at least with some client classes).

Solution:
In xmlrpc_server_output() in /includes/xmlrpcs.inc, replace header('Content-Length: '. strlen($xml)); with header('Content-Length: '. drupal_strlen($xml)+0);

A small question: the patch doesn't work without the +0. Does anybody know why?

Greetings,
Pieter

Comments

c960657

Status: Needs review » Needs work

According to the HTTP specification, Content-Length is the length of the body in octets (i.e. bytes), not in characters:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13
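To make the octets-versus-characters distinction concrete, here is a small PHP sketch (not from the original thread) that measures a string containing the '€' and 'ê' used in this issue both ways. Characters are counted with a PCRE /u match instead of mb_strlen(), purely so the snippet runs without the mbstring extension.

```php
<?php
// Sample XML-RPC value; the escapes are the UTF-8 encodings of
// '€' (E2 82 AC, 3 bytes) and 'ê' (C3 AA, 2 bytes).
$body = "<string>\xE2\x82\xAC \xC3\xAA</string>";

// strlen() counts bytes (octets): this is what Content-Length needs.
$bytes = strlen($body);

// Counting characters instead, as drupal_strlen() does: with the /u
// modifier, '.' matches one UTF-8 encoded character (/s includes newlines).
$chars = preg_match_all('/./su', $body);

printf("bytes: %d, characters: %d\n", $bytes, $chars); // prints "bytes: 23, characters: 20"
```

Announcing 20 instead of 23 would make a client stop reading 3 bytes early, which is the kind of truncation described in the original report.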

pieterdc

This is relatively new to me, so I searched a little on the internet.

And since UTF-8 encodes each Unicode character as a variable number of octets, from 1 to 4 (see: http://www.utf-8.com/), we can't just count the number of characters in a string with mb_strlen() or drupal_strlen() and assume that to be its size in bytes.

What should we use then?

Comments (#77040, #47309) on the PHP mbstring documentation page suggest using:

mb_strlen($utf8_string, '8bit');
// or
mb_strlen($utf8_string, 'latin1');

So I guess we'd have to create a function like drupal_strlen(), but one that counts bytes, called drupal_strsize(). Let's give it a try.

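function drupal_strsize($text) {
  global $multibyte;
  if ($multibyte == UNICODE_MULTIBYTE) {
    // Count bytes: with the 'latin1' encoding, mb_strlen() treats
    // every byte as a single character.
    return mb_strlen($text, 'latin1');
  }
  else {
    // Without mbstring, strlen() already counts bytes.
    return strlen($text);
  }
}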

c960657

strlen() counts octets and should be fine, unless mbstring.func_overload is enabled and that is not supported by Drupal. Did you perhaps enable zlib.output_compression?
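For completeness: if mbstring.func_overload were enabled, strlen() would behave like mb_strlen() and count characters. A common defensive idiom, sketched here with the hypothetical helper name byte_length(), asks mbstring for the '8bit' encoding explicitly when the extension is loaded; overloading requires mbstring, so plain strlen() is safe otherwise. (The setting was deprecated in PHP 7.2 and removed in PHP 8.0.)

```php
<?php
// Hypothetical helper: return the length of $s in bytes, even if
// mbstring.func_overload has made strlen() count characters.
function byte_length(string $s): int {
  if (function_exists('mb_strlen')) {
    // The '8bit' encoding treats every byte as one character.
    return mb_strlen($s, '8bit');
  }
  // Without mbstring there is no overloading, so strlen() counts bytes.
  return strlen($s);
}

echo byte_length("\xE2\x82\xAC"), "\n"; // '€' in UTF-8; prints 3
```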

pieterdc

Thanks c960657 for your quick response!

On my site, /admin/reports/status/php shows mbstring.func_overload set to 0 and zlib.output_compression set to Off, so I guess both are disabled.

Any further tips on how I could debug this?

My test XML message contains a '€' and an 'ê'...

pieterdc

To answer the question from my original post:

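header('Content-Length: '. drupal_strlen($xml)+0); is apparently the same as header(0);, which leaves the Content-Length header unset. The cause is operator precedence: in PHP (before version 8), '.' and '+' have the same precedence and are evaluated left to right, so the line is parsed as header(('Content-Length: '. drupal_strlen($xml)) + 0);, and a string such as "Content-Length: 123" converts to the number 0. That also explains why the patch only "worked" with the +0: without it, the header announced the character count, which is even smaller than the byte count.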

Source: "Strings may be concatenated using the '.' (dot) operator. Note that the '+' (addition) operator will not work for this." (http://php.net/manual/en/language.types.string.php)
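Keeping the arithmetic out of the concatenation avoids the problem on every PHP version; note that PHP 8 changed the rules so that '+' binds tighter than '.', so the original line would behave differently there. Wrapping the sum in parentheses, as the later (strlen($xml)+3) line in this thread does, works; the sketch below uses sprintf() instead. content_length_header() is a hypothetical helper name.

```php
<?php
// Before PHP 8, the original line was effectively parsed as:
//   header(('Content-Length: ' . drupal_strlen($xml)) + 0);
// i.e. a non-numeric string plus 0, which PHP 5/7 evaluate to 0.

// Hypothetical helper: build a Content-Length header line for $body.
function content_length_header(string $body): string {
  // sprintf() keeps the number separate from any concatenation, and
  // strlen() returns the size in bytes, as HTTP requires.
  return sprintf('Content-Length: %d', strlen($body));
}

$xml = "<?xml version=\"1.0\"?>\n<methodResponse/>";
echo content_length_header($xml), "\n";
```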

I double-checked this by removing header('Content-Length... from xmlrpcs.inc, and it still worked (just as with my hack).

But I know this isn't recommended and I'd really like to find a (decent) solution.

c960657

There is an online XML-RPC debugger on http://gggeek.raprap.it/debugger/ that you can use to inspect the server response.

pieterdc

It seems as if my client application can't handle the UTF-8 BOM...

My client application uses the ISO-8859-1 character encoding, but changing that with a PHP header() call doesn't magically solve the problem.
Neither does updating to the latest version of Zend Framework.
It must be something (else) specific to that application, because other client web apps work fine with the same web services.

However, setting header('Content-Length: '. (strlen($xml)+3)); in xmlrpcs.inc does "fix" it :s
PS: a UTF-8 BOM (the bytes EF BB BF) is 3 bytes (octets) long...
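Why adding 3 "fixes" it can be reproduced with plain strings, no HTTP involved. A minimal simulation, assuming a client that reads exactly Content-Length bytes; the variable names are illustrative:

```php
<?php
$bom = "\xEF\xBB\xBF"; // UTF-8 byte order mark: 3 bytes

$xml = "<?xml version=\"1.0\"?>\n<methodResponse></methodResponse>";
$contentLength = strlen($xml); // the length xmlrpcs.inc announces

// What actually reaches the client: the stray BOM, then the XML.
$sent = $bom . $xml;

// A client that trusts Content-Length reads exactly that many bytes, so it
// gets the BOM but loses the last 3 bytes of the XML.
$received = substr($sent, 0, $contentLength);

$closingTag = '</methodResponse>';
$complete = substr($received, -strlen($closingTag)) === $closingTag;
echo $complete ? "complete\n" : "truncated\n"; // prints "truncated"

// The "+3" workaround simply announces the BOM's bytes as well.
$receivedWithHack = substr($sent, 0, $contentLength + strlen($bom));
```

The +3 makes the numbers match again, but it only masks the stray bytes and would announce 3 bytes too many once the BOM is gone, so removing the BOM at its source is the real fix.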

c960657

Where does the BOM come from? Drupal doesn't send one AFAIK. Did you by accident add one to one of your source files? That would explain why Content-Length doesn't match the actual length of the output.

As a quick fix, you can try adding ob_clean() to the top of xmlrpc_server_output().
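ob_clean() discards whatever is still sitting in the active output buffer, such as a BOM that an included file printed before its <?php tag, so the announced length matches what is actually sent. It cannot recall bytes that were already flushed, and it raises a notice if no buffer is active. The sketch below simulates this from the command line; build_xmlrpc_response() is a hypothetical stand-in for xmlrpc_server_output() that returns the header and body instead of calling header() and exit.

```php
<?php
// Hypothetical stand-in for xmlrpc_server_output(): instead of calling
// header() and exit, it returns the header line and the body to send.
function build_xmlrpc_response(string $xml): array {
  // The quick fix: throw away stray output (such as a BOM) that is still
  // sitting in the output buffer. ob_clean() needs an active buffer.
  if (ob_get_level() > 0) {
    ob_clean();
  }
  return ['Content-Length: ' . strlen($xml), $xml];
}

ob_start();
echo "\xEF\xBB\xBF"; // simulate a BOM printed by an included source file
[$header, $body] = build_xmlrpc_response('<methodResponse></methodResponse>');
echo $body;
$sent = ob_get_clean(); // everything that would have reached the client

echo $header, ', actual bytes: ', strlen($sent), "\n";
```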

pieterdc

Your proposed quick fix does work, c960657.

The next question is indeed: where does the BOM come from?

c960657

If you have shell access to the server, try this (found here):
grep -rl $'\xEF\xBB\xBF' .
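The grep above lists every file that contains the BOM byte sequence anywhere. The case that matters here is a BOM at the very start of a file, before the opening <?php tag, because PHP sends it as output. A PHP sketch can narrow the search by checking only the first three bytes of each file; find_files_with_bom() and its default list of extensions are assumptions for illustration.

```php
<?php
// Hypothetical helper: list the files under $dir whose first three bytes
// are a UTF-8 BOM, limited to the given (assumed) PHP source extensions.
function find_files_with_bom(string $dir, array $extensions = ['php', 'inc', 'module', 'install']): array {
  $bom = "\xEF\xBB\xBF";
  $found = [];
  $files = new RecursiveIteratorIterator(
    new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
  );
  foreach ($files as $file) {
    // Only regular files with one of the wanted extensions.
    if (!$file->isFile() || !in_array($file->getExtension(), $extensions, true)) {
      continue;
    }
    $handle = fopen($file->getPathname(), 'rb');
    if ($handle === false) {
      continue; // unreadable file: skip it
    }
    $head = fread($handle, 3); // only the first three bytes matter here
    fclose($handle);
    if ($head === $bom) {
      $found[] = $file->getPathname();
    }
  }
  sort($found);
  return $found;
}

// Usage from the command line: php find_bom.php /path/to/drupal
if (isset($argv[1])) {
  foreach (find_files_with_bom($argv[1]) as $path) {
    echo $path, "\n";
  }
}
```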

pieterdc

Category: bug » support
Status: Needs work » Fixed

I'm a little unsure about setting this issue to 'fixed', but yes, c960657, if I meet you in Paris next September, I'll buy you a drink ;-)

I do have shell access to the server. I followed your link, but I was getting too many files with a BOM to choose from, so I searched a little further, came across this one, adjusted it a little and tried:
fgrep -i $'\xEF\xBB\xBF' `find . -iname '*.module' -print`
which pointed me to one single module file of a custom module (written by a colleague).
I then followed the other steps described at your link to remove the BOM using vi.
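The same fix can also be scripted instead of done by hand in vi. A minimal sketch, assuming the file fits in memory; strip_bom() is a hypothetical helper, and it rewrites the file only when a leading BOM is actually present.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM from $path.
// Returns true if the file was changed, false if there was no BOM.
function strip_bom(string $path): bool {
  $bom = "\xEF\xBB\xBF";
  $contents = file_get_contents($path);
  if ($contents === false) {
    throw new RuntimeException("Cannot read $path");
  }
  if (strncmp($contents, $bom, 3) !== 0) {
    return false; // no BOM at the start: leave the file untouched
  }
  if (file_put_contents($path, substr($contents, 3)) === false) {
    throw new RuntimeException("Cannot write $path");
  }
  return true;
}

// Usage from the command line: php strip_bom.php file1.module [file2.module ...]
foreach (array_slice($argv ?? [], 1) as $path) {
  echo $path, ': ', strip_bom($path) ? 'BOM removed' : 'no BOM', "\n";
}
```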

And bingo!!
Now I have a patchless xmlrpcs.inc file.

Drupal-greetings from Belgium to Denmark.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.