On lines 42-43 of includes/unicode.php, Drupal makes the following locale change:
// Set the standard C locale to ensure consistent, ASCII-only string handling.
setlocale(LC_CTYPE, 'C');
This breaks charset conversion of strings with iconv() or mb_convert_encoding().
For example, the following code outputs 'St?phanie,D H?rouville' instead of the expected 'Stephanie,D Herouville':
$string = "Stéphanie,D Hérouville";
$output = iconv("UTF-8", 'ASCII//TRANSLIT', $string);
print $output;
The workaround is to explicitly set LC_CTYPE to a UTF-8 locale before calling iconv() or mb_convert_encoding(), and to switch it back to 'C' immediately afterwards.
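A minimal sketch of that workaround, applied to the example above. It assumes the file is saved as UTF-8 and that an 'en_US.UTF-8' locale is installed on the server (check with `locale -a`); the locale name is my assumption, not something Drupal prescribes.

```php
<?php
$string = "Stéphanie,D Hérouville";

// Remember the current LC_CTYPE setting ('C' after Drupal's bootstrap).
// Passing '0' makes setlocale() return the current value without changing it.
$previous = setlocale(LC_CTYPE, '0');

// Assumed locale name. setlocale() returns FALSE if it is not installed,
// in which case the conversion below still runs under the old locale.
$switched = setlocale(LC_CTYPE, 'en_US.UTF-8');

$output = iconv('UTF-8', 'ASCII//TRANSLIT', $string);

// Switch back immediately so the rest of the request sees the old locale.
setlocale(LC_CTYPE, $previous);

print $output;
```

If `$switched` is FALSE, the transliteration result will most likely still contain '?' characters, so it is worth checking that value while debugging.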
I don't like how this is handled. I set my servers up with a certain locale, and that's what I expect it to be when I am coding.
We need to do one of two things:
- Document clearly that Drupal changes the locale at the beginning of each request.
- Change the functions which require the 'C' locale for ASCII-type string handling to set it themselves, and switch back after.
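The second option could look roughly like the sketch below: a small helper that sets a locale, runs some code, and always restores the previous setting. The helper name `example_with_ctype_locale()` is hypothetical and is not part of Drupal, and the code assumes a PHP version with closures and `finally` (PHP 5.5 or later).

```php
<?php
/**
 * Hypothetical helper: runs $callback with LC_CTYPE set to $locale,
 * then restores whatever LC_CTYPE was in effect before the call.
 */
function example_with_ctype_locale($locale, callable $callback) {
  // '0' asks setlocale() for the current setting without changing it.
  $previous = setlocale(LC_CTYPE, '0');
  setlocale(LC_CTYPE, $locale);
  try {
    return $callback();
  }
  finally {
    // Runs even if the callback throws, so the caller's locale survives.
    setlocale(LC_CTYPE, $previous);
  }
}

// A function that needs ASCII-only handling requests 'C' for itself.
$upper = example_with_ctype_locale('C', function () {
  return strtoupper('drupal');
});
```

With this shape, the server's configured locale stays in effect everywhere else, which is the behaviour asked for above.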
The big issue here is developer experience (DX): I spent a couple of hours trying different things and googling to find a solution to this. In hindsight the cause is pretty clear, but for a less-than-perfect developer it could be a source of confusion and wasted time.
Comments
Comment #1
brianV commented: Another potential idea is to create a drupal_convert_string() function that converts between charsets for the user and handles the locale changes automatically.
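A rough sketch of what such a function might look like. drupal_convert_string() does not exist in Drupal; the signature, the `$locale` parameter, and its 'en_US.UTF-8' default are all assumptions made for illustration.

```php
<?php
/**
 * Hypothetical sketch of the proposed drupal_convert_string().
 *
 * Converts $string from charset $from to charset $to, temporarily
 * switching LC_CTYPE so that iconv() transliteration can work.
 * Returns the converted string, or FALSE if iconv() fails.
 */
function drupal_convert_string($string, $from, $to, $locale = 'en_US.UTF-8') {
  // '0' returns the current LC_CTYPE setting without changing it.
  $previous = setlocale(LC_CTYPE, '0');
  // If the assumed locale is not installed, setlocale() returns FALSE
  // and the conversion runs under the previous locale instead.
  setlocale(LC_CTYPE, $locale);
  $converted = iconv($from, $to, $string);
  // Restore the locale the caller had, whatever it was.
  setlocale(LC_CTYPE, $previous);
  return $converted;
}

$ascii = drupal_convert_string("Stéphanie,D Hérouville", 'UTF-8', 'ASCII//TRANSLIT');
```

Callers would then never touch setlocale() themselves, which keeps the locale switch in one place.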
Comment #2
dave reid commented: Someone with more knowledge about string handling in PHP should comment here, but I think the problem is that we don't have full UTF-8 string support in PHP until PHP 6. Therefore string handling has to stay non-UTF-8 by using 'C' in setlocale().
Comment #3
sun commented: Marking as duplicate of
#614124: Bootstrap should reset locale settings
#1561214: Bootstrap sets C locale, but does not set UTF-8 character encoding