Adding support to display non-unicode characters
shafter - November 4, 2009 - 12:14
| Project: | BAWStats |
| Version: | 6.x-1.1 |
| Component: | Code |
| Category: | bug report |
| Priority: | minor |
| Assigned: | Unassigned |
| Status: | patch (to be ported) |
Description
I realized that the module does not currently support displaying non-Unicode characters (that is, text that is not UTF-8 encoded) in statistics. This is because the filter_xss() function returns an empty string for any input that is not valid UTF-8.
Symptom: The string returned by baw_display_drupal is empty when using Firefox or Chromium on an x86_64 Arch Linux machine, so the statistics are not displayed at all. The Konqueror browser could display the statistics on the same machine, and there was no problem using Firefox on OS X either.
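To illustrate the failure mode, here is a minimal sketch of a UTF-8 validity check. The function name example_is_valid_utf8() is hypothetical and only stands in for the kind of check that filter_xss() relies on; it is not the module's or Drupal's actual code.

```php
<?php
// Hypothetical stand-in for a UTF-8 validity check; not the module's code.
function example_is_valid_utf8($text) {
  if ($text === '') {
    return TRUE;
  }
  // With the /u modifier, preg_match() fails on bytes that are not valid UTF-8.
  return preg_match('/^./us', $text) === 1;
}

// "café" with "é" stored as the single ISO-8859-1 byte 0xE9.
$latin1 = "caf\xE9";
// The same word with "é" stored as the two-byte UTF-8 sequence 0xC3 0xA9.
$utf8 = "caf\xC3\xA9";

var_dump(example_is_valid_utf8($latin1)); // bool(false)
var_dump(example_is_valid_utf8($utf8));   // bool(true)
```

Content that fails a check like this is what ends up as an empty string after filtering.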
Possible solution:
Convert $content to UTF-8 before passing it to filter_xss() in includes/bawstats.stats.inc. For example:
<?php
$content .= "</div>";
// Convert to UTF-8 so that content in non-UTF-8 encodings is not rejected by filter_xss().
$content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
return filter_xss($content, array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd', 'table', 'td', 'tr', 'th', 'div', 'img', 'br', 'h1', 'h2', 'h3'));
?>
Currently tested using the Hungarian translation.

#1
Hello shafter,
sorry for the delay in responding.
I have one main question about this: how can we be sure that we should convert from ISO-8859-1 rather than from some other encoding?
Also, I think it would be better to use the API function drupal_convert_to_utf8() rather than mb_convert_encoding().
If you can address the first question, I'd be happy to put this modification into the module. Even better would be if you could submit a patch :-)
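To illustrate why the source encoding matters, the sketch below decodes the same byte under two different assumptions. It assumes the mbstring extension is available; the variable names are made up for this example.

```php
<?php
// A single byte whose meaning depends on the assumed source encoding.
$byte = "\xF5";

// Interpreted as ISO-8859-1, 0xF5 is "õ" (U+00F5).
$as_latin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');
// Interpreted as ISO-8859-2, 0xF5 is the Hungarian "ő" (U+0151).
$as_latin2 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-2');

// Print the resulting UTF-8 bytes in hex to make the difference visible.
echo bin2hex($as_latin1), "\n"; // c3b5
echo bin2hex($as_latin2), "\n"; // c591
```

Both conversions succeed, so picking the wrong source encoding does not produce an error, only the wrong character.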
#2
This is a hard question, as there are more than 10 ISO character sets. Our characters are actually in ISO-8859-2, which is the one I usually use, but it is mostly compatible with ISO-8859-1 too.
Otherwise, you could try using mb_detect_encoding($str) to detect the encoding before converting. Feel free to add more character sets to the list if you want your module to be compatible with those too (e.g. Chinese or Russian characters).
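<?php
// Assume ISO-8859-1 unless a more specific encoding can be detected.
$encoding = 'ISO-8859-1';
if (function_exists('mb_detect_encoding')) {
  // mb_detect_encoding() returns FALSE when no candidate encoding matches.
  $detected = mb_detect_encoding($content, 'UTF-8, ISO-8859-1, ISO-8859-2');
  if ($detected !== FALSE) {
    $encoding = $detected;
  }
}
$content = drupal_convert_to_utf8($content, $encoding);
?>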
Sorry, I forgot to include the path of the bawstats.stats.inc file in the patch. Please apply the patch from the includes directory.
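As an alternative to detection, here is a minimal sketch of a validate-then-convert approach: content that is already valid UTF-8 is left alone, and anything else is converted from a single fallback encoding. This is not what the patch does. The function name example_to_utf8() and its $fallback_encoding parameter are hypothetical, and the sketch assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper; not part of BAWStats or Drupal.
function example_to_utf8($content, $fallback_encoding = 'ISO-8859-2') {
  // Leave content untouched if it is already valid UTF-8.
  if (mb_check_encoding($content, 'UTF-8')) {
    return $content;
  }
  // Otherwise assume the fallback encoding (an assumption the site would have to configure).
  return mb_convert_encoding($content, 'UTF-8', $fallback_encoding);
}

// Already UTF-8: returned unchanged.
echo bin2hex(example_to_utf8("caf\xC3\xA9")), "\n";
// Not valid UTF-8: converted from the fallback encoding.
echo bin2hex(example_to_utf8("\xF5")), "\n";
```

The trade-off is that the fallback encoding has to be chosen per site, but the result does not depend on how encoding detection ranks its candidates.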