
Adding support for displaying non-Unicode characters

Project:BAWStats
Version:6.x-1.1
Component:Code
Category:bug report
Priority:minor
Assigned:Unassigned
Status:closed (fixed)

Issue Summary

I realized that the module currently cannot display non-Unicode characters in the statistics. This is because filter_xss() returns an empty string for any input that is not valid UTF-8.
Symptom: the string returned by baw_display_drupal is empty when using Firefox or Chromium on an x86_64 Arch Linux machine, so the statistics are not displayed at all. Konqueror could display the statistics on the same machine, and Firefox on OS X showed no problem either.

Possible solution:
Convert $content to UTF-8 before passing it to filter_xss() in includes/bawstats.stats.inc, e.g.:

<?php
$content .= "</div>";
// Convert to UTF-8, because filter_xss() returns an empty string for input
// that is not valid UTF-8. This assumes $content is ISO-8859-1 encoded.
$content = mb_convert_encoding($content, 'UTF-8', 'ISO-8859-1');
return filter_xss($content, array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd', 'table', 'td', 'tr', 'th', 'div', 'img', 'br', 'h1', 'h2', 'h3'));
?>
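To see why the statistics vanish, here is a minimal standalone sketch that mimics the early return at the top of Drupal 6's filter_xss(), which gives back an empty string when its input is not valid UTF-8. The function name utf8_guarded_filter() is hypothetical, and its check is modeled on drupal_validate_utf8() (a preg_match() with the `u` modifier); it needs only PHP with the mbstring extension, not Drupal.

```php
<?php
// Hypothetical stand-in for the UTF-8 guard at the top of Drupal 6's
// filter_xss(): malformed UTF-8 input yields an empty string.
function utf8_guarded_filter(string $text): string {
  // preg_match() with the /u modifier returns false on malformed UTF-8.
  if ($text !== '' && preg_match('/^./us', $text) !== 1) {
    return '';
  }
  // The real filter_xss() would strip disallowed tags at this point.
  return $text;
}

// "Látogatók" (Hungarian for "visitors") as raw ISO-8859-2 bytes.
$latin2 = "L\xE1togat\xF3k";
var_dump(utf8_guarded_filter($latin2)); // string(0) ""

// Converting to UTF-8 first lets the same text pass the guard.
$utf8 = mb_convert_encoding($latin2, 'UTF-8', 'ISO-8859-2');
var_dump(utf8_guarded_filter($utf8)); // string(11) "Látogatók"
```

The converted string is 11 bytes for 9 characters because "á" and "ó" each take two bytes in UTF-8.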

So far tested with the Hungarian translation.

Comments

#1

Hello shafter,

sorry for the delay in responding.

I have one main question about this: how can we be sure that we should convert from ISO-8859-1 rather than from some other encoding?

Also, I think it would be better to use the API function drupal_convert_to_utf8() rather than mb_convert_encoding().

If you can address the first question, I'd be happy to put this modification into the module. Even better would be if you could submit a patch :-)

#2

Status:active» patch (to be ported)

This is a hard question, as there are more than 10 ISO-8859 character sets. Our characters are actually in ISO-8859-2, which is the one I usually use, but it is compatible with ISO-8859-1 too.
Alternatively, you could use mb_detect_encoding($str) to detect the encoding before converting. Feel free to add more character sets to the list if you want the module to support them too (e.g. Chinese or Russian characters).

<?php
if (function_exists('mb_detect_encoding')) {
  $content = drupal_convert_to_utf8($content, mb_detect_encoding($content, 'UTF-8, ISO-8859-1, ISO-8859-2'));
}
else {
  $content = drupal_convert_to_utf8($content, 'ISO-8859-1');
}
?>
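The detect-then-convert approach can also be sketched without Drupal, using only mbstring. This is a minimal sketch under stated assumptions: the helper name bawstats_to_utf8() and its default candidate list are hypothetical, and it returns input that is already valid UTF-8 unchanged, so a second conversion cannot garble it. Since single-byte encodings such as ISO-8859-1 and ISO-8859-2 accept (nearly) any byte sequence, detection cannot reliably tell them apart, which is the ambiguity raised in #1.

```php
<?php
// Hypothetical helper: return $content as UTF-8 so filter_xss() keeps it.
// Assumes the mbstring extension is available.
function bawstats_to_utf8(string $content, array $candidates = ['ISO-8859-2', 'ISO-8859-1']): string {
  // Already valid UTF-8 (including plain ASCII): leave it alone, so that
  // converting it again cannot turn "á" into "Ã¡".
  if (mb_check_encoding($content, 'UTF-8')) {
    return $content;
  }
  // Strict detection among the legacy encodings we expect. Because these
  // single-byte encodings accept almost any byte, the pick may depend on
  // candidate order.
  $from = mb_detect_encoding($content, $candidates, true);
  if ($from === false) {
    $from = 'ISO-8859-1'; // Same fallback as the else branch above.
  }
  return mb_convert_encoding($content, 'UTF-8', $from);
}

$latin2 = "L\xE1togat\xF3k";       // ISO-8859-2 bytes for "Látogatók"
$utf8 = bawstats_to_utf8($latin2);
var_dump(mb_check_encoding($utf8, 'UTF-8')); // bool(true)
var_dump(bawstats_to_utf8($utf8) === $utf8);  // bool(true): idempotent
```

Checking for valid UTF-8 first is a design choice: it makes the conversion safe to apply to output that is already UTF-8, such as English text or translations stored as UTF-8.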

Sorry, I forgot to include the path of the bawstats.stats.inc file in the patch. Please apply the patch from the includes directory.

Attachment: bawstats.patch (586 bytes)

#3

Status:patch (to be ported)» fixed

Thanks for this, shafter. I wonder whether we need to pass more character sets to mb_detect_encoding(), but I think this is OK for now, and I've committed it.

#4

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

#5

Status:closed (fixed)» patch (to be ported)

Reopening, because apparently the patch wasn't applied correctly in the latest version, probably because I forgot to include the path of the include file in the patch. I've attached a slightly modified version with the correct path. Sorry.
By the way, we saw different results on different platforms regarding this problem. Apparently the issue doesn't appear on OS X or under Wine (on Windows), but it does appear on 64-bit Arch Linux and the latest Ubuntu, so it seems to depend on the client platform.

Attachment: utfconversion.patch (787 bytes)

#6

Status:patch (to be ported)» fixed

The patch seems to have been applied correctly.

#7

Status:fixed» closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.
