Closed (fixed)
Project:
TableField
Version:
7.x-2.x-dev
Component:
Code
Priority:
Minor
Category:
Task
Assigned:
Unassigned
Reporter:
Created:
1 Jun 2012 at 04:19 UTC
Updated:
15 Jun 2012 at 04:21 UTC
I needed to import a CSV file containing Cyrillic text, and the import had an encoding problem: the standard functions (such as mb_detect_encoding()) have a hard time recognizing cp1251 (Windows-1251). Here is my solution.
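To illustrate why a simple validity check is not enough, here is a minimal sketch (the sample string and variable names are illustrative, not taken from TableField). The cp1251 bytes for "привет" are rejected as UTF-8, but the same bytes are equally legal in other single-byte Cyrillic encodings such as KOI8-R, so mbstring has no structural way to choose between them:

```php
<?php
// "привет" ("hello") encoded in cp1251 (Windows-1251).
$cp1251 = "\xEF\xF0\xE8\xE2\xE5\xF2";

// Not valid UTF-8: 0xEF starts a 3-byte sequence, but 0xF0 is not a continuation byte.
$is_utf8 = mb_check_encoding($cp1251, 'UTF-8');

// Every byte is a defined character in both single-byte encodings,
// so a validity check alone cannot tell them apart.
$is_cp1251 = mb_check_encoding($cp1251, 'Windows-1251');
$is_koi8r = mb_check_encoding($cp1251, 'KOI8-R');
```

This is why the detection function further down compares candidate encodings by letter statistics instead of byte validity.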
// In function tablefield_import_csv():
foreach ($csv as $col_id => $col) {
  // Detect the cell's encoding; mb_detect_encoding() does not work here.
  $encoding = tablefield_detect_encoding($col);
  // Convert the cell to UTF-8 before storing it.
  $col = mb_convert_encoding($col, 'UTF-8', $encoding);
  $cell_key = 'cell_' . $row_count . '_' . $col_id;
  $form_state['input'][$field_name][$language][$delta]['tablefield'][$cell_key] =
    $form_state['values'][$field_name][$language][$delta]['tablefield'][$cell_key] = $col;
}
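One possible caveat, which is an assumption rather than something tested here: the detector returns iconv-style names such as 'KOI8R', '855' or 'ISO-IR-111', and some of these may not be recognized by mb_convert_encoding(). A minimal sketch of an alternative conversion step uses iconv(), so the same names serve for both detection and conversion. `tablefield_cell_to_utf8()` is a hypothetical helper, not part of TableField:

```php
<?php
/**
 * Hypothetical helper: converts one CSV cell to UTF-8 using iconv(),
 * which understands the same encoding names the detector tested with.
 */
function tablefield_cell_to_utf8(string $cell, string $encoding): string {
  // Suppress the notice iconv() emits on illegal input; it returns false then.
  $converted = @iconv($encoding, 'UTF-8', $cell);
  // Keep the raw cell rather than storing false if conversion fails.
  return $converted === false ? $cell : $converted;
}
```

Inside the loop, `$col = tablefield_cell_to_utf8($col, $encoding);` would take the place of the mb_convert_encoding() line.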
And here is the detection function (author: Chechetkin Dmitrii):
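// Source: http://forum.dklab.ru/viewtopic.php?t=37830 (article in Russian)
function tablefield_detect_encoding($string, $pattern_size = 50)
{
  // Candidate encodings, using iconv() names.
  $list = array('cp1251', 'utf-8', 'ascii', '855', 'KOI8R', 'ISO-IR-111', 'CP866', 'KOI8U');
  $c = strlen($string);
  // For long strings, analyze only a $pattern_size-byte sample from the middle.
  if ($c > $pattern_size) {
    $string = substr($string, (int) floor(($c - $pattern_size) / 2), $pattern_size);
    $c = $pattern_size;
  }
  // Lowercase Russian vowels in cp1251: а е и о у ы э ю я.
  $reg1 = '/(\xE0|\xE5|\xE8|\xEE|\xF3|\xFB|\xFD|\xFE|\xFF)/i';
  // Lowercase Russian consonants in cp1251, including й, ъ and ь.
  $reg2 = '/(\xE1|\xE2|\xE3|\xE4|\xE6|\xE7|\xE9|\xEA|\xEB|\xEC|\xED|\xEF|\xF0|\xF1|\xF2|\xF4|\xF5|\xF6|\xF7|\xF8|\xF9|\xFA|\xFC)/i';
  $mk = 10000;    // Best (lowest) score found so far.
  $enc = 'ascii'; // Fallback when no candidate yields Cyrillic letters.
  foreach ($list as $item) {
    // Re-encode the sample to cp1251, assuming it is in encoding $item.
    $sample1 = @iconv($item, 'cp1251', $string);
    // Count vowels ($gl) and consonants ($sl).
    $gl = @preg_match_all($reg1, $sample1, $arr);
    $sl = @preg_match_all($reg2, $sample1, $arr);
    // Skip candidates that fail to convert or yield no lowercase Cyrillic letters.
    if (!$gl || !$sl) continue;
    // Score: distance of the consonant/vowel ratio from 3, plus the sample
    // length minus the number of letters found. Lower is better.
    $k = abs(3 - ($sl / $gl));
    $k += $c - $gl - $sl;
    if ($k < $mk) {
      $enc = $item;
      $mk = $k;
    }
  }
  return $enc;
}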
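A possible weak spot, which I have reasoned through but not tested: for strings longer than $pattern_size the sample is cut with substr(), which counts bytes, so it can start or end in the middle of a two-byte UTF-8 Cyrillic character. iconv() then rejects the sample, and UTF-8 input may be misdetected. A minimal sketch of a guard, assuming it is acceptable to treat any cell that is valid UTF-8 as a whole (and contains non-ASCII bytes) as UTF-8; `tablefield_is_utf8_text()` is a hypothetical name, not part of TableField:

```php
<?php
/**
 * Hypothetical helper: returns TRUE when the whole cell is valid UTF-8
 * and contains at least one non-ASCII byte.
 */
function tablefield_is_utf8_text(string $string): bool {
  // With the /u modifier, preg_match() returns false on invalid UTF-8.
  $valid_utf8 = preg_match('//u', $string) === 1;
  // Without /u, this byte class matches any byte in 0x80-0xFF.
  $has_non_ascii = preg_match('/[\x80-\xFF]/', $string) === 1;
  return $valid_utf8 && $has_non_ascii;
}
```

In tablefield_import_csv() it could run before the statistical detector: `$encoding = tablefield_is_utf8_text($col) ? 'utf-8' : tablefield_detect_encoding($col);`. Pure-ASCII cells return FALSE, so they still go through the detector, which falls back to 'ascii' for them.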
Maybe it helps somebody. Perhaps you know a better method?