Currently some Cyrillic letters are transliterated in an inconvenient way. For example, 'г' is transliterated as 'gh' where 'g' is expected, 'х' as 'kh' where 'x' or just 'h' is expected, 'е' as 'ie' where just 'e' is expected, and 'ү' as 'u'; some rare but still used letters such as 'ө' and 'ң' are ignored completely. These mappings may be correct for some languages that use Cyrillic, but for other languages they look odd and are sometimes totally unacceptable. So my question is: how can we adjust the transliteration mappings ourselves? Is it possible to set the pairs in the admin section of a Drupal site? If not, please tell us how and where exactly we could hack the module to change the default transliteration behavior.

Comments

smk-ka’s picture

Category: feature » support

Transliteration supports language-specific alterations. The following guide should help you add them:

  1. First, find the Unicode code point of the character you want to change. Your example was 'г', which is 0x0433 (hexadecimal).
  2. The first two digits (i.e. '04') tell you which file the corresponding mapping belongs in. In this case it's data/x04.php.
  3. Open it in an editor and add your replacements to the array, for example:
    'language_code' => array(0x33 => 'g'),
    Two things are important here. First, language_code must be a valid code from Drupal's list of languages. Second, keep only the last two digits of the character code (i.e. 0x33, since the other two are already encoded in the file name). Remember to use hexadecimal notation everywhere.
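
The steps above can be sketched in PHP. This is only an illustration: locate_mapping() is a hypothetical helper, not part of the Transliteration module, and it assumes the mbstring extension (PHP 7.2 or later for mb_ord()) and a source file saved as UTF-8.

```php
<?php
// Hypothetical helper (NOT part of the Transliteration module).
// Splits a character's code point into the data file that holds its
// mapping (high byte) and the index inside that file (low byte).
function locate_mapping($char) {
  $code = mb_ord($char, 'UTF-8');  // e.g. 0x0433 for 'г'
  $bank = $code >> 8;              // first two hex digits  -> file name
  $index = $code & 0xFF;           // last two hex digits   -> array key
  return array(
    'file' => sprintf('data/x%02x.php', $bank),
    'index' => sprintf('0x%02X', $index),
  );
}

$result = locate_mapping('г');
echo $result['file'], ' ', $result['index'], "\n";  // data/x04.php 0x33
```

The same index also identifies the entry when a file lists its replacements as one flat array: counting from zero, entry 0x33 is the one for 'г'.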

Also take a look at data/x00.php, since it already contains a number of language-specific replacements. If you think your overrides would be useful to others, please create and file a patch.
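
As a rough sketch of how such an override could sit next to the default table: the snippet below assumes a data file returns an array keyed by language code, with the full table under 'en' and sparse overrides under other codes. The 'ky' entries, the lookup() helper, and the merge via PHP's array union are assumptions made for illustration, not the module's confirmed implementation.

```php
<?php
// Assumed shape of a data file: a full table under 'en' plus sparse
// per-language overrides keyed by the low byte of the code point.
$bank = array(
  // Excerpt only; the real 'en' table has an entry for every index.
  'en' => array(0x33 => 'gh', 0x35 => 'ie', 0x45 => 'kh'),
  // Hypothetical overrides for 'г', 'е' and 'х'.
  'ky' => array(0x33 => 'g', 0x35 => 'e', 0x45 => 'h'),
);

// Hypothetical lookup: language overrides take precedence over 'en'.
function lookup(array $bank, $langcode, $index) {
  $base = $bank['en'];
  $overrides = isset($bank[$langcode]) ? $bank[$langcode] : array();
  // Array union: on duplicate keys the left operand wins.
  $map = $overrides + $base;
  return isset($map[$index]) ? $map[$index] : NULL;
}

echo lookup($bank, 'ky', 0x33), "\n";  // g
echo lookup($bank, 'en', 0x33), "\n";  // gh
echo lookup($bank, 'ru', 0x45), "\n";  // kh (no 'ru' entry, falls back to 'en')
```

In this sketch the 'en' entry stays in place and the language-specific entry is added beside it, so any character without an override still falls back to the default replacement.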

yngens’s picture

smk-ka,

Thank you very much for your reply. I am currently trying to tweak it following your instructions and have a couple of questions.

1. I don't know where exactly I need to place code like 'language_code' => array(0x33 => 'g'),
The file's format is as follows:

return array(
 	
  'en' => array('Ie', 'Io', 'Dj', 'Gj', 'Ie', 'Dz', 'I', 'Yi', 'J', 'Lj', 'Nj', 'Tsh', 'Kj', 'I', 'U', 'Dzh',
    'A', 'B', 'V', 'G', 'D', 'Ie', 'Zh', 'Z', 'I', 'I', 'K', 'L', 'M', 'N', 'O', 'P',
    'R', 'S', 'T', 'U', 'F', 'Kh', 'Ts', 'Ch', 'Sh', 'Shch', '', 'Y', '\'', 'E', 'Iu', 'Ia',
    'a', 'b', 'v', 'gh', 'd', 'ie', 'zh', 'z', 'i', 'i', 'k', 'l', 'm', 'n', 'o', 'p',
    'r', 's', 't', 'u', 'f', 'kh', 'ts', 'ch', 'sh', 'shch', '', 'y', '\'', 'e', 'iu', 'ia',
    'ie', 'io', 'dj', 'gj', 'ie', 'dz', 'i', 'yi', 'j', 'lj', 'nj', 'tsh', 'kj', 'i', 'u', 'dzh',
    'O', 'o', 'E', 'e', 'Ie', 'ie', 'E', 'e', 'Ie', 'ie', 'O', 'o', 'Io', 'io', 'Ks', 'ks',
    'Ps', 'ps', 'F', 'f', 'Y', 'y', 'Y', 'y', 'u', 'u', 'O', 'o', 'O', 'o', 'Ot', 'ot',
    'Q', 'q', '*1000*', '', '', '', '', NULL, '*100.000*', '*1.000.000*', NULL, NULL, '"', '"', 'R\'', 'r\'',
    'G\'', 'g\'', 'G\'', 'g\'', 'G\'', 'g\'', 'Zh\'', 'zh\'', 'Z\'', 'z\'', 'K\'', 'k\'', 'K\'', 'k\'', 'K\'', 'k\'',
    'K\'', 'k\'', 'N\'', 'n\'', 'Ng', 'ng', 'P\'', 'p\'', 'Kh', 'kh', 'S\'', 's\'', 'T\'', 't\'', 'U', 'u',
    'U\'', 'u\'', 'Kh\'', 'kh\'', 'Tts', 'tts', 'Ch\'', 'ch\'', 'Ch\'', 'ch\'', 'H', 'h', 'Ch', 'ch', 'Ch\'', 'ch\'',
    '`', 'Zh', 'zh', 'K\'', 'k\'', NULL, NULL, 'N\'', 'n\'', NULL, NULL, 'Ch', 'ch', NULL, NULL, NULL,
    'a', 'a', 'A', 'a', 'Ae', 'ae', 'Ie', 'ie', '@', '@', '@', '@', 'Zh', 'zh', 'Z', 'z',
    'Dz', 'dz', 'I', 'i', 'I', 'i', 'O', 'o', 'O', 'o', 'O', 'o', 'E', 'e', 'U', 'u',
    'U', 'u', 'U', 'u', 'Ch', 'ch', NULL, NULL, 'Y', 'y', NULL, NULL, NULL, NULL, NULL),
	
	);

So I could only replace 'gh' with 'g' directly in the row above, not by placing 'language_code' => array(0x33 => 'g'), anywhere.

2. It only works with the 'en' key. When I tried valid Drupal language codes (language_code) such as 'ru', 'kz', 'uz', 'ky', it always gave this error:

Fatal error: Unsupported operand types in /home/mysite/public_html/sites/all/modules/transliteration/transliteration.inc on line 203

3. If the second question above is resolved, should I put the transliteration arrays for all three languages in one file, or how else does this work?

Thank you!

yngens’s picture

Also, it would be nice to have the actual UTF-8 pairs in those PHP files, because after finding the right file per your instructions it is not clear which entry corresponds to the letter to be modified. For example, the Cyrillic character 'ө' is currently replaced by the Latin 'o', and after finding the correct data file I don't know which of the several 'o's and 'O's is the one paired with 'ө'. Of course, this problem only arises when editing a given array directly. Writing an explicit pair like 'language_code' => array(0x33 => 'g'), would ease the problem, but as I said above, I don't know where to put this code.

illuminaut’s picture

It would be nice to have an admin interface to change/add transliterations.

Eugene Fidelin’s picture

I also want to correct the Cyrillic transliteration, but I don't know where to do it.

smk-ka’s picture

Status: Active » Closed (fixed)

The 3.x branch has a new and (hopefully) simpler way to add overrides, plus a section in the README dedicated to adding language-specific variants.

yngens’s picture

Issue summary: View changes

spelling correction