support for long locale name
voidoo - December 27, 2004 - 13:02
| Project: | Internationalization |
| Version: | HEAD |
| Component: | Code |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
I am using the i18n module to implement a bilingual English and Chinese site. However, Chinese has the long locale names "zh-hant" and "zh-hans", so I modified the i18n module to support locale names longer than 2 characters.
| Attachment | Size |
|---|---|
| i18n-long-lang.patch.gz | 929 bytes |

#1
I am a bit confused by this. Aren't zh-hant and zh-hans two variants of the same language?
I'm thinking of applying this patch only partially. I can see the use of language variants for interface translation, but not so clearly for node languages. I see very limited use for this, unless you have some solid and configurable fallback mechanism... (?)
Also, this would raise some other problems. Currently any two-letter code at the beginning of the path is taken as a language, which cannot be done if the language code is of arbitrary length, so I'm afraid this would break some links when enabling/disabling the i18n module.
Would it be a solution to use three letter ISO language codes? Currently the locale module uses 2 letter ones, but this could be worked out, I guess.
Please, comment on this.
#2
I think this needs to be reworked a bit, since you cannot always accurately identify a language by a two-letter code. Generally a language_country_variant format is used. For example, to distinguish between Chinese on the mainland (zh_CN, zh_Hans or just zh) and Chinese in Taiwan (zh_TW or zh_Hant), different variations are used. Note that there are subtle differences between these tags, although in most cases they would mean the same thing.
Any proper internationalization would need to identify the difference between these tags and also set them up in a hierarchy for fallback. So for the localization of a site in mainland Chinese we would use zh, and for traditional Taiwanese we would use zh_TW, which might have en as its base, followed by zh (perhaps 95% complete), and then zh_TW (with only those strings that differ from zh translated, perhaps 50%). These translations are then layered to provide a complete interface.
The translation of nodes for internationalization should work in a similar manner: if the user wants zh_TW, fetch that node; if it does not exist, try zh; and if that fails, skip to the base (in this case en). If that also fails, raise an error, or serve any other translation of the node instead.
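To make the layering idea concrete, here is a minimal sketch in Python (not Drupal code). The function names `fallback_chain` and `translate`, the `catalogs` dictionary, and the sample strings are all hypothetical and only illustrate the lookup order described above, with en assumed as the base language.

```python
def fallback_chain(tag, base="en"):
    """Return the locales to try, most specific first.

    Hypothetical helper: "zh_TW" gives ["zh_TW", "zh", "en"].
    Hyphens are treated like underscores, so "zh-hant" also works.
    """
    parts = tag.replace("-", "_").split("_")
    # Drop one trailing subtag at a time: zh_TW, then zh.
    chain = ["_".join(parts[:i]) for i in range(len(parts), 0, -1)]
    if base not in chain:
        chain.append(base)  # the base language is always the last resort
    return chain


def translate(key, tag, catalogs, base="en"):
    """Look up a string through the layered catalogs."""
    for locale in fallback_chain(tag, base):
        text = catalogs.get(locale, {}).get(key)
        if text is not None:
            return text
    raise KeyError(key)  # no layer has this string


# Sample data (assumed): zh_TW only holds strings that differ from zh.
catalogs = {
    "en": {"home": "Home", "search": "Search", "login": "Log in"},
    "zh": {"home": "首页", "search": "搜索"},
    "zh_TW": {"home": "首頁"},
}

print(translate("home", "zh_TW", catalogs))    # found in zh_TW
print(translate("search", "zh_TW", catalogs))  # falls back to zh
print(translate("login", "zh_TW", catalogs))   # falls back to en
```

The same chain could be used for nodes by replacing the string lookup with a lookup of the node's translation in each locale.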
In summary, it should be required to provide longer fields than two characters for language identification (probably up to 10 chars are needed), even if that means devising a new method to identify the selected language for a node. To always assume a two letter ident in the path string is not a good idea in the long term.
A three letter code would not make any significant difference, the module must be more flexible than that (no one said this was going to be easy ;)
I see the need for this functionality and would be willing to provide assistance if necessary.
For further information on language tags, check out http://www.ietf.org/rfc/rfc3066.txt
BAB
#3
I've struggled with this issue for hours :-(
I don't know whether Drupal would have problems with replacing those 2 long names with zh and zt?
#4
Ok, then we should think of some way of supporting these long locales without breaking those nice paths, like /en/....
Anyway, as I understand it, we first need to have a fallback mechanism in place for all these language variations.
And also, would we need all these languages for content itself, or only for interface translation?
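One possible way to keep paths like /en/... while allowing longer codes is to compare the first path segment against the list of enabled languages, instead of treating any two-letter prefix as a language. The sketch below is in Python rather than Drupal code; `split_language_prefix` and `enabled_languages` are hypothetical names used only for illustration.

```python
def split_language_prefix(path, enabled_languages):
    """Split a leading language code off a path.

    Returns (language, rest_of_path). The language is None when the
    first segment is not one of the enabled languages, so a path like
    /my/account is not mistaken for a language prefix.
    """
    trimmed = path.strip("/")
    segments = trimmed.split("/", 1)
    first = segments[0]
    if first in enabled_languages:
        rest = segments[1] if len(segments) > 1 else ""
        return first, rest
    return None, trimmed


enabled_languages = {"en", "zh-hant", "zh-hans"}  # assumed site configuration

print(split_language_prefix("/zh-hant/node/1", enabled_languages))
print(split_language_prefix("/en/node/1", enabled_languages))
print(split_language_prefix("/my/account", enabled_languages))
```

Because the check depends on the configured language list rather than on the length of the segment, two-letter path components that are not languages pass through unchanged.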
#5
Ok, added support for long locale names.
Now you can use any locale you want, but this still lacks a consistent fallback mechanism.
#6
Closing this for now. The fallback mechanism is still pending, but it should be opened for discussion, maybe as a new 'feature request'.