If the site's default language uses non-English characters and the Value field of a custom variable contains a token, the value does not show up in the Google Analytics code.
If Name and Value contain non-English text, it is output with what looks like broken encoding.
For example:
Name is User Roles, Value is [current-user:roles].
English language by default:
_gaq.push(['_setCustomVar', 1, "User Roles", "authenticated user, administrator", 2]);
Russian language by default:
_gaq.push(['_setCustomVar', 1, "User Roles", "", 2]);
Chinese language by default:
_gaq.push(['_setCustomVar', 1, "User Roles", "\u8a3b\u518a\u4f7f\u7528\u8005, administrator", 2]);
German language by default:
_gaq.push(['_setCustomVar', 1, "User Roles", "Authentifizierter Benutzer, administrator", 2]);
If I set Name to the Russian text Роли пользователей ("User roles") and Value to анонимный пользователь ("anonymous user"), the code is:
_gaq.push(['_setCustomVar', 1, "\u0420\u043e\u043b\u0438 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u0439", "", 2]);
I can't find anything in the GA help saying that all characters have to be English. GA works correctly with non-English characters (for example, ecommerce tracking with Russian characters: http://www.adlabs.ru/images/docs/yg_26.jpg).
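As a quick check that the `\u` sequences in the generated snippet are ordinary JavaScript string escapes rather than damaged text, the escaped Name from the example above can be compared with the literal Cyrillic string. This is only a sketch; the variable names are made up for illustration.

```javascript
// The Name exactly as it appears in the generated _setCustomVar call.
const escapedName = "\u0420\u043e\u043b\u0438 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u0439";

// The same Name typed as literal Cyrillic text.
const literalName = "Роли пользователей";

// Both spellings denote the same string value at runtime.
const sameString = escapedName === literalName;
console.log(sameString); // true
```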
| Comment | File | Size | Author |
|---|---|---|---|
| #16 | 1823202+Custom+variables+do+not+work+for+non-English+symbols-utf8.patch | 4.73 KB | plazik |
| #13 | 1823202+Custom+variables+do+not+work+for+non-English+symbols.patch | 4.51 KB | hass |
| #9 | googleanalytics-non-english-symbols2.patch | 609 bytes | plazik |
| #7 | googleanalytics-non-english-symbols.patch | 1014 bytes | plazik |
| #5 | Custom Variables - Key.png | 76.83 KB | plazik |
Comments
Comment #1
hass commented
Thanks for the report. I guess this may be caused by the URL encoding. What does the Google Analytics statistics interface show with German special characters or Russian characters?
The `\u[code]` doesn't mean it's wrong. These are just JavaScript Unicode escapes. Are the characters wrong in the GA statistics, and if they are, how? Can you debug the code, please?
Comment #2
hass commented
Comment #3
plazik commented
I set `Name` to `Роли пользователей ä, ö, ü, ß`. The GA interface doesn't show it in the Custom Variables report.
The `\u[code]` escapes are shown correctly in GA Debugger.
Comment #4
plazik commented
I made a test.html and put `Роли пользователей ä, ö, ü, ß` in manually. GA Debug shows the custom variables correctly.
Let's wait until tomorrow to see how it shows up in the GA interface.
Comment #5
plazik commented
I did some tests and here is the summary:
`\u[code]` works correctly. GA can understand it.
`Name` and `Value` must both be specified. If `Value` is empty in the code, it doesn't work. This was the main problem. The GA module should check for this in the admin interface.
The screenshots show my test code.
P.S. `Name` should be renamed to `Key`, like in the GA interface.
Comment #6
hass commented
Normally the module checks whether the value or key is empty. If the token value is empty, the custom variable is not added to the content. This is how it is designed by Google and by the module. Nothing wrong.
If this is not how it works, it's a bug.
Per the docs, the "key" was named "name" in the past... I have just followed the Google docs. We need to check whether the wording has changed, and then we can change it to "key", too.
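The check described above can be sketched as follows. This is JavaScript for illustration only, not the module's code (the module is written in PHP); `buildCustomVarCall` is a hypothetical helper, while `_setCustomVar` and the argument order are taken from the snippets in the issue summary.

```javascript
// Hypothetical helper: returns the arguments for _gaq.push(), or null
// when either the name or the value is empty after trimming.
function buildCustomVarCall(slot, name, value, scope) {
  const cleanName = String(name ?? "").trim();
  const cleanValue = String(value ?? "").trim();
  if (cleanName === "" || cleanValue === "") {
    return null; // skip the variable instead of emitting an empty value
  }
  return ["_setCustomVar", slot, cleanName, cleanValue, scope];
}

// Only the call with a non-empty value survives the filter.
const calls = [
  buildCustomVarCall(1, "User Roles", "authenticated user, administrator", 2),
  buildCustomVarCall(2, "User Roles", "", 2),
].filter((call) => call !== null);

console.log(calls.length); // 1
```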
Comment #7
plazik commented
Even when I put these characters into the GA module manually, they don't show up in the code. The token value is correct. It's a bug.
I made a patch which fixes it.
OK, the GA docs still name it "name".
Comment #8
hass commented
This cannot be a correct change, and it makes no sense to me either. Google does not allow the string to be longer than 128 characters after URL encoding.
Is there a bug in rawurldecode() with Russian letters?
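To see why the 128-character limit matters for Cyrillic text, here is a small sketch of how the length grows after URL encoding. It uses JavaScript's `encodeURIComponent()` as a stand-in for the PHP encoding step in the module, which is an assumption; `encodedLength` is a hypothetical helper name.

```javascript
// Hypothetical helper: length of a string once it is percent-encoded.
function encodedLength(text) {
  return encodeURIComponent(text).length;
}

// Latin text: only the space is escaped ("User%20Roles").
console.log(encodedLength("User Roles")); // 12

// Cyrillic text: every letter becomes two escaped UTF-8 bytes.
console.log(encodedLength("Роли пользователей")); // 105
```

Each Cyrillic letter takes two UTF-8 bytes, so it becomes six characters once percent-encoded. The 18-character name from the report already uses 105 of the 128 characters.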
Comment #9
plazik commented
`rawurldecode()` works correctly and makes a URL for Russian characters like this: `%0..9A..F`, which gets removed in the `substr` function. That's why all the characters disappeared. Is it necessary to do this? There should be a better solution.
Comment #10
hass commented
This sounds like an important bug. The idea behind it was that we do not log a broken/half hex key of a letter. I think it should just trim the string to the closest hex value if there is one, not remove everything. We need to write a test for this, too.
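The difference between "remove all" and "trim to the closest hex value" can be illustrated with two small functions. This is a JavaScript sketch under assumptions: `stripAllTrailingHex` only imitates the behavior described in comment #9 and is not the module's actual code, and both function names are hypothetical.

```javascript
// Imitates the reported behavior: strips every trailing character that is
// "%", a digit, or an uppercase hex letter. A fully percent-encoded
// Cyrillic string consists only of such characters, so nothing is left.
function stripAllTrailingHex(encoded) {
  return encoded.replace(/[%0-9A-F]+$/, "");
}

// Proposed behavior: remove only a dangling, incomplete escape at the
// very end ("%" or "%X"), and keep every complete "%XX" escape.
function trimHalfEscape(encoded) {
  return encoded.replace(/%[0-9A-Fa-f]?$/, "");
}

const encodedRoles = encodeURIComponent("Роли"); // "%D0%A0%D0%BE%D0%BB%D0%B8"
console.log(stripAllTrailingHex(encodedRoles));      // ""
console.log(trimHalfEscape(encodedRoles.slice(0, 8))); // "%D0%A0"
```

Note that trimming only the half escape can still leave the first byte of a two-byte character at the end of the string, for example when the cut lands right after `%D0`.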
Comment #11
hass commented
This does not work either. If the last characters are a broken/half Unicode hex value (`%C3`), the string becomes `null` in `drupal_json_encode()`.
Comment #12
hass commented
I have a fix ready based on the Drupal API, but in case we need another solution later, I'm posting a regex that seems to work, too. I found it at "php - Remove non-utf8 characters from string" on Stack Overflow, plus two other workarounds that work with `iconv()` with `//IGNORE` and `mb_convert_encoding()`, like `drupal_convert_to_utf8()` does, except for the `//IGNORE` in iconv(), which seems to be superfluous if error reporting is suppressed with `@iconv()`.
Comment #13
hass commented
Heck, the more test cases I try, the more issues I find. Any suggestions or ideas?
In addition, my Eclipse EGit or Git itself does not handle the Russian characters properly. After I commit this as a patch, it looks destroyed.
Comment #14
hass commented
Comment #16
plazik commented
I've only found solutions like yours.
The encoding of the file should be UTF-8 without BOM.
I changed the encoding in your patch and fixed the Russian characters.
Comment #17
hass commented
Maybe we can build a regex for this use case and run preg_replace() to cut off the string at 128 characters or earlier.
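A minimal sketch of that idea, written in JavaScript with `String.prototype.replace()` instead of PHP's preg_replace() so it can be run standalone. `cutEncoded` is a hypothetical name, and the sketch assumes the input is well-formed percent-encoded UTF-8, such as the output of `encodeURIComponent()`.

```javascript
// Hypothetical helper: cut a percent-encoded string at `limit` characters
// or earlier, so that the result never ends inside an escape or inside a
// multi-byte UTF-8 character.
function cutEncoded(encoded, limit = 128) {
  return encoded
    .slice(0, limit)
    // 1. Drop a dangling half escape: "%" or "%X" at the very end.
    .replace(/%[0-9A-F]?$/i, "")
    // 2. Drop an incomplete trailing UTF-8 sequence: a lead byte of a
    //    2-, 3-, or 4-byte character followed by too few continuation bytes.
    .replace(
      /%(?:[CD][0-9A-F]|E[0-9A-F](?:%[89AB][0-9A-F])?|F[0-7](?:%[89AB][0-9A-F]){0,2})$/i,
      ""
    );
}

const encodedName = encodeURIComponent("Роли"); // 24 characters
const cutName = cutEncoded(encodedName, 10);      // cut lands after "%D0%A0%D0%"
console.log(cutName);                     // "%D0%A0"
console.log(decodeURIComponent(cutName)); // "Р"
```

The result can be shorter than the limit by up to one full character, but it always decodes cleanly, which avoids handing a broken byte sequence to a later JSON encoding step.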
Comment #17.0
hass commented
Fixed last code.
Comment #18
hass commented
I'm closing this now as the code has been removed in 2.x. I'm not sure if we are allowed to send unlimited strings in UA, but at least it's not documented that there is a limit like in ga.js.