If the site's default language uses non-English characters and the Value field of a custom variable contains a token, the value does not appear in the Google Analytics code.
If Name and Value contain non-English text, it appears with broken encoding.
For example:
Name is User Roles, Value is [current-user:roles].
English language by default:

_gaq.push(['_setCustomVar', 1, "User Roles", "authenticated user, administrator", 2]);

Russian language by default:

_gaq.push(['_setCustomVar', 1, "User Roles", "", 2])

Chinese language by default:

_gaq.push(['_setCustomVar', 1, "User Roles", "\u8a3b\u518a\u4f7f\u7528\u8005, administrator", 2]);

German language by default:

_gaq.push(['_setCustomVar', 1, "User Roles", "Authentifizierter Benutzer, administrator", 2]);

If I set Name to the Russian Роли пользователей ("User roles") and Value to анонимный пользователь ("anonymous user"), the code is:

_gaq.push(['_setCustomVar', 1, "\u0420\u043e\u043b\u0438 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u0439", "", 2]);

I can't find anything in the GA help saying that all characters have to be English. GA works correctly with non-English characters (for example, ecommerce tracking with Russian characters: http://www.adlabs.ru/images/docs/yg_26.jpg).

Comments

hass’s picture

Thanks for the report. I guess this may be caused by the URL encoding. What does the Google Analytics statistics interface show for German special characters (Sonderzeichen) or Russian characters?

The \u[code] doesn't mean it's wrong. These are just JavaScript Unicode escapes. Are the characters wrong in the GA statistics, and if they are, how? Can you debug the code, please?
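
As a quick check that the \u escapes are only another spelling of the same text, here is a minimal sketch. It assumes the escapes in the page come from PHP's json_encode() (drupal_json_encode() is mentioned later in this thread); the variable names are illustrative only.

```php
<?php
// The same name used in the report above.
$name = 'Роли пользователей';

// By default json_encode() writes non-ASCII characters as \uXXXX escapes.
$encoded = json_encode($name);
echo $encoded, "\n";

// Decoding the escapes gives back the original UTF-8 string unchanged.
$decoded = json_decode($encoded);
var_dump($decoded === $name);
```

If the round trip returns the original string, the escapes themselves are not the source of the problem.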

hass’s picture

Status: Active » Postponed (maintainer needs more info)
plazik’s picture

I set Name to Роли пользователей ä, ö, ü, ß. The GA interface doesn't show it in the Custom Variables report.
The \u[code] escapes are shown correctly in GA Debugger.

plazik’s picture


I made a test.html and entered Роли пользователей ä, ö, ü, ß manually. GA Debug shows the custom variables correctly.
Let's wait until tomorrow to see how it shows up in the GA interface.

plazik’s picture

Status: Postponed (maintainer needs more info) » Active

I did some tests and here is the summary:

  1. The \u[code] escapes work correctly. GA understands them.
  2. Both Name and Value must be specified. If Value is empty in the code, it doesn't work. This was the main problem. The GA module should check for this in the admin interface.

Screenshots show my test code:

_gaq.push(['_setCustomVar', 1, "\u0420\u043e\u043b\u0438 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u0439 \u00e4, 2 \u00f6, \u00fc, \u00df", "\u0420\u043e\u043b\u0438 \u043f\u043e\u043b\u044c\u0437\u043e\u0432\u0430\u0442\u0435\u043b\u0435\u0439 \u00e4, 2 \u00f6, \u00fc, \u00df", 2])

P.S. Name should be renamed to Key, as in the GA interface.

hass’s picture

Category: bug » support
Status: Active » Fixed

Normally the module checks whether the value or key is empty. If the token value is empty, the custom variable is not added to the content. This is how it is designed by Google and by the module. Nothing is wrong.

If this is not how it works, it's a bug.

Per the docs, the "key" was named "name" in the past... I just followed the Google docs. We need to check whether the wording has changed, and then we can change it to key, too.

plazik’s picture

Category: support » bug
Status: Fixed » Needs review

Even if I enter these characters manually in the GA module, they don't show up in the code. The token value is correct. It's a bug.
I made a patch that fixes it.

OK, the GA docs still call it "name".

hass’s picture

Status: Needs review » Needs work

This cannot be a correct change, and it makes no sense to me either. Google does not allow the string to be longer than 128 characters after URL encoding.

Is there any bug in rawurldecode() with Russian letters?

plazik’s picture


rawurldecode() works correctly, and the URL-encoded string for the Russian characters looks like this:

%D0%A0%D0%BE%D0%BB%D0%B8%20%D0%BF%D0%BE%D0%BB%D1%8C%D0%B7%D0%BE%D0%B2%D0%B0%D1%82%D0%B5%D0%BB%D0%B5%D0%B9%20%C3%A4%2C%20%C3%B6%2C%20%C3%BC%2C%20%C3%9F150

The %0..9A..F characters are removed after the substr() call. That's why all the characters disappeared. Is it necessary to do this? There should be a better solution.
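
To illustrate the effect, here is a minimal sketch of what a character list of '%0..9A..F' does when passed to rtrim(). The sample strings are only illustrations, not the module's actual data flow.

```php
<?php
// rtrim()'s second argument is a character list, and ".." denotes a range,
// so '%0..9A..F' means: "%", the digits 0-9 and the letters A-F.
$mask = '%0..9A..F';

// A value made only of non-ASCII characters and spaces encodes to nothing
// but "%XX" escapes, so every character is in the list and all are removed.
$encoded = rawurlencode('Роли пользователей');
$all_gone = rtrim($encoded, $mask);
var_dump($all_gone);

// A plain ASCII value that ends in upper-case A-F letters is damaged too:
// "ABC" is stripped, and then the "%20" in front of it as well.
$ascii = rtrim(rawurlencode('User Roles ABC'), $mask);
var_dump($ascii);
```

The first call returns an empty string, which matches the empty Value seen in the generated tracking code.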

hass’s picture

Priority: Normal » Major

This sounds like an important bug. The idea behind it was to avoid logging a broken or half hex code of a letter. I think it should just trim the string back to the closest complete hex value, if there is one, not remove everything. We need to write a test for this, too.

hass’s picture

This does not work either. If the last characters are a broken or half Unicode hex value (%C3), drupal_json_encode() turns the string into null.
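
The failure can be reproduced in isolation. This is a minimal sketch that assumes drupal_json_encode() is a thin wrapper around PHP's json_encode(); the sample string is made up for illustration.

```php
<?php
// "%C3%A4" is the two-byte UTF-8 sequence for "ä". Cutting after "%C3"
// leaves a complete escape but only half of the character.
$cut = rawurldecode('abc%C3');

// An empty pattern with the /u modifier fails on invalid UTF-8,
// which makes it a cheap validity check.
$is_valid = (preg_match('//u', $cut) === 1);
var_dump($is_valid);

// json_encode() refuses the string. Current PHP versions return FALSE here;
// older versions returned NULL.
$json = json_encode($cut);
var_dump($json);
```

So trimming to a complete "%XX" escape is not enough: the cut also has to fall on a UTF-8 character boundary.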

hass’s picture

I have a fix ready based on the Drupal API, but in case we need another solution later, I'm posting a regex that also seems to work. I found it in "php - Remove non-utf8 characters from string" on Stack Overflow, together with two other workarounds that use iconv() with //IGNORE and mb_convert_encoding(), as drupal_convert_to_utf8() does, except for the //IGNORE in iconv(), which seems to be superfluous when error reporting is suppressed with @iconv().

  $custom_var_value = preg_replace('/
    (
      (?: [\x00-\x7F]                 # single-byte sequences   0xxxxxxx
      |   [\xC0-\xDF][\x80-\xBF]      # double-byte sequences   110xxxxx 10xxxxxx
      |   [\xE0-\xEF][\x80-\xBF]{2}   # triple-byte sequences   1110xxxx 10xxxxxx * 2
      |   [\xF0-\xF7][\x80-\xBF]{3}   # quadruple-byte sequence 11110xxx 10xxxxxx * 3
      )+                              # ...one or more times
    )
  | .                                 # anything else
  /x', '$1', $tmp_value);
hass’s picture

Heck, the more test cases I try, the more issues I find. Any suggestions or ideas?

        if ($name_length + $value_length > 128) {
          // Trim name and value to maximum combined length.
          $tmp_value = substr($tmp_value, 0, 127 - $name_length);
          $tmp_value = urldecode($tmp_value);
          // Silently remove non-utf8 characters created by trim of encoded url.
          $tmp_value = drupal_convert_to_utf8($tmp_value, 'utf-8');

          // FAIL: but we need something similar or we will see e.g. "%C" on end of strings
          // FAIL: Destroys strings that end with UPPER CASE chars.
          $tmp_value = rtrim($tmp_value, '%0..9A..F');

          // DOES NOT WORK PROPERLY IN ALL CASES e.g. "%C"
          $tmp_value = rtrim($tmp_value, '%');
          $tmp_value = rtrim($tmp_value);
          $custom_var_value = $tmp_value;
        }
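
One possible way around these cases is to avoid cutting the encoded string at all and instead add whole characters until the encoded length budget is used up. The following is only a sketch under assumptions: example_trim_custom_var() is a hypothetical helper, not part of the module, and it counts the URL-encoded name against a 128-byte limit, which may differ from how the snippet above counts.

```php
<?php
/**
 * Hypothetical helper: shortens $value so that the URL-encoded name plus
 * the URL-encoded value fit into $limit bytes.
 */
function example_trim_custom_var($name, $value, $limit = 128) {
  $budget = $limit - strlen(rawurlencode($name));
  // Split into whole UTF-8 characters; the /u modifier keeps multi-byte
  // sequences together. preg_split() returns FALSE for invalid UTF-8.
  $chars = preg_split('//u', $value, -1, PREG_SPLIT_NO_EMPTY);
  if ($chars === FALSE) {
    return '';
  }
  $result = '';
  foreach ($chars as $char) {
    $cost = strlen(rawurlencode($char));
    if ($cost > $budget) {
      break;
    }
    $result .= $char;
    $budget -= $cost;
  }
  // Drop whitespace left dangling at the cut.
  return rtrim($result);
}

$short = example_trim_custom_var('UTF8 test abc', 'Роли пользователей äöüß');
var_dump($short);
```

Because the value is never cut in the middle of an escape or a character, no cleanup with rtrim() masks is needed. The exact cut point depends on how the name length is counted, so the results need not match the expectations in the test case posted in this comment.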

In addition, my Eclipse EGit, or Git itself, does not handle the Russian characters properly. After I commit this as a patch it looks destroyed.

    // Test whether invalid utf-8 tests may fail.
    $custom_vars = array(
      'slots' => array(
        1 => array(
          'slot' => 1,
          'name' => 'UTF8 test abc',
          'value' => 'Роли пользователей äöüß',
          'scope' => 3,
        ),
        2 => array(
          'slot' => 2,
          'name' => 'UTF8 test abcd',
          'value' => 'Роли пользователей äöüß',
          'scope' => 3,
        ),
        3 => array(
          'slot' => 3,
          'name' => 'Test',
          'value' => 'Роли пользователей äöüß',
          'scope' => 3,
        ),
        4 => array(
          'slot' => 4,
          'name' => 'Test',
          'value' => $this->randomName(255),
          'scope' => 3,
        ),
        5 => array(
          'slot' => 5,
          'name' => '',
          'value' => '',
          'scope' => 3,
        ),
      )
    );
    variable_set('googleanalytics_custom_var', $custom_vars);
    $this->verbose('<pre>' . print_r($custom_vars, TRUE) . '</pre>');

    $this->drupalGet('');

    // 'UTF8 test abc' + 'Роли пользователей äöüß' = 'Роли пользователе%C' = 'Роли пользователе'
    // 1. Trim HEX values
    // 2. Trim trailing spaces
    $this->assertRaw("_gaq.push(['_setCustomVar', 1, \"UTF8 test abc\", \"Роли пользователе\", 3]", '[testGoogleAnalyticsCustomVariables]: String trim 1 worked properly.');

    // 'UTF8 test abcd' + 'Роли пользователей äöüß' = 'Роли пользователей %' -> 'Роли пользователей'
    // 1. Trim %
    // 2. Trim spaces
    $this->assertRaw("_gaq.push(['_setCustomVar', 2, \"UTF8 test abcd\", \"Роли пользователей\", 3]", '[testGoogleAnalyticsCustomVariables]: String trim 2 worked properly.');

    // 'Test' + 'Роли пользователей äöüß' = 'Роли пользователей äö�' -> 'Роли пользователей äö'
    // 1. Remove invalid/broken utf-8 character e.g. %C3
    $this->assertRaw("_gaq.push(['_setCustomVar', 3, \"Test\", \"Роли пользователей äö\", 3]", '[testGoogleAnalyticsCustomVariables]: String trim 3 worked properly.');

    // 'Test' + Random string with 255 chars = 4 + 124 chars.
    // 1. Trim combined name and value string length to max 128 characters.
    $this->assertRaw("_gaq.push(['_setCustomVar', 4, \"Test\", \"" . substr($site_slogan, 0, 123) . "\", 3]", '[testGoogleAnalyticsCustomVariables]: String trim 4 worked properly.');
hass’s picture

Status: Needs work » Needs review

Status: Needs review » Needs work
plazik’s picture

Status: Needs work » Needs review

"Any suggestions or ideas?"

I've found only solutions like yours.

"After I commit this as a patch it looks destroyed."

The file should be encoded as UTF-8 without a BOM.

I changed the encoding in your patch and fixed the Russian characters.

hass’s picture

Status: Needs review » Needs work

Maybe we can build a regex for this use case and run preg_replace() to cut the string off at 128 characters or earlier.
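
A regex-based cut along these lines could look like the sketch below. example_cut_encoded() is a hypothetical helper, and it assumes the value was valid UTF-8 before it was URL-encoded, so the only damage is at the cut itself.

```php
<?php
/**
 * Hypothetical helper: cuts an already URL-encoded string to at most $max
 * bytes without leaving a broken escape or a partial UTF-8 character.
 */
function example_cut_encoded($encoded, $max) {
  $cut = substr($encoded, 0, $max);
  // Remove an incomplete escape ("%" or "%C") left at the end by substr().
  $cut = preg_replace('/%[0-9A-Fa-f]?\z/', '', $cut);
  $decoded = rawurldecode($cut);
  // If the cut fell between the bytes of one character, remove the dangling
  // lead byte together with any continuation bytes that follow it.
  if (preg_match('//u', $decoded) !== 1) {
    $decoded = preg_replace('/[\xC0-\xFF][\x80-\xBF]*\z/', '', $decoded);
  }
  // Drop whitespace left dangling at the cut.
  return rtrim($decoded);
}

$encoded = rawurlencode('Роли пользователей äöüß');
var_dump(example_cut_encoded($encoded, 122));
var_dump(example_cut_encoded($encoded, 123));
```

Unlike rtrim() with a character list, both patterns are anchored to the end of the string and remove at most one broken escape or one partial character, so strings ending in upper-case letters or digits are left alone.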

hass’s picture

Issue summary: View changes

Fixed last code.

hass’s picture

Status: Needs work » Closed (won't fix)

I'm closing this now as the code has been removed in 2.x. I'm not sure whether we are allowed to send strings of unlimited length in UA, but at least no limit is documented, unlike for ga.js.