I'm working on a project for a client that has their Drupal site translated into Arabic. Running zen_id_safe() on any Arabic string turns it into a single dash, as none of the characters are alphanumeric.

The fix is to use a different regexp that uses \p instead of [a-z...] and to use the /u flag, like this:

return strtolower(preg_replace('/[^\p{L}\p{N}]+/u', '-', $string));

This is working for me so far, though I'd like to know if an even better regexp exists for this task.
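To make the difference concrete, here is a minimal sketch comparing an ASCII-only pattern with the Unicode-aware one proposed above. The function names are hypothetical, and the ASCII pattern is an assumption about what zen_id_safe() does internally, not a copy of it. The third variant is one possible refinement: it also keeps combining marks (\p{M}), since Arabic vowel marks are not matched by \p{L} and would otherwise be turned into dashes.

```php
<?php
// Sketch only: helper names are hypothetical, and the ASCII pattern is an
// assumption about what the original zen_id_safe() does.

// ASCII-only replacement: every other byte collapses into a dash.
function example_id_safe_ascii($string) {
  return strtolower(preg_replace('/[^a-zA-Z0-9-]+/', '-', $string));
}

// The pattern proposed above: keep any Unicode letter or number.
function example_id_safe_unicode($string) {
  return strtolower(preg_replace('/[^\p{L}\p{N}]+/u', '-', $string));
}

// Possible refinement: also keep combining marks (\p{M}), so vowel marks
// do not split a word into pieces.
function example_id_safe_marks($string) {
  return strtolower(preg_replace('/[^\p{L}\p{M}\p{N}]+/u', '-', $string));
}

$title = 'مرحبا بالعالم';
echo example_id_safe_ascii($title), "\n";   // "-"
echo example_id_safe_unicode($title), "\n"; // "مرحبا-بالعالم"
```

Note that strtolower() works on bytes, so it will not lowercase non-ASCII letters such as Cyrillic or Greek capitals; mb_strtolower($string, 'UTF-8') may be an option where the mbstring extension is available.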

Comments

samlerner’s picture

This is a relatively easy bug to fix. Is this regexp worth a patch?

avpaderno’s picture

Title: zen_id_safe fails when all characters are non-Latin » zen_id_safe() fails when all characters are non-Latin

The function should try to find the equivalent Latin character; I am not sure it's always possible.

avpaderno’s picture

There is already a module that transliterates non-Latin characters into Latin characters; as the transliteration is not required by all Drupal installations, maybe the theme can use that module when it is installed.

It will be up to the administrator to decide if he needs that module, which is already used by a third-party module.

avpaderno’s picture

The module I was referring to is Transliteration (http://drupal.org/project/transliteration).

Anyway, I would rather avoid using non-Latin characters for CSS IDs. It can seem too heavy a limitation, but HTML itself already uses only Latin characters (and even a subset of them).

samlerner’s picture

Anyway, I would rather avoid using non-Latin characters for CSS IDs. It can seem too heavy a limitation, but HTML itself already uses only Latin characters (and even a subset of them).

Is there any technical reason for this? If it's a preference, that's fine, but I ran into this issue while working on a site in Arabic in which I had to automatically wrap <h2> tags with an <a name="..."> to create anchor links. The name was the same as the <h2> content, hence the need for Arabic anchor names, which I was using zen_id_safe() to create.

If there is a technical reason to avoid non-Latin anchor names, I'd like to know so I can create a workaround. They work fine on FF3, Safari 3, and IE7/8 from what I've tested.
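The anchor-link use case described above could be sketched roughly as follows. The helper name is hypothetical, it assumes plain <h2> tags without attributes, and it places the anchor inside the heading, which is an assumption about the markup rather than what the site actually does.

```php
<?php
// Sketch only: example_add_heading_anchors() is a hypothetical helper, and
// it assumes plain <h2> tags without attributes.
function example_add_heading_anchors($html) {
  return preg_replace_callback('#<h2>(.*?)</h2>#su', function ($m) {
    // Build the anchor name from the heading text, keeping Unicode
    // letters and numbers and collapsing everything else to dashes.
    $text = strip_tags($m[1]);
    $name = trim(preg_replace('/[^\p{L}\p{N}]+/u', '-', $text), '-');
    $name = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    return '<h2><a name="' . $name . '">' . $m[1] . '</a></h2>';
  }, $html);
}

echo example_add_heading_anchors('<h2>مرحبا بالعالم</h2>');
```

Two headings with the same text would get the same name here; a real implementation would need to make the names unique.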

avpaderno’s picture

The fact there are compatibility problems is already a reason to avoid non-Latin characters.

The function the Zen theme implements is clearly not designed for characters outside the ASCII character set, as it simply removes the characters outside that range.
It could implement the same code as Transliteration (or use that module), but would the CSS IDs be clear and still readable (especially with scripts like Arabic)?
And if the function used the Unicode hexadecimal code instead of the Unicode character, would the CSS IDs still be readable?

What I suggested is only a personal preference that, in this specific case, would resolve the issue you are seeing. There is no technical reason, except the problems some characters cause in some browsers (otherwise the Zen theme would not implement zen_id_safe() at all).
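To illustrate the readability concern, here is a minimal sketch of the hexadecimal-code idea. The helper name and the "u" prefix are hypothetical, and the code assumes the mbstring extension is installed (mb_ord() needs PHP 7.2 or later).

```php
<?php
// Sketch only: example_id_hex() is a hypothetical helper; it assumes the
// mbstring extension (mb_ord() needs PHP 7.2 or later).
function example_id_hex($string) {
  return preg_replace_callback('/[^a-zA-Z0-9-]/u', function ($m) {
    // Replace each character outside the ASCII alphanumerics with "u" plus
    // its code point in hexadecimal, e.g. U+0645 becomes "u0645".
    return sprintf('u%04x', mb_ord($m[0], 'UTF-8'));
  }, $string);
}

echo example_id_hex('مرحبا'); // "u0645u0631u062du0628u0627"
```

The result uses only ASCII letters and digits, but it is hard for a person to read or type, which is the trade-off raised above.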

avpaderno’s picture

Category: bug » support

I think the function follows a specification for the characters allowed in a CSS ID; I would not call this a bug.

avpaderno’s picture

Status: Needs review » Fixed
johnalbin’s picture

Title: zen_id_safe() fails when all characters are non-Latin » Add zen_class_safe() to work with non-Latin class names
Version: 5.x-1.2 » 6.x-2.x-dev
Category: support » feature
Status: Fixed » Active

Indeed, zen_id_safe() references http://www.w3.org/TR/html4/types.html#type-name

Which shows that IDs “must begin with a letter ([A-Za-z]) and may be followed by any number of letters, digits ([0-9]), hyphens ("-"), underscores ("_"), colons (":"), and periods (".").”
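The rule quoted above translates directly into a regular expression. A minimal sketch, with a hypothetical function name:

```php
<?php
// Sketch only: example_is_valid_html4_id() is a hypothetical helper that
// encodes the HTML 4 rule quoted above as a regular expression.
function example_is_valid_html4_id($id) {
  // A letter first, then any number of letters, digits, hyphens,
  // underscores, colons, and periods. /D stops "$" from matching
  // before a trailing newline.
  return preg_match('/^[A-Za-z][A-Za-z0-9\-_:.]*$/D', $id) === 1;
}

var_dump(example_is_valid_html4_id('main-menu')); // bool(true)
var_dump(example_is_valid_html4_id('2col'));      // bool(false)
var_dump(example_is_valid_html4_id('مرحبا'));     // bool(false)
```

Under this rule an all-Arabic ID is rejected outright, which is why the ID function strips those characters.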

The valid characters for a class are slightly different, however: http://www.w3.org/TR/CSS21/syndata.html#characters

So, it would probably be good to have a class-specific function.

avpaderno’s picture

+1 for the feature.

johnalbin’s picture

Component: Code » PHP Code
johnalbin’s picture

Status: Active » Fixed

Zen 6.x-2.x now includes a copy of Drupal 7's drupal_html_class(), which does not have the bug of stripping out valid UTF-8 characters.
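For readers who want a feel for what a class-specific cleaner looks like, here is a minimal sketch written from the CSS 2.1 identifier rule (a-z, A-Z, 0-9, hyphen, underscore, and characters U+00A0 and higher). The function name is hypothetical, and this is not the code that Zen or Drupal ships.

```php
<?php
// Sketch only: example_class_safe() is a hypothetical helper written from
// the CSS 2.1 identifier rule; it is not the code that Zen or Drupal ships.
function example_class_safe($class) {
  // Lowercase ASCII letters (strtolower() is ASCII-only on PHP 8) and
  // turn common separators into dashes.
  $class = strtr(strtolower($class), array(' ' => '-', '_' => '-', '/' => '-'));
  // Keep a-z, A-Z, 0-9, hyphen, underscore, and U+00A0 and higher;
  // strip everything else.
  return preg_replace('/[^\x{002D}0-9A-Za-z_\x{00A0}-\x{10FFFF}]/u', '', $class);
}

echo example_class_safe('Main Menu!'), "\n";    // "main-menu"
echo example_class_safe('مرحبا بالعالم'), "\n"; // "مرحبا-بالعالم"
```

The sketch does not handle the CSS 2.1 restriction that an identifier cannot start with a digit, so a caller would still need to check the first character.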

http://drupalcode.org/viewvc/drupal/contributions/themes/zen/template.ph...

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.