Posted by Pisco on March 29, 2011 at 8:42pm
| Project: | Drupal core |
| Version: | 8.x-dev |
| Component: | theme system |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | needs work |
| Issue tags: | CSS |
Issue Summary
As described in the referenced spec, the underscore is a perfectly valid and widely used character in CSS identifiers. drupal_clean_css_identifier should not transliterate underscores by default! Even the comments in the code say that the underscore is a valid character.
Comments
#1
It also appears as though the function was written towards the CSS specification, though the intended target is HTML!
It is also not usable for IDs and Names even though the PHPdoc indicates this.
AFAIK we should pick http://www.w3.org/TR/html401/ as specification. While CSS specification prohibits certain literals, their escape sequences can still be used.
Of course, the output of a strict function should be passed through check_plain as well before insertion into HTML.
To illustrate, see http://heine.familiedeelstra.com/html-css-spec-classes
#2
Underscore is traditionally escaped, for legacy reasons that do not apply to nowadays browsers.
We could consider changing that, but that's D8 material.
#3
The function most definitely seems to have been written towards the CSS specification, that's what the function name
css_identifiersuggests, and also what the documentation very clearly and unambiguously states.I would not rely on the 12 year old HTML 4.01 spec. Even if you wish to be as lax as possible, I'd recommend commplying to a common denominator of both the HTML and CSS specification. It seems as if the CSS spec has stricter requirements, still being a subset of the other, that's why I'd recommend complying to the CSS requirements.
On a side node: the current HTML spec seems to be similar to the HTML 4.01 spec mentioned by Heine. The only requirement for the
idattribute is, that the value must not contain any space characters. Similar for theclassattribute which must have a value that is a set of space-separated tokens.I'd most definitely change this in D8! From what the name and the documentation of drupal_clean_css_identifier suggests, and according the aforementioned specs, transliterating dashes to underscores is a bug.
#4
Off the record: this is how I noticed this issue: #939992: "CSS classes must be alphanumeric or dashes only." I think underscores should be included. This bug is already causing many problems to users, as popular CSS "frameworks" rely heavily on CSS identifiers having underscores. This means you have hack with workaround because Views relying on drupal_clean_css_identifier makes it impossible to use these CSS identifiers through the UI.
#5
I know that it was based on CSS, but it seems the intended target of the returned result is an HTML attribute. It doesn't make sense to base that on the CSS spec, does it?
"{foo" is a perfectly valid class attribute value. All it needs is an escape sequence to be used in a CSS selector.
#6
I got it, you're right!
This means that all drupal_clean_css_identifier has to do, is to escape this caracter sequence:
]]>.Extensible Markup Language (XML) 1.0 (Fifth Edition), 2.4 Character Data and Markup
#7
I think the code comments in the function explain why underscores are removed:
// By default, we filter using Drupal's coding standards.Drupal's coding standards prefer dashes to underscores (although honestly, I'm not sure that's written down anywhere)... so the function is just designed to help Drupal's HTML output be consistent with the standards by default.
And that seems to have been the explicit intent of #464862: Add drupal_css_class() to clean class names and rename form_clean_id (the issue where this was added).
Not to say we couldn't change Drupal's standards if we wanted to, of course :) But I think that does explain why the function currently behaves the way that it does.
#8
Relaxing the standards would actually help with issues like this: #936798: Dashboard uses unreliable method for identifying blocks.
Because currently, a whole bunch of characters get munged into dashes, so you can't go backwards from the CSS identifier to figure out what was fed into the function to produce it.
#9
Pisco, a CDATA section starts with
<![CDATA[. It is not the same as the CDATA SGML type (See 6.2 SGML basic types).#10
Heine, you're absolutely right, a
CDATAsectionstarts with<![CDATA[, but I was not taking aboutCDATA section. I was referring toCDATAas an attribute type.Unfortunately section 6.2 SGML basic types is only a summary of key information, I was not able to find a free version of the the SGML spec. I think, however, that it is save to stick to the XML and XHTML spec (2.4 Character Data and Markup):
Notice the distinction between the CDATA section and content of elements.
I think the start-delimiter of any markup is "
<", thus drupal_clean_css_identifier should do the following transliteration:<to<]]>to]]>'to'"to"#11
I'm afraid that is not going to work. The problem is that the use of drupal_html_class hasn't been thought about sufficiently.
In theme_table it is used as an 'attributes' array value, then later passed through check_plain via drupal_attributes.
The best we can do is IMO have the function that produces a "potentially valid attribute value" that is later passed through check_plain before being inserted into HTML. This also means we need to fix the theme's use of the classes strings.
#12
I have given this issue another thought: on one hand we have the HTML
class attributeas defined in the HTML 4.01 and the current HTML spec, on the other hand we have CSSclass identifiersas defined in the CSS spec. The HTML class attribute is a particularity of HTML, it has, on itself, nothing to do with CSS. CSS uses this particularity of HTML to add functionality on top of HTML, this is by design. These different specifications target different applications and impose different requirement on theclass attributeand theclass identifierrespectively. It is also by design that the requirements for one (HTML) is a subset of the other (CSS). This means that every CSSclass identifieris a valid HTMLclass attribute, but not every HTMLclass attributeis a valid CSSclass identifier!As said before, I think that drupal_clean_css_identifier is clearly targeted at CSS
class identifiers, and these are the requirement it should enforce:#13
This is what I propose:
<?php
function drupal_clean_css_identifier($identifier, $filter = array(' ' => '-')) {
// replace whitespaces
$identifier = strtr($identifier, $filter);
// CSS class identifiers must not start with
// - a digit
// - two hyphens
// - hyphen followed by a digit
$prev = '';
while ($prev != $identifier) {
$prev = $identifier;
$identifier = preg_replace(array('/^\d*/u', '/^\x{002D}*(?=\x{002D})/u', '/^\x{002D}\d/u'),'', $identifier);
}
// Valid characters in a CSS identifier are:
// - the hyphen (U+002D)
// - a-z (U+0030 - U+0039)
// - A-Z (U+0041 - U+005A)
// - the underscore (U+005F)
// - 0-9 (U+0061 - U+007A)
// - ISO 10646 characters U+00A1 and higher,
// We strip out any character not in the above list.
// as opposed to the spec we do not allow U+00A0
// as opposed to the spec we only allow code points up to U+FFFF
$identifier = preg_replace('/[^\x{002D}\x{0030}-\x{0039}\x{0041}-\x{005A}\x{005F}\x{0061}-\x{007A}\x{00A1}-\x{FFFF}]/u', '', $identifier);
return $identifier;
}
?>
See the attached example.
#14
Added a patch for Drupal 8.x, no unit tests yet.
#15
Added patch with unit tests.
#16
Nope. The CSS specification does not limit the allowed values of HTML classes, at least not to the extent you assume.
1foois a perfectly valid class attribute value, but not a valid CSS identifier. So what? To address the class 1foo in CSS, all you need to do is to create a valid CSS identifier. In this case:
.\31 foo#17
You're right. So back to #10 then, I think that's what's technically correct. Can you explain why you think this wouldn't work ( #11 )?
#18
drupal_html_class is sometimes used to produce the value of an attribute][class entry. These will be passed through check_plain, resulting in double escaping.
&fooShould become in HTML:
&fooBut will become in some cases
&amp;fooIt might be best for drupal_html_class to deliver a valid class value, then transform it to valid HTML via check_plain the moment it is inserted in the attribute value.
#19
The last submitted patch, drupal_clean_css_identifier-1109854-15.patch, failed testing.
#20
What exactly do you mean, I'd say that a valid class value is valid HTML.