Early Bird Registration for DrupalCon Portland 2024 is open! Register by 23:59 PST on 31 March 2024, to get $100 off your ticket.
As described in the referenced spec, the underscore is a perfectly valid and widely used character in CSS identifiers. drupal_clean_css_identifier should not transliterate underscores by default! Even the comments in the code say that the underscore is a valid character.
Comments
Comment #1
Heine CreditAttribution: Heine commentedIt also appears as though the function was written towards the CSS specification, though the intended target is HTML!
It is also not usable for IDs and Names even though the PHPdoc indicates this.
AFAIK we should pick http://www.w3.org/TR/html401/ as specification. While CSS specification prohibits certain literals, their escape sequences can still be used.
Of course, the output of a strict function should be passed through check_plain as well before insertion into HTML.
To illustrate, see http://heine.familiedeelstra.com/html-css-spec-classes
Comment #2
Damien Tournoud CreditAttribution: Damien Tournoud commentedUnderscore is traditionally escaped, for legacy reasons that do not apply to nowadays browsers.
We could consider changing that, but that's D8 material.
Comment #3
Pisco CreditAttribution: Pisco commentedThe function most definitely seems to have been written towards the CSS specification, that's what the function name
css_identifier
suggests, and also what the documentation very clearly and unambiguously states.I would not rely on the 12 year old HTML 4.01 spec. Even if you wish to be as lax as possible, I'd recommend commplying to a common denominator of both the HTML and CSS specification. It seems as if the CSS spec has stricter requirements, still being a subset of the other, that's why I'd recommend complying to the CSS requirements.
On a side node: the current HTML spec seems to be similar to the HTML 4.01 spec mentioned by Heine. The only requirement for the
id
attribute is, that the value must not contain any space characters. Similar for theclass
attribute which must have a value that is a set of space-separated tokens.I'd most definitely change this in D8! From what the name and the documentation of drupal_clean_css_identifier suggests, and according the aforementioned specs, transliterating dashes to underscores is a bug.
Comment #4
Pisco CreditAttribution: Pisco commentedOff the record: this is how I noticed this issue: #939992: "CSS classes must be alphanumeric or dashes only." I think underscores should be included. This bug is already causing many problems to users, as popular CSS "frameworks" rely heavily on CSS identifiers having underscores. This means you have hack with workaround because Views relying on drupal_clean_css_identifier makes it impossible to use these CSS identifiers through the UI.
Comment #5
Heine CreditAttribution: Heine commentedI know that it was based on CSS, but it seems the intended target of the returned result is an HTML attribute. It doesn't make sense to base that on the CSS spec, does it?
"{foo" is a perfectly valid class attribute value. All it needs is an escape sequence to be used in a CSS selector.
Comment #6
Pisco CreditAttribution: Pisco commentedI got it, you're right!
This means that all drupal_clean_css_identifier has to do, is to escape this caracter sequence:
]]>
.Extensible Markup Language (XML) 1.0 (Fifth Edition), 2.4 Character Data and Markup
Comment #7
David_Rothstein CreditAttribution: David_Rothstein commentedI think the code comments in the function explain why underscores are removed:
Drupal's coding standards prefer dashes to underscores (although honestly, I'm not sure that's written down anywhere)... so the function is just designed to help Drupal's HTML output be consistent with the standards by default.
And that seems to have been the explicit intent of #464862: Add drupal_css_class() to clean class names and rename form_clean_id (the issue where this was added).
Not to say we couldn't change Drupal's standards if we wanted to, of course :) But I think that does explain why the function currently behaves the way that it does.
Comment #8
David_Rothstein CreditAttribution: David_Rothstein commentedRelaxing the standards would actually help with issues like this: #936798: Dashboard uses unreliable method for identifying blocks.
Because currently, a whole bunch of characters get munged into dashes, so you can't go backwards from the CSS identifier to figure out what was fed into the function to produce it.
Comment #9
Heine CreditAttribution: Heine commentedPisco, a CDATA section starts with
<![CDATA[
. It is not the same as the CDATA SGML type (See 6.2 SGML basic types).Comment #10
Pisco CreditAttribution: Pisco commentedHeine, you're absolutely right, a
CDATA
section
starts with<![CDATA[
, but I was not taking aboutCDATA section
. I was referring toCDATA
as an attribute type.Unfortunately section 6.2 SGML basic types is only a summary of key information, I was not able to find a free version of the the SGML spec. I think, however, that it is save to stick to the XML and XHTML spec (2.4 Character Data and Markup):
Notice the distinction between the CDATA section and content of elements.
I think the start-delimiter of any markup is "
<
", thus drupal_clean_css_identifier should do the following transliteration:<
to<
]]>
to]]>
'
to'
"
to"
Comment #11
Heine CreditAttribution: Heine commentedI'm afraid that is not going to work. The problem is that the use of drupal_html_class hasn't been thought about sufficiently.
In theme_table it is used as an 'attributes' array value, then later passed through check_plain via drupal_attributes.
The best we can do is IMO have the function that produces a "potentially valid attribute value" that is later passed through check_plain before being inserted into HTML. This also means we need to fix the theme's use of the classes strings.
Comment #12
Pisco CreditAttribution: Pisco commentedI have given this issue another thought: on one hand we have the HTML
class attribute
as defined in the HTML 4.01 and the current HTML spec, on the other hand we have CSSclass identifiers
as defined in the CSS spec. The HTML class attribute is a particularity of HTML, it has, on itself, nothing to do with CSS. CSS uses this particularity of HTML to add functionality on top of HTML, this is by design. These different specifications target different applications and impose different requirement on theclass attribute
and theclass identifier
respectively. It is also by design that the requirements for one (HTML) is a subset of the other (CSS). This means that every CSSclass identifier
is a valid HTMLclass attribute
, but not every HTMLclass attribute
is a valid CSSclass identifier
!As said before, I think that drupal_clean_css_identifier is clearly targeted at CSS
class identifiers
, and these are the requirement it should enforce:Comment #13
Pisco CreditAttribution: Pisco commentedThis is what I propose:
See the attached example.
Comment #14
Pisco CreditAttribution: Pisco commentedAdded a patch for Drupal 8.x, no unit tests yet.
Comment #15
Pisco CreditAttribution: Pisco commentedAdded patch with unit tests.
Comment #16
Heine CreditAttribution: Heine commentedNope. The CSS specification does not limit the allowed values of HTML classes, at least not to the extent you assume.
is a perfectly valid class attribute value, but not a valid CSS identifier. So what? To address the class 1foo in CSS, all you need to do is to create a valid CSS identifier. In this case:
Comment #17
Pisco CreditAttribution: Pisco commentedYou're right. So back to #10 then, I think that's what's technically correct. Can you explain why you think this wouldn't work ( #11 )?
Comment #18
Heine CreditAttribution: Heine commenteddrupal_html_class is sometimes used to produce the value of an attribute][class entry. These will be passed through check_plain, resulting in double escaping.
Should become in HTML:
But will become in some cases
It might be best for drupal_html_class to deliver a valid class value, then transform it to valid HTML via check_plain the moment it is inserted in the attribute value.
Comment #20
Pisco CreditAttribution: Pisco commentedWhat exactly do you mean, I'd say that a valid class value is valid HTML.
Comment #21
aspilicious CreditAttribution: aspilicious commentedOk i'm confused. Can someone summarize why we can't change the filter from
$filter = array(' ' => '-', '_' => '-', '/' => '-', '[' => '-', ']' => ''))
to
$filter = array(' ' => '-', '/' => '-', '[' => '-', ']' => ''))
Views is creating its own wrazpper function just to remove that part of the filter.
And I can't find anything in the css standards about underscores.
Comment #22
dawehnerI have no glue of css, but there seems to be only one grid system which uses "_" as part of their css classes.
http://960.gs/
As far as i see, it's not that unpopular even the trend seems to move to other systems.
Comment #23
tim.plunkettWe're removing the wrapper in Views 8.x-3.x, and this seems to be the best place to discuss changes to the core function.
Comment #24
dawehnerAdding a tag,
Comment #25
fubhy CreditAttribution: fubhy commentedI opened a more or less related issue (which would be dependent on this considering that we agree we want it):
http://drupal.org/node/1900768
Comment #26
fubhy CreditAttribution: fubhy commentedSo... Our new CSS Syntax requires the _ to NOT be stripped. Ignore my last comment because BEM apparently was already decided on.
Take a look at http://drupal.org/node/1887918 to see an example of the planned syntax (especially @see "6. Formatting Class Names"). Thanks JohnAlbin, r5yn and anyone else involved in that draft... It REALLY looks amazing.
So... To make it short: We need underscores. But, more specifically, we only really need double underscores. Currently modules seem to use
drupal_html_class()
also to simply avoid running an additionalstrtr()
to replace underscores of machine-names before they are printed... Take node types for example.Do we want to run them through a regex replace to only replace single occurrences of underscores? Does that even make sense or should modules take care of that?
Comment #27
JohnAlbinSo adding code for _strict_ CSS class name validation is out. I had it as part of the original patch for
drupal_clean_css_identifier
. See #464862-12: Add drupal_css_class() to clean class names and rename form_clean_id But was nixed by Sun in comment #15 when he said:Comment #28
mbrett5062 CreditAttribution: mbrett5062 commentedHave read through all comments here, and AFAICT the final conclusion is that we should just remove the filtering of underscores from
drupal_clean_css_identifier
.This will allow class names for BEM such as .messages--error or .messages__icon.
It does not touch ID's as there is no need to change the current filtering. Also, as per #26, if modules are incorrectly using this function to filter and alter machine names, I think that should be their issue to resolve, provided core does not follow this bad practice.
AFAICS it does not, so no issue with this route.
Here is a patch, I would like to get this moving ASAP, so we can get onto cleaning the core HTML and CSS, either before or during Twig conversion.
Comment #29
mbrett5062 CreditAttribution: mbrett5062 commentedhhmm maybe I was wrong above. @fubhy did you mean that modules are replacing '_' with '-' in machine names for use in classes? If so, then maybe we could just change the filter to something like this:
Or would that not work? This needs a little more thought, I am not so great with regex.
Comment #31
mbrett5062 CreditAttribution: mbrett5062 commentedI believe that single underscore filter must remain in place, just adding a filter for double underscores, so they remain untouched. If this does not work, then we will need some kind of regex to get this right.
Comment #33
mbrett5062 CreditAttribution: mbrett5062 commentedOopss missed a comma, still not sure this will work, I need 2 things, first a regex to replace single underscore with a hyphen, then remove underscore from filter, then I need a test to be sure it is working. Will work on that next.
Comment #34
mbrett5062 CreditAttribution: mbrett5062 commentedAnd send to test bot.
Comment #35
mbrett5062 CreditAttribution: mbrett5062 commentedHaving read the PHP docs, I think my last patch will work as expected.
The longer string '__' will be matched first and replaced with '__' (itself), but then will not be touched again, so then the shorter '_' will be checked and matched and replaced with '-'.
From the docs:
The next example shows the behavior of strtr() when called with only two arguments. Note the preference of the replacements ("h" is not picked because there are longer matches) and how replaced text was not searched again.
Example #2 strtr() example with two arguments
The above example will output:
hello all, I said hi
Hope I have read that correctly.
Of course I know you PHP geniuses out there know this already, it's more for my reference and anyone not familiar to understand the change made.
Comment #36
gmclelland CreditAttribution: gmclelland commented@mbrett5062 - I guess I don't understand why are we not allowing single underscrores in drupal_clean_css_identifier? I found this issue after seeing my underscores where getting converted into hypens when adding classes in the views ui.
Comment #37
mbrett5062 CreditAttribution: mbrett5062 commentedOf course you are correct @gmclelland, I know of no reason why a single underscore can not be used, according to W3G standards, it was something introduced a long while ago in Drupal, and I am unable to find the reasoning.
However, according our new, upcoming standards for CSS, we are moving towards a SMACCS syntax, please see the following: (I pulled this out, so it is quicker to find, on an otherwise long page, Draft - CSS Architecture)
6. Formatting Class Names
The HTML class attribute has been made to do a lot. Drupal uses a naming convention to make clear what a particular class is for and what other parts of the system it affects:
While this does not preclude the use of single underscores, if we stick to the syntax as detailed, then everyone gets a clear understanding of the semantic inference within the structure.
I hope this helps.
Comment #38
xjmComment #39
jwilson3From reading the issue, it seems there are a couple reasons, which I'd like to just summarize all in one place, to have as a reference:
1) Legacy browser support (comment #2).
2) Hyphens are a de-facto unofficial Drupal standard, used for cleanliness, readability, and consistency (comment #7).
3) Perhaps a continuation of the previous point, but the new CSS Architecture draft doc makes no mention of the use of single underscores in class name formatting (comment #37).
4) Because of the way the function works, PHP Drupal developers have come to make their modules depend on the underscore to hyphen transliteration, which allows for interesting conversion such as
book_parent[bid][title]
intobook-parent-bid-title
(Sun's example). Changing this, will have far-reaching effects in both contrib and core, and themers will be left with the mess of digging up and converting previously working styles to insert underscores in random places, creating inconsistencies and disorder (excuse the FUD ;).Because of the wide-reaching effects that allowing underscores in class names will have (read: breaking lots of existing styles and scripts), and the relative innocuous addition of writing an exception for the double underscore and double hyphen situation... My suggestion is that we should spin off the double underscore / double hyphen exception to a separate issue, and leave this one just for the discussion of simply changing from
'_' => '-'
to'_' => '_'
.Comment #40
ry5n CreditAttribution: ry5n commentedI see no problem keeping the existing behaviour for single underscores as long as double underscores are preserved in the final output. #33 removes a blocker for #1921610: [Meta] Architect our CSS without the side effects described in #39 4).
I agree with @jwilson in #39, so I’ve opened #2009584: Allow double underscores to pass through drupal_clean_css_identifier as per new CSS standards and posted @mbrett5062’s patch from #33. I’d love to get a review and hopefully RTBC.
Comment #41
rteijeiro CreditAttribution: rteijeiro commentedI think this issue is fixed because #2009584: Allow double underscores to pass through drupal_clean_css_identifier as per new CSS standards is fixed, so RTBC. Feel free to reopen it if I am wrong.
Comment #42
JohnAlbinHaving #2009584: Allow double underscores to pass through drupal_clean_css_identifier as per new CSS standards lessens the need for this issue because it allows us to move forward with the proposed Drupal 8 class naming convention, but I think it is reasonable that other coders would want to use a different class naming scheme. And I don't really see a reason why Drupal should prohibit them from doing so. So I'm going to leave this open for that use-case.
Comment #44
KrisBulman CreditAttribution: KrisBulman commentedRerolled the patch from #33
Comment #48
joelpittetComment #49
isholgueras CreditAttribution: isholgueras commentedWorking on it
Comment #50
isholgueras CreditAttribution: isholgueras commentedI think that the function is being patched doesn't exist anymore in Drupal 8. Only exists in Drupal 7.
https://api.drupal.org/api/drupal/includes!common.inc/function/drupal_cl...
I looked for that functionality in the whole core, but I can't find it, so the reroll, I think, wont work.
Comment #51
isholgueras CreditAttribution: isholgueras commentedI found that this functionality is working in the file core/lib/Drupal/Component/Utility/Html.php:78
And the patch add this line:
I think it's ok and this issue could be closed.
Comment #52
isholgueras CreditAttribution: isholgueras commentedComment #53
mgiffordComment #54
sriharsha.uppuluri CreditAttribution: sriharsha.uppuluri at Azri Solutions commentedAs given in #51 added the line. Previous patches are not working.
Comment #55
sriharsha.uppuluri CreditAttribution: sriharsha.uppuluri at Azri Solutions commentedComment #57
joelpittetI think this was mostly fixed by #2541102: Get rid of strtr in Html::cleanCssIdentifier
Comment #58
jwilson3I dont suppose it makes sense to backport the part of #2541102 that fixed this bug to Drupal 7?
Comment #59
KrisBulman CreditAttribution: KrisBulman commentedThe patch in #44 works against D7
Comment #60
rootworkConfirmed that the patch in #44 still applies against 7.x. Given all the testing above, marking that patch as RTBC against D7.
Comment #61
mlncn CreditAttribution: mlncn at Agaric commentedConsider this a bump to get into Drupal 7...
... and a question about Drupal 8. I'm seeing Views in core replacing single underscores with dashes in class names, such as for rows. This is not expected behavior, correct?
Comment #62
Fabianx CreditAttribution: Fabianx as a volunteer and at Tag1 Consulting commentedClosed duplicate of #2009584: Allow double underscores to pass through drupal_clean_css_identifier as per new CSS standards, which has a variable to control the behavior, which is BC compatible.