When using the filter 'limit allowed HTML tags' for text formats, the attribute xml:lang
transforms to lang
.
For example <blockquote xml:lang="en">
, output as <blockquote lang="en">
.
Note that when the 'Correct faulty and chopped off HTML' is turned on (default setting), it by surprise corrects to <blockquote lang="en" xml:lang="en">
(that is correct behavior as the use of both xml:lang and lang is recommended)
The same issue for for xml:id
, this will output as id
. (Unluckily 'Correct faulty and chopped off HTML' doesn't correct this one :).
Steps to reproduce:
1. install clean Drupal 8
3. add page with body text '<em xml:id="1234">test</em>
4. it will show as <em id="1234">test</em>
Comment | File | Size | Author |
---|---|---|---|
#3 | patch_commit_faadf30b7d1c.patch | 783 bytes | Hanno |
#1 | patch_commit_faadf30b7d1c.patch | 783 bytes | Hanno |
Comments
Comment #0.0
Hanno CreditAttribution: Hanno commentedminor typo
Comment #0.1
Hanno CreditAttribution: Hanno commented.
Comment #1
Hanno CreditAttribution: Hanno commentedDid some research and it happens in
_filter_xss_attributes($attr)
in common.inc,in particular in this regular expression: '/^([-a-zA-Z]+)/'.
Changing this to '/^([-a-zA-Z:]+)/' works.
Is this a safe solution? If ':' is only used with the combination 'xml:', this expression could be probably more narrowed? Patch provided but needs review.
Comment #3
Hanno CreditAttribution: Hanno commentedComment #3.0
Hanno CreditAttribution: Hanno commented.
Comment #3.1
Hanno CreditAttribution: Hanno commented.
Comment #4
Hanno CreditAttribution: Hanno commentedNot exclusively related to the filter module
Comment #5
Hanno CreditAttribution: Hanno commentedbetter title
Comment #6
Gábor HojtsySetting for testing of last patch.
Comment #7
Gábor HojtsyAlso tagging.
Comment #8
Gábor HojtsyDoes this even apply to Drupal 8? Drupal 8 now outputs HTML 5 (see http://groups.drupal.org/node/187004) and reading the spec (http://www.whatwg.org/specs/web-apps/current-work/#the-lang-and-xml:lang...), it says we should not use it in a forward looking manner.
However, the HTML 5 patch seems to have carried over xml:lang without adding lang. (At #1077566: Convert html.tpl.php to HTML5).
Comment #10
Gábor HojtsyI've opened #1330922: Drupal 8 HTML 5 page template should use lang not xml:lang for the xml:lang -> lang work/question.
Comment #11
Gábor HojtsyAll right, so seems like we are not going to use XHTML(5) in Drupal 8, so this will not apply there as far as I see. It does apply to Drupal 7 and 6 though.
Comment #12
Hanno CreditAttribution: Hanno commentedinteresting about html5 and the lang attribute.
While it make sense to only apply this for D7, in my opinion a filter for cross site scripting shouldn't alter attributes from XHTML-style to HTML5 style. Instead, that could be done in the Correct faulty HTML filter.
And there is a bug for the Correct faulty HTML filter. It changes
lang='xx'
tolang='xx' xml:lang='xx'
(i.e. it will print these both attributes). While this is correct XHTML for D7, from it doesn't appear correct html5 for D8 (see comment #8).EDIT: created an issue: #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5
Comment #13
Hanno CreditAttribution: Hanno commentedrelated issues on filtered colon in filter_xss:
#1271014: Some strings remain untranslatable because of colons in HTML attributes (rdf tags)
#1196488: Allow checkout completion message to be translated
#928948: _filter_xss_attributes should ignore attribute values for "value" and "flashvars"
#372454: Filtered HTML filter modifies class attribute
#914084: more flexiblity with the id rules
Comment #14
Gábor HojtsyWell, yeah, depending on how you look at it. Security functions should filter data for the type of input/output expected (depending on where the filtering happens), and since we handle HTML5 in Drupal 8, we should set the tight filter for that, right? We don't really know what kind of namespaced values might cause issues for clients, so we set our whitelist to the tightest possible. I think that makes sense from a security point of view. I agree on the point of #1333730: [Meta] PHP DOM (libxml2) misinterprets HTML5 as well though and will definitely be needed for converting existing data.
Comment #15
Hanno CreditAttribution: Hanno commentedDo we know if contributed modules are using filter_xss for other filtering than html5? When views or other modules handle XML-related data using this filter, it might give unexpected results in some cases.
Curious to know if the colon isn't allowed by accident. Maybe this issue can be handled together with this issue: #928948: _filter_xss_attributes should ignore attribute values for "value" and "flashvars".
Comment #16
wanderingstan CreditAttribution: wanderingstan commentedJust my two cents, but I'm hitting this problem even in Drupal 6: I need to allow the "fb:profile-pic" tag, but filter_xss won't allow any tag with a colon in it's name. Seems this is a bug, as the HTML5 spec clearly allows for tags (and attribute) namespaces.
Comment #17
Hanno CreditAttribution: Hanno commentedWell, good to report another case. Found also this duplicate issue #952964: String stripped from title and alt attribute if contains a colon
Comment #18
wanderingstan CreditAttribution: wanderingstan commentedFor now I "fixed" the problem with a (gasp) patch to _filter_xss_split() in filter.module (D6), adding colons and hyphens as allowed characters. So "fb:profile-pic" is recognized as a tag name.
Line 1059:
Comment #19
iStryker CreditAttribution: iStryker commentedTo me, this will never get fixed. xml:XXX is not a safe attribute. See https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Preventio...
Comment #19.0
iStryker CreditAttribution: iStryker commentedsteps to reproduce added
Comment #20
Hanno CreditAttribution: Hanno commentedComment #21
joseph.olstaddisabling "Limit allowed HTML tags and correct faulty HTML" does avert this
With that said, I'd like to push this issue along and help resolve this for everyone.
Comment #22
joseph.olstadComment #23
joseph.olstadok some confusion on my part, xhtml spec says xml:lang is desired, but it is not wanted in HTML5
http://www.w3.org/TR/xhtml1/#C_7
I will create a new issue for D9.3.x if one doesn't already exist.