When using the filter 'limit allowed HTML tags' for text formats, the attribute xml:lang transforms to lang.

For example <blockquote xml:lang="en">, output as <blockquote lang="en">.

Note that when the 'Correct faulty and chopped off HTML' is turned on (default setting), it by surprise corrects to <blockquote lang="en" xml:lang="en"> (that is correct behavior as the use of both xml:lang and lang is recommended)

The same issue for for xml:id, this will output as id. (Unluckily 'Correct faulty and chopped off HTML' doesn't correct this one :).

Steps to reproduce:
1. install clean Drupal 8
3. add page with body text '<em xml:id="1234">test</em>
4. it will show as <em id="1234">test</em>

Files: 
CommentFileSizeAuthor
#3 patch_commit_faadf30b7d1c.patch783 bytesHanno
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch patch_commit_faadf30b7d1c_0.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]
#1 patch_commit_faadf30b7d1c.patch783 bytesHanno
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch patch_commit_faadf30b7d1c.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Comments

Issue summary:View changes

minor typo

Issue summary:View changes

.

Priority:Minor» Normal
Status:Active» Needs review
Issue tags:+language of parts
StatusFileSize
new783 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch patch_commit_faadf30b7d1c.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Did some research and it happens in _filter_xss_attributes($attr) in common.inc,
in particular in this regular expression: '/^([-a-zA-Z]+)/'.

Changing this to '/^([-a-zA-Z:]+)/' works.

Is this a safe solution? If ':' is only used with the combination 'xml:', this expression could be probably more narrowed? Patch provided but needs review.

Status:Needs review» Needs work

The last submitted patch, patch_commit_faadf30b7d1c.patch, failed testing.

Issue tags:+XHTML, +Invalid XHTML, +needs security review
StatusFileSize
new783 bytes
FAILED: [[SimpleTest]]: [MySQL] Unable to apply patch patch_commit_faadf30b7d1c_0.patch. This may be a -p0 (old style) patch, which is no longer supported by the testbots.
[ View ]

Issue summary:View changes

.

Issue summary:View changes

.

Title:attributes 'xml:lang' and 'xml:id' transform to 'lang' and 'id' in filter_xss'xml:lang' transforms to 'lang' due to 'limit allowed HTML tags'

Not exclusively related to the filter module

Title:'xml:lang' transforms to 'lang' due to 'limit allowed HTML tags'attributes 'xml:lang' and 'xml:id' transform to 'lang' and 'id' in filter_xss
Component:filter.module» base system

better title

Title:'xml:lang' transforms to 'lang' due to 'limit allowed HTML tags'attributes 'xml:lang' and 'xml:id' transform to 'lang' and 'id' in filter_xss
Status:Needs work» Needs review

Setting for testing of last patch.

Issue tags:+D8MI

Also tagging.

Does this even apply to Drupal 8? Drupal 8 now outputs HTML 5 (see http://groups.drupal.org/node/187004) and reading the spec (http://www.whatwg.org/specs/web-apps/current-work/#the-lang-and-xml:lang...), it says we should not use it in a forward looking manner.

Authors must not use the lang attribute in the XML namespace on HTML elements in HTML documents. To ease migration to and from XHTML, authors may specify an attribute in no namespace with no prefix and with the literal localname "xml:lang" on HTML elements in HTML documents, but such attributes must only be specified if a lang attribute in no namespace is also specified, and both attributes must have the same value when compared in an ASCII case-insensitive manner.

The attribute in no namespace with no prefix and with the literal localname "xml:lang" has no effect on language processing.

However, the HTML 5 patch seems to have carried over xml:lang without adding lang. (At #1077566: Convert html.tpl.php to HTML5).

Status:Needs review» Needs work

The last submitted patch, patch_commit_faadf30b7d1c.patch, failed testing.

Status:Needs work» Needs review

I've opened #1330922: Drupal 8 HTML 5 page template should use lang not xml:lang for the xml:lang -> lang work/question.

Version:8.x-dev» 7.x-dev
Issue tags:-D8MI

All right, so seems like we are not going to use XHTML(5) in Drupal 8, so this will not apply there as far as I see. It does apply to Drupal 7 and 6 though.

interesting about html5 and the lang attribute.
While it make sense to only apply this for D7, in my opinion a filter for cross site scripting shouldn't alter attributes from XHTML-style to HTML5 style. Instead, that could be done in the Correct faulty HTML filter.

And there is a bug for the Correct faulty HTML filter. It changes lang='xx' to lang='xx' xml:lang='xx' (i.e. it will print these both attributes). While this is correct XHTML for D7, from it doesn't appear correct html5 for D8 (see comment #8).

EDIT: created an issue: #1333730: PHP DOM (libxml2) only understands XHTML4, misinterprets HTML5, but D8 must cope with HTML5

Well, yeah, depending on how you look at it. Security functions should filter data for the type of input/output expected (depending on where the filtering happens), and since we handle HTML5 in Drupal 8, we should set the tight filter for that, right? We don't really know what kind of namespaced values might cause issues for clients, so we set our whitelist to the tightest possible. I think that makes sense from a security point of view. I agree on the point of #1333730: PHP DOM (libxml2) only understands XHTML4, misinterprets HTML5, but D8 must cope with HTML5 as well though and will definitely be needed for converting existing data.

Do we know if contributed modules are using filter_xss for other filtering than html5? When views or other modules handle XML-related data using this filter, it might give unexpected results in some cases.
Curious to know if the colon isn't allowed by accident. Maybe this issue can be handled together with this issue: #928948: _filter_xss_attributes should ignore attribute values for "value" and "flashvars".

Just my two cents, but I'm hitting this problem even in Drupal 6: I need to allow the "fb:profile-pic" tag, but filter_xss won't allow any tag with a colon in it's name. Seems this is a bug, as the HTML5 spec clearly allows for tags (and attribute) namespaces.

Well, good to report another case. Found also this duplicate issue #952964: String stripped from title and alt attribute if contains a colon

For now I "fixed" the problem with a (gasp) patch to _filter_xss_split() in filter.module (D6), adding colons and hyphens as allowed characters. So "fb:profile-pic" is recognized as a tag name.

Line 1059:

<?php
if (!preg_match('%^(?:<\s*(/\s*)?([a-zA-Z0-9:-]+)([^>]*)>?|(<!--.*?-->))$%', $string, $matches)) {
//                                          ^^ added here.
?>

To me, this will never get fixed. xml:XXX is not a safe attribute. See https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Preventio...

Issue summary:View changes

steps to reproduce added