When a user enters a node or comment that contains a <[CDATA[ ... ]]> block, the "<" character is escaped, even if the user has the right to enter full HTML and PHP code. This means that the <[CDATA[ header no longer fulfills its purpose, "<" characters and others in the content of the block are also escaped, and, for example, Javascript code no longer works.
That is an unnecessary pity, as it reduces the power of expression inside Drupal tremendously.
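To illustrate the effect described above, here is a minimal sketch. It does not use Drupal's actual filter code; Python's `html.escape` is an assumed stand-in for whatever escaping the filter applies, and the JavaScript snippet is a made-up example.

```python
import html

# Hypothetical user input: JavaScript wrapped in a CDATA block.
cdata_block = "<![CDATA[\nfor (var i = 0; i < 3; i++) { total += i; }\n]]>"

# Stand-in for the filter's escaping step (assumption, not Drupal's code).
escaped = html.escape(cdata_block, quote=False)

print(escaped)
```

After escaping, the CDATA opener itself starts with `&lt;`, and the comparison inside the loop becomes `i &lt; 3`, which a browser parsing the page as HTML will not execute as intended inside a script.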
I hit upon this snag when I tried to enter automatic tables of contents to some articles (example: http://winhlp.com/node/10). I could work around it by avoiding characters that are escaped, but that is an incredibly ugly kludge.
The solution is obvious: let Drupal acknowledge the normal behavior, let it not escape <[CDATA[ ... ]]> blocks and let it not escape anything inside such blocks, just as is customary in XML and XHTML.
Comments
Comment #1
hgmichna commented
Make that "<![CDATA[". It is only a typo in the bug report (forgot the "!"). The actual defect occurs with <![CDATA[ ... ]]>.
Comment #2
grendzy commented
See #222926: HTML Corrector filter escapes HTML comments
Comment #3
hgmichna commented
Thanks for alerting me to the other issue. It is related, but not the same. They are discussing the escaping of HTML <!-- … --> comment tags, while I am talking about <![CDATA[ … ]]> XML entities.
I have added a comment to the other issue to see whether the insiders want to deal with this problem there as well. If not, we may have to reset the duplicate status here, but let's see.
Comment #4
hgmichna commented
Not a duplicate, as explained above, and mentioned, but not directly dealt with in #222926: HTML Corrector filter escapes HTML comments.
Comment #5
petasques commented
Same for me with 6.14
Comment #6
sun
Comment #7
hgmichna commented
I should have mentioned: Not to escape anything inside a <![CDATA[ … ]]> element is important only as long as Drupal attempts to create XHTML pages (which it cannot and, I think, should not try to do). If it ever switches over to creating HTML, then it would be important not to escape anything inside script and style elements.
That would also solve the problem today, even in XHTML (but would not prevent invalid XHTML code).
Comment #8
damien tournoud commented
7.x seems to be affected. I couldn't manage to tell DOMDocument::loadHTML() to accept CDATA sections properly.
Comment #9
hgmichna commented
I'm actually unsure about what the ultimate solution could be. <![CDATA[ … ]]> is an XML entity, not an HTML element. On the other hand, any Drupal installation would probably explode if its server served it as MIME type application/xhtml+xml, not least because a Drupal page can contain code entered by end users, which is not guaranteed to be well-formed XML.
Even as far as HTML goes, Drupal produces terrible code, full of errors, totally invalid. Try to run this page through a validator and you will see.
Since the browser can only take it as HTML, not XHTML, CDATA entities lose their meaning.
Still it would help to be able to use them to avoid escaping, but any other means, like HTML comments, might serve the same purpose.
I guess the ultimate solution would be to change Drupal to produce valid HTML 4.01 (or, perhaps even better, a mixture of 4.01 and as much as is already known about HTML 5) and make sure that HTML comments and the contents of style and script elements are not escaped.
I doubt that we will get such a perfect solution any time soon though. Somehow any big system like Drupal always gets stuck in some mud.
Comment #10
damien tournoud commented
@hgmichna: Drupal 7 is pretty close to producing 100% valid XHTML. Of course, several things might explode if served with an XHTML content type (most notably jQuery cookie). Also, themes can alter the markup and that can result in invalid (X)HTML (as seen on this website, for example).
Comment #11
hgmichna commented
Unfortunately with XML, pretty close does not suffice. An XML parser refuses to accept something that is pretty close.
Add to that that Drupal has to accept user entries that will not rarely break XML rules. Does Drupal even attempt to repair such entries?
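As a small illustration of the point about parser strictness, the following sketch feeds the same slightly sloppy markup to a strict XML parser and to a lenient HTML parser. The sample markup and the helper names (`is_well_formed_xml`, `TagCollector`) are made up for this example; only Python standard library calls are used.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Typical user-entered markup: the <p> elements are never closed.
almost_valid = "<div><p>First paragraph<p>Second paragraph</div>"


def is_well_formed_xml(markup: str) -> bool:
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parser that just records the start tags it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(almost_valid)
collector.close()

print(is_well_formed_xml(almost_valid))  # strict parser: rejected
print(collector.tags)                    # lenient parser: tags still found
```

The XML parser rejects the whole fragment because of the unclosed elements, while the HTML parser carries on and reports every tag it encounters.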
Therefore I think it is unwise to aim for XML+XHTML. I think it would be much more sensible to declare HTML 4.01 or HTML 5.
For this topic this raises the question whether we should even strive to utilize <![CDATA[ … ]]> at all or just use the already fixed HTML comment. In other words, I now question my own proposal.
Or widen it. A widened proposal would be to exempt not only <![CDATA[ … ]]>, but also <style> … </style> and <script> … </script> elements from escaping. This certainly cannot hurt, and it would make life easier for users who enter full HTML code including such tags.
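The widened proposal could be sketched roughly as follows. This is only an illustration under stated assumptions, not Drupal's filter: `escape_except_raw_blocks` is a hypothetical helper, `html.escape` stands in for whatever escaping the real filter performs, and a regular expression is a simplification that a real implementation would likely replace with proper parsing.

```python
import html
import re

# Spans that should pass through untouched: CDATA sections,
# <script> elements and <style> elements.
RAW_BLOCK = re.compile(
    r"<!\[CDATA\[.*?\]\]>"
    r"|<script\b[^>]*>.*?</script\s*>"
    r"|<style\b[^>]*>.*?</style\s*>",
    re.DOTALL | re.IGNORECASE,
)


def escape_except_raw_blocks(text: str) -> str:
    """Escape text, but copy CDATA, script and style blocks verbatim.

    Hypothetical helper; html.escape is an assumed stand-in for the
    filter's real escaping step.
    """
    parts = []
    last = 0
    for match in RAW_BLOCK.finditer(text):
        # Escape the ordinary text before the exempt block.
        parts.append(html.escape(text[last:match.start()], quote=False))
        # Copy the exempt block unchanged.
        parts.append(match.group(0))
        last = match.end()
    # Escape whatever follows the last exempt block.
    parts.append(html.escape(text[last:], quote=False))
    return "".join(parts)


sample = "a < b <![CDATA[ x < y ]]> c > d <script>if (1 < 2) {}</script>"
result = escape_except_raw_blocks(sample)
print(result)
```

Such an exemption lets script content through unmodified, so it would only make sense for input formats reserved for users who are already trusted to enter full HTML.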
Comment #12
damien tournoud commented
Drupal 6 uses XHTML 1.0 Transitional, Drupal 7 uses XHTML+RDFa 1.0. The HTML corrector is there to ensure that user-submitted content is somewhat properly formatted (of course, that's not an exact science). We don't aim at serving XML+XHTML content, because this still causes several issues. HTML5 doesn't exist yet, but could be considered for Drupal 8.
Comment #13
markus_petrux commented
subscribe
Comment #14
ohnobinki commented
+1
Comment #15
damien tournoud commented
I guess that's a won't fix.
Comment #16
alexanderpas commented
This should be revisited in D8 (or even D9), as this bug simply makes Drupal less useful for sites that want to serve actual correct XHTML.
Besides, what's wrong with serving polyglot documents as text/html that are also valid when interpreted as application/xhtml+xml? (See also: http://www.w3.org/TR/html-polyglot/ )
remember, HTML5 also brings XHTML5
Comment #17
drupalshrek commented
I hit this same problem when trying to use the Alexa widgets (http://www.alexa.com/siteowners/widgets) in blocks.
The first 2 widgets are simple and work, but the 3rd one contains CDATA, e.g.:
I nearly went crazy trying to figure out why it wouldn't work. Alexa says "If it looks OK simply copy the code below and paste it into your site where you would like it to appear."
I guess the reason it won't appear is the CDATA issue with Drupal...
Comment #18
EvanDonovan commented
In re #17: Create an input format that does not have the HTML Corrector filter and it should work. HTML Purifier is a good alternative if you need XSS protection in that format.
Comment #19
ohnobinki commented
tag
Comment #20
wim leers
Agreed with #15.
#17: what #18 said.