When a user enters a node or comment that contains a <[CDATA[ ... ]]> block, the "<" character is escaped, even if the user has the right to enter full HTML and PHP code. As a result, the <[CDATA[ header no longer fulfills its purpose: "<" and other characters in the content of the block are also escaped, and JavaScript code, for example, no longer works.

That is an unnecessary pity, as it reduces the power of expression inside Drupal tremendously.

I hit upon this snag when I tried to add automatic tables of contents to some articles (example: http://winhlp.com/node/10). I could work around it by avoiding the characters that get escaped, but that is an incredibly ugly kludge.

The solution is obvious—let Drupal acknowledge the normal behavior, let it not escape <[CDATA[ ... ]]> blocks and let it not escape anything inside such blocks, just as is customary in XML and XHTML.
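
The requested behavior can be illustrated with a minimal sketch. This is not Drupal's filter code. It is a hypothetical Python stand-in (the name escape_outside_cdata is invented for illustration) that escapes "&", "<" and ">" in ordinary text while copying every complete CDATA section through untouched. It uses the correct "<![CDATA[" opener and assumes sections are well-formed and not nested.

```python
import html
import re

# Matches one complete CDATA section, non-greedily, across line breaks.
CDATA_SECTION = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)


def escape_outside_cdata(text: str) -> str:
    """Escape &, < and > everywhere except inside CDATA sections.

    Hypothetical helper for illustration only; a real filter would
    work on parsed markup rather than on a regular expression.
    """
    parts = []
    position = 0
    for match in CDATA_SECTION.finditer(text):
        # Escape the ordinary text that precedes this CDATA section.
        parts.append(html.escape(text[position:match.start()], quote=False))
        # Copy the CDATA section itself through unchanged.
        parts.append(match.group(0))
        position = match.end()
    # Escape whatever follows the last CDATA section.
    parts.append(html.escape(text[position:], quote=False))
    return "".join(parts)


sample = "a < b <![CDATA[ if (x < y) { run(); } ]]> c > d"
print(escape_outside_cdata(sample))
```

With this approach the "<" inside the block survives, so script code placed there keeps working, while text outside the block is still escaped.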

Comments

hgmichna’s picture

Title: <[CDATA[ escaped » <![CDATA[ escaped

Make that "<![CDATA[". It is only a typo in the bug report (forgot the "!"). The actual defect occurs with <![CDATA[ ... ]]>.

grendzy’s picture

Status: Active » Closed (duplicate)
hgmichna’s picture

Thanks for alerting me to the other issue. It is related, but not the same. They are discussing the escaping of HTML <!-- … --> comment tags, while I am talking about XML <![CDATA[ … ]]> sections.

I have added a comment to the other issue to see whether the insiders want to deal with this problem there as well. If not, we may have to reset the duplicate status here, but let's see.

hgmichna’s picture

Status: Closed (duplicate) » Active

Not a duplicate, as explained above. The problem is mentioned, but not directly dealt with, in #222926: HTML Corrector filter escapes HTML comments.

petasques’s picture

Same for me with 6.14

sun’s picture

Component: node system » filter.module
hgmichna’s picture

I should have mentioned: Not to escape anything inside a <![CDATA[ … ]]> element is important only as long as Drupal attempts to create XHTML pages (which it cannot and, I think, should not try to do). If it ever switches over to creating HTML, then it would be important not to escape anything inside script and style elements.

That would also solve the problem today, even in XHTML (but would not prevent invalid XHTML code).

damien tournoud’s picture

Version: 6.13 » 7.x-dev

7.x seems to be affected. I couldn't manage to tell DOMDocument::loadHTML() to accept CDATA sections properly.

hgmichna’s picture

I'm actually unsure about what the ultimate solution could be. <![CDATA[ … ]]> is an XML construct, not an HTML element. On the other hand, any Drupal installation would probably explode if its server served it as MIME type application/xhtml+xml, not least because a Drupal page can contain code entered by end users, which is not guaranteed to be well-formed XML.

Even as far as HTML goes, Drupal produces terrible code, full of errors, totally invalid. Try to run this page through a validator and you will see.

Since the browser can only take it as HTML, not XHTML, CDATA sections lose their meaning.

Still it would help to be able to use them to avoid escaping, but any other means, like HTML comments, might serve the same purpose.

I guess the ultimate solution would be to change Drupal to produce valid HTML 4.01 (or, perhaps even better, a mixture of 4.01 and as much as is already known about HTML 5) and make sure that HTML comments and the contents of style and script elements are not escaped.

I doubt that we will get such a perfect solution any time soon though. Somehow any big system like Drupal always gets stuck in some mud.

damien tournoud’s picture

@hgmichna: Drupal 7 is pretty close to producing 100% valid XHTML. Of course, several things might explode if served with an XHTML content type (most notably jQuery cookie). Themes can also alter the markup, and that can result in invalid (X)HTML (as seen on this website, for example).

hgmichna’s picture

Unfortunately with XML, pretty close does not suffice. An XML parser refuses to accept something that is pretty close.

Add to that the fact that Drupal has to accept user entries that will often break XML rules. Does Drupal even attempt to repair such entries?

Therefore I think it is unwise to aim for XML+XHTML. I think it would be much more sensible to declare HTML 4.01 or HTML 5.

For this topic this raises the question whether we should even strive to utilize <![CDATA[ … ]]> at all or just use the already fixed HTML comment. In other words, I now question my own proposal.

Or widen it. A widened proposal would be to exempt not only <![CDATA[ … ]]>, but also <style> … </style> and <script> … </script> elements from escaping. This certainly cannot hurt, and it would make life easier for users who enter full HTML code including such tags.
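
A rough sketch of the widened proposal, again as a hypothetical Python stand-in and not Drupal code: the names PROTECTED and filter_unprotected are invented for illustration, and the escaping step is passed in as a parameter because the thread does not specify what the filter should do to the remaining text. The regular expression assumes well-formed, non-nested script and style elements.

```python
import html
import re

# Spans that the widened proposal would exempt from escaping.
PROTECTED = re.compile(
    r"<!\[CDATA\[.*?\]\]>"            # CDATA sections
    r"|<script\b[^>]*>.*?</script>"   # script elements
    r"|<style\b[^>]*>.*?</style>",    # style elements
    re.DOTALL | re.IGNORECASE,
)


def filter_unprotected(text, transform):
    """Apply transform to text outside protected spans only.

    Hypothetical helper: protected spans are copied through verbatim,
    everything between them is handed to the transform callable.
    """
    out = []
    position = 0
    for match in PROTECTED.finditer(text):
        out.append(transform(text[position:match.start()]))
        out.append(match.group(0))  # leave the protected span untouched
        position = match.end()
    out.append(transform(text[position:]))
    return "".join(out)


def escape_text(chunk):
    # Stand-in for whatever escaping the real filter performs.
    return html.escape(chunk, quote=False)


sample = 'x < 1 <script type="text/javascript">if (a < b) { go(); }</script> y'
print(filter_unprotected(sample, escape_text))
```

Keeping the transform separate from the span detection means the same exemption list could be reused for any later filter step, which is the point of the proposal: the content of these blocks is left alone regardless of what happens around it.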

damien tournoud’s picture

Drupal 6 uses XHTML 1.0 Transitional, Drupal 7 uses XHTML+RDFa 1.0. The HTML corrector is there to ensure that user-submitted content is somewhat properly formatted (of course, that's not an exact science). We don't aim at serving content as application/xhtml+xml, because this still causes several issues. HTML5 doesn't exist yet, but could be considered for Drupal 8.

markus_petrux’s picture

subscribe

ohnobinki’s picture

+1

damien tournoud’s picture

Status: Active » Closed (won't fix)

I guess that's a won't fix.

alexanderpas’s picture

Version: 7.x-dev » 8.x-dev
Priority: Minor » Normal
Status: Closed (won't fix) » Active
Issue tags: +html5

This should be revisited in D8 (or even D9), as this bug simply makes Drupal less useful for sites that want to serve actually correct XHTML.

Besides, what's wrong with serving polyglot documents as text/html that are also valid when interpreted as application/xhtml+xml? (See also: http://www.w3.org/TR/html-polyglot/ )

Remember, HTML5 also brings XHTML5.

drupalshrek’s picture

I hit this same problem when trying to use the Alexa widgets (http://www.alexa.com/siteowners/widgets) in blocks.

The first 2 widgets are simple and work, but the 3rd one contains CDATA, e.g.:

<!-- Alexa Graph Widget from http://www.alexa.com/siteowners/widgets/graph -->
<script type="text/javascript"
src="http://widgets.alexa.com/traffic/javascript/graph.js"></script>
<script type="text/javascript">/*
<![CDATA[*/
// USER-EDITABLE VARIABLES
// enter up to 3 domains, separated by a space
var sites      = ['drupal.org'];
var opts = {
width:      400,  // width in pixels (max 400)
height:     220,  // height in pixels (max 300)
type:       'r',  // "r" Reach, "n" Rank, "p" Page Views
range:      '3m', // "7d", "1m", "3m", "6m", "1y", "3y", "5y", "max"
bgcolor:    'e6f3fc' // hex value without "#" char (usually "e6f3fc")
};
// END USER-EDITABLE VARIABLES
AGraphManager.add( new AGraph(sites, opts) );
//]]></script>
<!-- end Alexa Graph Widget -->

I nearly went crazy trying to figure out why it wouldn't work. Alexa says "If it looks OK simply copy the code below and paste it into your site where you would like it to appear."

I guess the reason it won't appear is the CDATA issue with Drupal...

EvanDonovan’s picture

in re: #17: Create an input format that does not have the HTML Corrector filter and it should work. HTML Purifier is a good alternative if you need XSS protection in that format.

ohnobinki’s picture

Issue tags: +xhtml compliance

tag

wim leers’s picture

Status: Active » Closed (won't fix)

Agreed with #15.

#17: what #18 said.