Use CDATA in XML Feeds (was: Malformed XML Feeds)
CodeMonkeyX - October 2, 2003 - 19:36
| Project: | Drupal |
| Version: | 7.x-dev |
| Component: | base system |
| Category: | feature request |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
Description:
The feed has HTML elements in the description field of each node. This is not allowed: all blocks of HTML code should be contained within a CDATA section.
You can read more here: http://webservices.xml.com/pub/a/ws/2002/11/19/rssfeedquality.html.
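To make the two options concrete, here is a minimal sketch (not Drupal code) that encodes the same HTML description once by escaping it and once by wrapping it in a CDATA section. It uses only Python's standard `xml.sax.saxutils.escape` and `xml.etree.ElementTree`; the `<item>`/`<description>` strings and the sample HTML are illustrative assumptions, and the CDATA variant assumes the HTML contains no `]]>`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Sample HTML for a node's description (illustrative only).
html = '<p>Read <a href="/node/1">more</a> &amp; comment</p>'

# Option 1: escape &, < and > so the HTML becomes plain character data.
escaped_item = "<item><description>" + escape(html) + "</description></item>"

# Option 2: wrap the raw HTML in a CDATA section (assumes no "]]>" inside).
cdata_item = "<item><description><![CDATA[" + html + "]]></description></item>"

# Both documents are well-formed, and a parser recovers the same text from each.
escaped_text = ET.fromstring(escaped_item).find("description").text
cdata_text = ET.fromstring(cdata_item).find("description").text
print(escaped_text == cdata_text == html)  # True
```

Either form delivers the same description string to a feed reader; the difference is in size and readability of the raw feed, not in what the parser sees.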

#1
In Drupal 4.1.0, the workaround for this issue is to edit /usr/share/drupal/include/common.inc and remove all html_entities() calls in format_rss_item() and format_rss_channel(). You cannot remove strip_tags() as well, because the generated XML won't be valid. I don't know whether this issue has already been fixed in Drupal 4.3.0, as I'm waiting for the Debian maintainer to upload the new package. I'll try it then.
#2
The problem is not malformed XML as far as I can see: all entities relevant to XML are escaped (as &lt; &gt; ...), and HTML entities are doubly escaped (&lt; into &amp;lt;). Drupal outputs correct XML.
It is, however, a question of RSS quality and of the general recommendation to use CDATA, which offers advantages in terms of file size.
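The file-size argument can be checked with a small sketch. It assumes escaping is done with Python's `xml.sax.saxutils.escape` (which rewrites `&`, `<` and `>`); the helper names `escaped_len` and `cdata_len` are hypothetical, introduced only for this comparison.

```python
from xml.sax.saxutils import escape

def escaped_len(html: str) -> int:
    # Each "<" or ">" grows by 3 characters ("&lt;", "&gt;"); each "&" by 4 ("&amp;").
    return len(escape(html))

def cdata_len(html: str) -> int:
    # "<![CDATA[" (9 chars) plus "]]>" (3 chars): a fixed 12-character overhead.
    return len("<![CDATA[" + html + "]]>")

sample = "<p>Hello <b>world</b></p>"
print(len(sample), escaped_len(sample), cdata_len(sample))  # 25 49 37
```

The saving grows with the amount of markup: escaping costs a few characters per tag delimiter, while CDATA costs a flat 12 characters per element. For a short plain-text description with no markup, CDATA is actually slightly larger.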
#3
Steven says this isn't a bug. Changing title.
#4
If this problem is what I think it is, you might want to look at the fixentities filter module I made. It replaces unescaped less-than signs and ampersands with the proper entities when they are not part of tags or entity references. This fixes some validity issues caused by invalid user input, in both the feed and the XHTML view.
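The filtering idea described above can be approximated with two regular expressions. This is a hypothetical sketch of the technique, not the fixentities module's actual code: `fix_entities`, `BARE_AMP` and `BARE_LT` are illustrative names, and "part of a tag" is approximated as "`<` followed by a letter, `/`, `!` or `?`".

```python
import re

# An "&" that does not already begin a named, decimal or hex entity reference.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# A "<" that cannot begin a tag, end tag, comment/CDATA, or processing instruction.
BARE_LT = re.compile(r"<(?![A-Za-z/!?])")

def fix_entities(text: str) -> str:
    # Escape stray ampersands first; the "&lt;" added below is a valid reference anyway.
    text = BARE_AMP.sub("&amp;", text)
    return BARE_LT.sub("&lt;", text)

print(fix_entities("if a < b & c &amp; d <em>e</em>"))
# if a &lt; b &amp; c &amp; d <em>e</em>
```

Being a heuristic, it leaves input such as `a<b` untouched (the `<` looks like the start of a tag), so it reduces rather than eliminates invalid input.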
#5
CDATA helps save space and makes the document more readable. But I see one pitfall here: what if our feed contains the substring ]]>? Would the feed then be broken?
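A CDATA section cannot contain `]]>`, because that sequence ends it. A common workaround is to split the section at that point: every `]]>` becomes `]]]]><![CDATA[>`, so the first section ends after `]]` and a new one starts with `>`. The sketch below assumes this approach; `cdata` is a hypothetical helper name, and the check uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

def cdata(text: str) -> str:
    # Close the section just before ">" and reopen a new one, so the
    # literal "]]>" never appears inside a single CDATA section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

body = "array[a[0]]>5"
wrapped = cdata(body)
print(wrapped)  # <![CDATA[array[a[0]]]]><![CDATA[>5]]>

# The parser joins the adjacent sections back into the original string.
parsed = ET.fromstring("<description>" + wrapped + "</description>").text
print(parsed == body)  # True
```

The alternative is to fall back to ordinary escaping for any description that contains `]]>`; either way the feed stays well-formed.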
#6
Is this a feature request?
Does it still apply to the current version?