XML output completely broken and nodes rendered incorrectly

stefanor - September 16, 2009 - 16:46
Project: Atom
Version: 6.x-1.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: Dave Reid
Status: active
Description

On my personal Drupal-powered site, tumbleweed.org.za, the Atom module produces XML that contains HTML entities. This works in many clients, but fails validation on feedparser.org and in Firefox.
These entities are probably the result of the Markdown module, but Atom should always produce valid output.

Patch against HEAD attached.

Attachment Size
atom-xml-entities.patch 9.08 KB

#1

stefanor - September 16, 2009 - 16:49

Eek, typo in the function name in that patch. V2 attached.

Attachment Size
atom-xml-entities2.patch 9.08 KB

#2

deekayen - September 16, 2009 - 16:53

And why is this the chosen method instead of something like htmlentities()?

#3

stefanor - September 16, 2009 - 16:57

Some examples of the problems attached.

Attachment Size
good-atom.xml_.gz 29.36 KB
bad-atom.xml_.gz 29.37 KB

#4

stefanor - September 16, 2009 - 17:02

deekayen:

htmlentities() doesn't really solve this problem. De-entitifying the HTML would break it. E.g. "<pre>&lt;</pre>" would become "<pre><</pre>", which is now ambiguous: we can't turn this into XML, as we don't know which entities should be escaped.

There is a built-in table in PHP, get_html_translation_table(), but it's incomplete. It doesn't have "mdash", for example.
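
To make the point above concrete, here is a minimal sketch of one way around the ambiguity: leave the markup encoded, and rewrite only the HTML-specific named entities as numeric character references, which any XML parser accepts. It is written in Python purely for illustration and is not the attached patch. The function name html_entities_to_numeric is hypothetical, and the entity table comes from Python's standard library rather than from PHP.

```python
import re
from html.entities import name2codepoint

# The five entities that XML itself predefines; these must be left alone,
# otherwise "<pre>&lt;</pre>" would turn into the ambiguous "<pre><</pre>".
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as "&mdash;".
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_entities_to_numeric(markup):
    """Rewrite HTML-only named entities as numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)  # already valid in XML
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)  # unknown name: leave untouched
        return "&#%d;" % codepoint

    return _NAMED_ENTITY.sub(replace, markup)


print(html_entities_to_numeric("<pre>&lt;</pre> one &mdash; two"))
```

The escaped "&lt;" inside the pre block survives unchanged, while "&mdash;" becomes a numeric reference, so nothing has to be decoded and re-escaped.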

#5

Dave Reid - September 21, 2009 - 01:49
Status: needs review » needs work

After digging into this a little more, the problem is that either $node->teaser or $node->body has already been run through check_markup() by the node system, via node_invoke('view', $node) or node_prepare($node), and we're running check_markup() on it again. By comparison, node_feed() only allows either the teaser or the full content to be included in an RSS feed, and it's run only through check_plain(). If we can do that (run each item through check_markup() and check_plain() only once), it will work fine. The more I look into this code, the more it needs a major reorganization.
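
As a rough illustration of why the "only once" part matters, the sketch below applies a plain escaping step twice. It is Python, not Drupal code: escape_once is a hypothetical stand-in for a check_plain()-style escape, and it does not model what check_markup() or any particular input filter actually does.

```python
from html import escape


def escape_once(text):
    # Hypothetical stand-in for a single check_plain()-style escaping pass.
    return escape(text, quote=False)


source = "Fish & chips <3"
once = escape_once(source)
twice = escape_once(once)

print(once)   # Fish &amp; chips &lt;3
print(twice)  # Fish &amp;amp; chips &amp;lt;3
```

The second pass escapes the ampersands produced by the first, so a feed reader ends up displaying literal "&amp;" and "&lt;" text instead of the intended characters.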

#6

Dave Reid - September 21, 2009 - 01:56
Title: Atom output includes HTML entities in body » XML output completely broken and nodes rendered incorrectly
Version: 6.x-1.1 » 6.x-1.x-dev
Assigned to: Anonymous » Dave Reid

Changing title to absorb a few other duplicate issues:
#368819: Atom module and CCK fields
#542122: aton xml syntax error no data

Working on a patch.

#7

Dave Reid - September 21, 2009 - 19:36

Ugh... with the way that Drupal renders node content, I'm not finding a way to get both a correctly formatted teaser and body. It's basically either one, but not both. I'm leaning towards making this more like the built-in RSS settings, in which you can only select teasers or full text, but not both. This would make things a whole lot easier.

#8

Dave Reid - September 21, 2009 - 21:40
Status: needs work » fixed

This seems to be fixed now with the latest commits. It's semi-fixed in the Drupal 7 (HEAD) branch, but I'm still trying to adjust for the new ways that node content is built. Tentatively marking as fixed.

#9

Dave Reid - September 21, 2009 - 22:21
Status: fixed » active

Yeah, never mind. The problem still persists.

 
 
