XML output completely broken and nodes rendered incorrectly [#579286]

htmlentities() doesn't really solve this problem. De-entitifying the HTML would break it. E.g. "<pre>&lt;</pre>" would become "<pre><</pre>", which is now ambiguous: we can't turn this into XML, as we don't know which entities should be escaped.

There is a built-in table in PHP, get_html_translation_table(), however it's incomplete. It doesn't have "mdash", for example.
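One way around an incomplete table, sketched here in Python rather than the module's PHP: rewrite each named HTML entity as a numeric character reference, which an XML parser accepts without a DTD, and leave the five entities that XML predefines untouched. The function name below is hypothetical, and the lookup uses Python's standard `html.entities.name2codepoint` table, not anything taken from the Atom module.

```python
import re
from html.entities import name2codepoint

# The five entities every XML parser understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entities such as &mdash; (numeric references are left alone).
_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def html_to_xml_entities(text):
    """Hypothetical helper: turn named HTML entities into numeric references."""

    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            # Already valid XML, keep as-is.
            return match.group(0)
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            # Unknown name: escape the ampersand so the output stays well-formed.
            return "&amp;" + name + ";"
        return "&#%d;" % codepoint

    return _ENTITY_RE.sub(replace, text)


print(html_to_xml_entities("Tom &amp; Jerry &mdash; caf&eacute;&nbsp;&lt;3"))
```

Because "&lt;" and "&amp;" are never decoded, the ambiguity described above does not arise: markup stays markup and escaped text stays escaped.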


Comment #5

dave reid commented 21 September 2009 at 01:49

Status: Needs review » Needs work

After digging into this a little more, the problem is that either $node->teaser or $node->body is already run through check_markup() by the node system via node_invoke('view', $node) or node_prepare($node), and we're running check_markup() on it again. node_feed() only allows either the teaser or the full content to be included in an RSS feed, and it's run only through check_plain(). If we can do that (only run each item through check_markup() and check_plain() once), it will work fine. The more I look into this code, the more it needs a major reorganization.
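The effect of escaping the same string twice is easy to reproduce outside Drupal. A minimal Python sketch, using the standard library's `html.escape` as a rough stand-in for check_plain() (an assumption made for illustration; this is not Drupal's implementation):

```python
from html import escape, unescape


def escape_once(text):
    """Rough stand-in for a check_plain()-style escape of &, <, > and quotes."""
    return escape(text, quote=True)


raw = "Fish & <b>chips</b>"
once = escape_once(raw)     # what a feed reader expects to receive
twice = escape_once(once)   # what it receives if the escape runs again

print(once)
print(twice)

# A reader decodes exactly one level, so after double escaping it ends up
# displaying the entity text itself instead of the original characters.
print(unescape(twice) == once)
```

This is why each item should pass through the filter and the escape exactly once: the second pass turns every "&" produced by the first pass into "&amp;amp;", and those leftover entities are what show up in the feed body.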


Comment #6

dave reid commented 21 September 2009 at 01:56

Title:	Atom output includes HTML entities in body	» XML output completely broken and nodes rendered incorrectly
Version:	6.x-1.1	» 6.x-1.x-dev
Assigned:	Unassigned	» dave reid

Changing title to absorb a few other duplicate issues:
#368819: Atom module and CCK fields
#542122: aton xml syntax error no data

Working on a patch.


Comment #7

dave reid commented 21 September 2009 at 19:36

Ugh... the way that Drupal renders node content, I'm not finding a way to get both a correctly formatted teaser and body. It's basically either one, but not both. I'm leaning towards making this more like the built-in RSS settings, in which you can only select teasers or full text, but not both. This would make things a whole lot easier.


Comment #8

dave reid commented 21 September 2009 at 21:40

Status: Needs work » Fixed

This seems to be fixed now with the latest commits. It's semi-fixed in the Drupal 7 (HEAD) branch, but I'm still trying to adjust for the new ways that node content is built. Tentatively marking as fixed.


Comment #9

dave reid commented 21 September 2009 at 22:21

Status: Fixed » Active

Yeah, never mind. The problem still persists.


Comment #10

nschloe commented 20 July 2010 at 12:17

The above patch does not apply for me.

$ patch -p0 < atom-xml-entities2.patch 
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: atom.module
|===================================================================
|RCS file: /cvs/drupal-contrib/contributions/modules/atom/atom.module,v
|retrieving revision 1.35
|diff -u -p -r1.35 atom.module
|--- atom.module        8 May 2009 20:01:23 -0000       1.35
|+++ atom.module        16 Sep 2009 16:47:51 -0000
--------------------------
Patching file atom.module using Plan A...
Hunk #1 succeeded at 108 (offset -5 lines).
Hunk #2 failed at 496.
Hunk #3 failed at 504.
2 out of 3 hunks failed--saving rejects to atom.module.rej
done

This is with Atom-6.x-1.1.


Comment #11

nschloe commented 29 July 2010 at 14:02

Hi,

I just manually copied and pasted the changes into atom.module, and it appears to do what it's supposed to.
As a result, the XML markup produced by the module contains far fewer invalid symbols. Nice!

The function _atom_html_to_xml_entities() should also be applied to title, subtitle, and so forth.

Also, the "@" symbol appears not to be legal XML markup; I guess that one should be replaced as well.

Cheers,
Nico


Comment #12

nschloe commented 29 July 2010 at 14:13

As for the "@" sign, I'm not so sure now. I just read that it should actually be legal XML, but the W3C feed validator tells me that

XML parsing error: <unknown>:126:50: not well-formed (invalid token)

in the line

<p>For bug reports or suggestions, contact <foo@bar.org> or<br />

and points to the "@".


Comment #13

dman commented 29 July 2010 at 14:22

A validation error on <foo@bar.org> is an error with that "tag". @ is legal free text, but not legal in an XML tag name, which is what this looks like. Plus the ".", plus the tag is unclosed.
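This can be checked with any XML parser. A small Python sketch using the standard library (the wrapper element name and the helper function are made up for the test; the sample strings are based on the line quoted in #12):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(fragment):
    """Return True if the fragment parses as XML inside a dummy wrapper element."""
    try:
        ET.fromstring("<content>" + fragment + "</content>")
        return True
    except ET.ParseError:
        return False


# "@" in plain text is fine, and so is a properly closed <br />.
print(is_well_formed("contact foo@bar.org or<br />"))

# Angle brackets make the parser read the address as a tag, which fails.
print(is_well_formed("contact <foo@bar.org> or"))

# Escaping the angle brackets turns the address back into ordinary text.
print(is_well_formed(escape("contact <foo@bar.org> or")))
```

So the "@" itself does not need replacing. What matters is that a literal "<" in the source text is escaped as "&lt;" before it reaches the feed.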


Comment #14

nschloe commented 29 July 2010 at 14:49

So the "<" and ">" characters should actually be converted as well? I guess it may get a little tricky here..


Comment #15

dman commented 29 July 2010 at 14:54

Well, that input is crappy HTML, XHTML, XML or RSS, whichever way you look at it. So there's really no reason for it to ever exist. Don't do that, it's bad.
It won't render on the screen, and it won't render in anyone's reader, never mind that it doesn't parse.
GIGO


Comment #16

nschloe commented 29 July 2010 at 15:44

Oh, that's right! Thanks for the heads-up.

Apart from that, applying _atom_html_to_xml_entities() to <title>, <subtitle>, and so on remains worthwhile, I believe.

Cheers,
Nico
