Feed parsing breaks (XML_ERR_NAME_REQUIRED)

AlexisWilke - December 23, 2008 - 21:46
Project: Drupal
Version: 6.x-dev
Component: aggregator.module
Category: bug report
Priority: normal
Assigned: Unassigned
Status: active
Description

It looks like the feed parser has a problem with the following RSS file.

The errors:

Dec 23 13:17:02 halk drupal: http://animals.m2osw.com|1230067022|aggregator|<hidden IP>|http://animals.m2osw.com/cron.php||0||The feed from National Geographic News seems to be broken, due to an error "XML_ERR_NAME_REQUIRED" on line 98.
Dec 23 13:17:02 halk drupal: http://animals.m2osw.com|1230067022|aggregator|<hidden IP>|http://animals.m2osw.com/cron.php||0||The feed from National Geographic News seems to be broken, due to "200 feed not parseable".

Line 98 of the feed is a <title> tag containing a lone (unescaped) & character...
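For anyone who wants to reproduce this outside Drupal, here is a minimal sketch. It uses Python's standard expat bindings rather than the PHP parser the aggregator uses, and the sample feed text and the `is_well_formed` helper are made up for illustration. It only shows that a strict XML parser rejects a bare & inside <title> and accepts the escaped form.

```python
import xml.parsers.expat


def is_well_formed(xml_text: str) -> bool:
    """Return True if a strict XML parser accepts the text (hypothetical helper)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(xml_text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


# Made-up sample documents, not taken from the attached feed.
broken = "<rss><channel><title>Cats & Dogs</title></channel></rss>"
fixed = "<rss><channel><title>Cats &amp; Dogs</title></channel></rss>"

print(is_well_formed(broken))
print(is_well_formed(fixed))
```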

I'm attaching the file that causes problems (renamed .txt from .rss so it can be attached here).

The file is taken from: http://news.nationalgeographic.com/index.rss

Thank you.
Alexis Wilke

Attachment Size
index.txt 12.48 KB

#1

philsward - May 19, 2009 - 19:13

When is this going to be fixed? I would "assume" this would be a fairly simple problem to overcome, but I don't know since I'm not a PHP programmer...

A similar issue has been filed at http://drupal.org/node/275567, which provides a small workaround for this problem.

The "XML_ERR_NAME_REQUIRED" problem seems to be caused by the "&" character within the xml feed. I'm not 100% sure, but I believe it is limited to the <title></title> area of the feed...

Is there any way to have the aggregator check for a lone & when parsing the .xml file / RSS feed and put something in its place, such as &amp; or "and"?
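A rough sketch of what such a check could look like, written in Python purely for illustration. This is not the workaround from the linked issue and not Drupal code. The function name `escape_lone_ampersands` and the pattern are assumptions of this sketch. The idea is to replace every & that does not already start an entity or character reference with &amp; before the text reaches the XML parser.

```python
import re

# Matches an "&" that is NOT followed by something shaped like a reference:
#   &name;   &#123;   &#x1F;
# The name pattern is simplified to ASCII; real XML names allow more characters.
_LONE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_lone_ampersands(text: str) -> str:
    """Replace bare ampersands with &amp; and leave existing references alone."""
    return _LONE_AMP.sub("&amp;", text)


print(escape_lone_ampersands("<title>Cats & Dogs</title>"))
print(escape_lone_ampersands("<title>Cats &amp; Dogs</title>"))
```

Two caveats with this approach: it runs over the raw text, so it would also rewrite an & inside a CDATA section (where a bare & is legal), and it leaves anything shaped like &name; alone even if that entity is not defined, which the parser may still reject.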

The issue posted above (http://drupal.org/node/275567) has been open since July 2008 and outlines the same problem. I am surprised this is still in the queue, since it affects 5, 6 and probably 7, but I suppose it is a "minor" issue, all things considered...

#2

AlexisWilke - May 13, 2009 - 17:19

philsward,

Thank you for your support. The fix in #275567 (The feed from Your Site seems to be broken, because of error "XML_ERR_NAME_REQUIRED" on line 97.) would work great. I will add that to my core modules. 8-)

The problem, I think, is that they consider a lone & a bug in the XML, and thus in the source, not in Drupal. But I consider this a lack of support for broken feeds that would otherwise work just fine. It is as if your browser were to refuse a page because it contains totally broken tags. 99% of the time, your browser will manage just fine, and so should Drupal with broken feeds.

Thank you.
Alexis Wilke

#3

AlexisWilke - May 13, 2009 - 17:27

Attached is a patch so you can easily fix your version too. 8-)

Note also that it is possible that the & appears in the body text too.

Attachment Size
aggregator-6.x-ampersand.patch 547 bytes
 
 
