Feed parsing breaks (XML_ERR_NAME_REQUIRED)
| Project: | Drupal |
| Version: | 6.x-dev |
| Component: | aggregator.module |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | active |
It looks like the feed parser has a problem with the following RSS file.
The errors:
Dec 23 13:17:02 halk drupal: http://animals.m2osw.com|1230067022|aggregator|<hidden IP>|http://animals.m2osw.com/cron.php||0||The feed from National Geographic News seems to be broken, due to an error "XML_ERR_NAME_REQUIRED" on line 98.
Dec 23 13:17:02 halk drupal: http://animals.m2osw.com|1230067022|aggregator|<hidden IP>|http://animals.m2osw.com/cron.php||0||The feed from National Geographic News seems to be broken, due to "200 feed not parseable".
Line 98 is a <title> tag with a lone & character...
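To illustrate why a lone & stops the parser, here is a minimal sketch in Python (not Drupal's PHP code). The one-item feed and its title text are made up for illustration and are not taken from the attached file; the helper name is hypothetical. It only shows that a strict XML parser rejects the unescaped form and accepts the escaped one.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-item feed; the title text is invented for this example.
BROKEN = "<rss><channel><item><title>Cats & Dogs</title></item></channel></rss>"

# Same feed with the ampersand written as the entity XML requires.
ESCAPED = BROKEN.replace(" & ", " &amp; ")


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(BROKEN))
print(is_well_formed(ESCAPED))
```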
I'm attaching the file that causes problems (renamed .txt from .rss so it can be attached here).
The file is taken from: http://news.nationalgeographic.com/index.rss
Thank you.
Alexis Wilke
| Attachment | Size |
|---|---|
| index.txt | 12.48 KB |

#1
When is this going to be fixed? I would "assume" this would be a fairly simple problem to overcome, but I don't know since I'm not a PHP programmer...
A similar issue has been placed at: http://drupal.org/node/275567 which provides a small workaround for this issue.
The "XML_ERR_NAME_REQUIRED" problem seems to be caused by the "&" character within the XML feed. I'm not 100% sure, but I believe it is limited to the <title></title> area of the feed... Is there any way to have the aggregator check for the & when parsing the .xml file / RSS feed and put something in its place, such as &amp; or "and"?
The issue http://drupal.org/node/275567 posted above, which outlines the same problem, has been around since July of '08, and I am surprised this is still in the queue since it affects 5, 6 and probably 7, but I suppose it is a "minor" issue, all things considered...
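The pre-processing idea above could be sketched roughly as follows. This is Python rather than Drupal's PHP, it is not necessarily what the workaround in the linked issue does, and the function and pattern names are hypothetical. The assumption is that any & not already starting an entity or character reference (such as &amp; or &#38;) is a stray one that can be rewritten as &amp; before the text reaches the XML parser.

```python
import re
import xml.etree.ElementTree as ET

# Match "&" unless it already begins a named entity (&name;), a decimal
# character reference (&#38;) or a hexadecimal one (&#x26;).
LONE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_lone_ampersands(xml_text):
    """Rewrite stray ampersands as &amp; and leave existing references alone."""
    return LONE_AMP.sub("&amp;", xml_text)


# Hypothetical feed fragment, invented for this example.
feed = "<item><title>Cats & Dogs &amp; Birds</title></item>"
fixed = escape_lone_ampersands(feed)

print(fixed)
# The repaired text now parses; the title text contains literal ampersands.
print(ET.fromstring(fixed).find("title").text)
```

Two caveats with this approach: an & inside a CDATA section is legal as written and would be altered by this blanket replacement, and a named entity the parser does not know (for example &nbsp; in plain XML) is left untouched and may still cause a parse error.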
#2
philsward,
Thank you for your support. The fix in #275567 (The feed from Your Site seems to be broken, because of error "XML_ERR_NAME_REQUIRED" on line 97) would work great. I will add that to my core modules. 8-)
The problem, I think, is that they consider a lone & a bug in the XML and thus in the source, not in Drupal. But I consider this a lack of support for broken feeds that would otherwise work just fine. It is as if your browser were to refuse a page because it contains totally broken tags. 99% of the time, your browser will manage just fine, and so should Drupal with broken feeds.
Thank you.
Alexis Wilke
#3
There is a patch so you can easily fix your version too. 8-)
Note also that it is possible that the & appears in the body text too.