Feed aggregator strips characters from RSS feeds, breaking summaries and links
| Project: | Drupal |
| Version: | 6.14 |
| Component: | base system |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | active |
When parsing RSS XML, the feed aggregator is stripping out characters that the feed encodes as character entities (such as "&amp;"), causing broken summaries and links. For example, I've started having problems with the following Google News RSS feed: http://news.google.com/news?pz=1&ned=us&hl=en&output=rss
The link URL that Google News sends me uses the "&amp;" character entity to separate a number of query parameters. Instead of storing the decoded "&" separator, the feed aggregator strips it out, breaking the URL.
Here is an excerpt of an item Google News sends me when I manually issue the above HTTP request:
http://www.google.com/news/url?sa=T&ct=us/0-1-0&fd=R&url=http://www.bloomberg.com/apps/news%3Fpid%3D20601087%26sid%3DaiGdyvcQTLLA%26refer%3Dhome&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNHj-3q-T54lxYDsGKz59Z1SYNwWeA
Meanwhile, the value stored in aggregator_item.link is:
http://news.google.com/news/url?sa=Tct=us/0-1-0fd=Rurl=http://www.bloomberg.com/apps/news%3Fpid%3D20601087%26sid%3DaiGdyvcQTLLA%26refer%3Dhomecid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNHnvnZsA6MZcAfcjhbnJsK8jgEg3A
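For comparison, a conforming XML parser should turn each "&amp;" inside <link> back into a literal "&" rather than dropping it. The following is a minimal sketch using Python's standard xml.etree.ElementTree, not the PHP parser Drupal's aggregator uses. The feed is a trimmed, hand-written item modelled on the URL above, and item_links is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed, hand-written RSS item modelled on the Google News link above.
# Inside XML, each query-string "&" must be escaped as "&amp;".
RSS = """<rss version="2.0"><channel><item>
<link>http://www.google.com/news/url?sa=T&amp;ct=us/0-1-0&amp;fd=R&amp;cid=1246822686</link>
</item></channel></rss>"""


def item_links(xml_text):
    """Hypothetical helper: return the decoded text of every <link> element."""
    root = ET.fromstring(xml_text)
    return [link.text for link in root.iter("link")]


for url in item_links(RSS):
    # A conforming parser restores the separators:
    # http://www.google.com/news/url?sa=T&ct=us/0-1-0&fd=R&cid=1246822686
    print(url)
```

If the aggregator stored what a conforming parser decodes, aggregator_item.link would keep the "&" separators. The stored value above reads "sa=Tct=...fd=Rurl=..." instead, with every separator gone.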
Article summaries are unreadable as well, because the characters encoded by the entities &lt;, &gt;, &quot; and &amp; are all being stripped out, exposing raw tag names and attributes to the user as if they were actual content. Here is that same article's summary, first fetched manually, then as extracted from aggregator_item.description.
Manual fetch:
<font style="font-size:85%;font-family:arial,sans-serif"><br><div style="padding-top:0.8em;"><img alt="" height="1" width="1"></div><div class=lh><table border=0 align=right cellspacing=0 cellpadding=0cellpadding=3 style="font-size:100%;font-family:arial,sans-serif"><tr><td width=80 align=center style="padding-left:6px;" valign=top><a href="http://www.google.com/news/url?sa=T&ct=us/0-1i-0&fd=R&url=http://www.washingtonpost.com/wp-dyn/content/article/2009/05/27/AR2009052700229.html&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNE9LqRyNbPJvaqoyIXmlAdXf0MppA"><img src=http://nt1.ggpht.com/news?imgefp=3eZ5RfaBt_oJ&imgurl=media3.washingtonpost.com/wp-dyn/content/photo/2009/05/27/PH2009052700231.jpg width=77 height=80 alt="" border=1><br><font size=-2>Washington Post</font></a></td></tr></table><a href="http://www.google.com/news/url?sa=T&ct=us/0-1-0&fd=R&url=http://www.bloomberg.com/apps/news%3Fpid%3D20601087%26sid%3DaiGdyvcQTLLA%26refer%3Dhome&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNHj-3q-T54lxYDsGKz59Z1SYNwWeA"><b>Obama N. 
Korea Options May Be Limited by Regime Shift</b></a><br><font size=-1><b><font color=#6f6f6f>Bloomberg</font></b></font><br><font size=-1>By Indira AR Lakshmanan and Heejin Koo May 27 (Bloomberg) -- The Obama administration's ability to pressure North Korea's insular leadership to abandon nuclear weapons may be hamstrung by internal jockeying and unease in the communist state over who <b>...</b></font><br><font size=-1><a href="http://www.google.com/news/url?sa=T&ct=us/0-1-1&fd=R&url=http://www.youtube.com/watch%3Fv%3D95v9d3w4tzM&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNGpRYBBToDRC7HOJpkjh0W4CV9b_g">Video: Reaction: Will North Korea's nukes lead to war?</a> <font size=-1 color=#6f6f6f><nobr>UPI</nobr></font><object width="448" height="356"><param name="movie" value="http://www.youtube.com/v/95v9d3w4tzM"></param><param name="wmode" value="transparent"></param><embed src="http://www.youtube.com/v/95v9d3w4tzM"type="application/x-shockwave-flash"wmode="transparent"width="448"height="356"></embed></object><br></font><font size=-1><a href="http://www.google.com/news/url?sa=T&ct=us/0-1-2&fd=R&url=http://english.aljazeera.net/news/asia-pacific/2009/05/2009527195524608822.html&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNEcKrutKflbrDY5vKsuWnTrm404Nw">N Korea warned of 'consequences'</a> <font size=-1 color=#6f6f6f><nobr>Aljazeera.net</nobr></font></font><br><font size=-1 class=p><a href="http://www.google.com/news/url?sa=T&ct=us/0-1-3&fd=R&url=http://www.washingtonpost.com/wp-dyn/content/article/2009/05/27/AR2009052702353.html&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNETTwK5v1ZU5ca8k9wTk7P7RMkMCg"><nobr>Washington Post</nobr></a> - <a href="http://www.google.com/news/url?sa=T&ct=us/0-1-4&fd=R&url=http://www.reuters.com/article/topNews/idUSTRE54Q5R620090527&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNFM8UcQSuznoxM9BowXrJyFuPDgXA"><nobr>Reuters</nobr></a> - <a 
href="http://www.google.com/news/url?sa=T&ct=us/0-1-5&fd=R&url=http://www.upi.com/Top_News/2009/05/27/Clinton-N-Korea-must-face-consequences/UPI-36601243462350/&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNGHdmvRqQSTBEXH-dvEoX5LaJV_fA"><nobr>United Press International</nobr></a> - <a href="http://www.google.com/news/url?sa=T&ct=us/0-1-6&fd=R&url=http://www.google.com/hostednews/afp/article/ALeqM5iwb9ioSkAJOAJKRCgQfv4s9aWMfw&cid=1246822686&ei=mMAdSrXbGtKdlQf65LCJBw&usg=AFQjCNHQWg5AsGKcuXGG7moNPErpZcZM3A"><nobr>AFP</nobr></a></font><br/><font class=p size=-1><a class=p href=http://www.google.com/news?pz=1&ned=us&hl=en&ncl=dj3-cRGcAtrLCiMjO4SV67_oe2RqM><nobr><b>all 12,560 news articles</b></nobr></a></font><br clear=all> </div></font>
And from aggregator_item.description:
font style=font-size:85%;font-family:arial,sans-serifbrdiv style=padding-top:0.8em;img alt= height=1 width=1/divdiv class=lhtable border=0 align=right cellspacing=0 cellpadding=0cellpadding=3 style=font-size:100%;font-family:arial,sans-seriftrtd width=80 align=center style=padding-left:6px; valign=topa href=http://news.google.com/news/url?sa=Tct=us/0-1i-0fd=Rurl=http://www.washingtonpost.com/wp-dyn/content/article/2009/05/27/AR2009052700229.htmlcid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNG0rv_TekZcwrpLxlrbLAy2KdFltgimg src=http://nt1.ggpht.com/news?imgefp=3eZ5RfaBt_oJimgurl=media3.washingtonpost.com/wp-dyn/content/photo/2009/05/27/PH2009052700231.jpg width=77 height=80 alt= border=1brfont size=-2Washington Post/font/a/td/tr/tablea href=http://news.google.com/news/url?sa=Tct=us/0-1-0fd=Rurl=http://www.bloomberg.com/apps/news%3Fpid%3D20601087%26sid%3DaiGdyvcQTLLA%26refer%3Dhomecid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNHnvnZsA6MZcAfcjhbnJsK8jgEg3AbObama N. Korea Options May Be Limited by Regime Shift/b/abrfont size=-1bfont color=#6f6f6fBloomberg/font/b/fontbrfont size=-1By Indira AR Lakshmanan and Heejin Koo May 27 (Bloomberg) -- The Obama administration#39;s ability to pressure North Korea#39;s insular leadership to abandon nuclear weapons may be hamstrung by internal jockeying and unease in the communist state over who b.../b/fontbrfont size=-1a href=http://news.google.com/news/url?sa=Tct=us/0-1-1fd=Rurl=http://www.youtube.com/watch%3Fv%3D95v9d3w4tzMcid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNHumhczb7vl0CI71D4t0tARoKU5ngVideo: Reaction: Will North Korea#39;s nukes lead to war?/a font size=-1 color=#6f6f6fnobrUPI/nobr/fontobject width=448 height=356param name=movie value=http://www.youtube.com/v/95v9d3w4tzM/paramparam name=wmode value=transparent/paramembed src=http://www.youtube.com/v/95v9d3w4tzMtype=application/x-shockwave-flashwmode=transparentwidth=448height=356/embed/objectbr/fontfont size=-1a 
href=http://news.google.com/news/url?sa=Tct=us/0-1-2fd=Rurl=http://english.aljazeera.net/news/asia-pacific/2009/05/2009527195524608822.htmlcid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNE5hdYoN7qNEdBjjj6OuolDDmnoigN Korea warned of #39;consequences#39;/a font size=-1 color=#6f6f6fnobrAljazeera.net/nobr/font/fontbrfont size=-1 class=pa href=http://news.google.com/news/url?sa=Tct=us/0-1-3fd=Rurl=http://www.washingtonpost.com/wp-dyn/content/article/2009/05/27/AR2009052702353.htmlcid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNHU6JKN2yIBgXhck1oUbz22rXCi8AnobrWashington Post/nobr/anbsp;- a href=http://news.google.com/news/url?sa=Tct=us/0-1-4fd=Rurl=http://www.reuters.com/article/topNews/idUSTRE54Q5R620090527cid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNE_a9umAbB5q8utbbiJKgw5DspyewnobrReuters/nobr/anbsp;- a href=http://news.google.com/news/url?sa=Tct=us/0-1-5fd=Rurl=http://www.upi.com/Top_News/2009/05/27/Clinton-N-Korea-must-face-consequences/UPI-36601243462350/cid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNFcPFh6qwvi48m7D3Gi1iJ_iN8CMAnobrUnited Press International/nobr/anbsp;- a href=http://news.google.com/news/url?sa=Tct=us/0-1-6fd=Rurl=http://www.google.com/hostednews/afp/article/ALeqM5iwb9ioSkAJOAJKRCgQfv4s9aWMfwcid=1246822686ei=zMAdSvWdFtqdlQfG8J2OBwusg=AFQjCNH1zK4oeEQT7yxox24fZGwrIqMIywnobrAFP/nobr/a/fontbr/font class=p size=-1a class=p href=http://news.google.com/news?pz=1ned=ushl=enncl=dj3-cRGcAtrLCiMjO4SV67_oe2RqMnobrball 12,560 news articles/b/nobr/a/fontbr clear=all /div/font
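Comparing the two excerpts, the stored description looks like the correctly decoded HTML with every <, >, " and & removed. Note "#39;" where "&#39;" was expected, and "nbsp;" for "&nbsp;". Those are exactly the characters that four of XML's predefined entities (&lt;, &gt;, &quot; and &amp;) decode to. The sketch below, again in Python rather than PHP, decodes a shortened, made-up description with a conforming parser. It then applies a hypothetical simulate_stripping helper that models the symptom. This is only a diagnostic model of what the stored data looks like, not the aggregator's actual code.

```python
import xml.etree.ElementTree as ET

# Shortened, made-up <description> in the style of the Google News feed:
# the embedded HTML is escaped with &lt; &gt; &quot; and &amp;.
RSS = """<rss version="2.0"><channel><item>
<description>&lt;font size=&quot;-1&quot;&gt;&lt;b&gt;Obama N. Korea Options&lt;/b&gt; administration&amp;#39;s&lt;/font&gt;</description>
</item></channel></rss>"""


def decoded_description(xml_text):
    """Return the <description> text as a conforming XML parser decodes it."""
    return ET.fromstring(xml_text).find("channel/item/description").text


def simulate_stripping(text):
    """Hypothetical model of the reported symptom: drop every character
    that the entities &lt; &gt; &quot; &amp; decode to."""
    return "".join(ch for ch in text if ch not in '<>"&')


good = decoded_description(RSS)
print(good)                      # <font size="-1"><b>Obama N. Korea Options</b> administration&#39;s</font>
print(simulate_stripping(good))  # font size=-1bObama N. Korea Options/b administration#39;s/font
```

The second line has the same shape as the aggregator_item.description value above (for example "font size=-1By Indira..." and "administration#39;s"), which suggests the decoded characters are being lost somewhere between parsing and storage.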
Any chance I've got something misconfigured? I can't imagine what though...

#1
Sorry, the character entities in this post are being rendered as the actual characters...
Please use "view source" on the blockquotes to see what the feed aggregator is picking up.
Thanks for any help!
#2
#3
HELP THIS IS HAPPENING TO ME
(update)
http://www.twincityscene.com/aggregator/
#4
Also experiencing this issue.
Does anyone have a solution?
#5
Same problem! There must be a workaround for this. The Drupal.org aggregator has a Google News feed that works just fine...
#6
This issue seems to have died down, but I am still having this problem. Does anyone have a fix?
#7
We still have this problem as well.
#8
Are we the only ones experiencing this?
Is it a bug? A bad feed? A misconfiguration somewhere in our setup?
#9
Still happening in 6.14...
#10
#11
Subscribing. The feed looks fine in Google's RSS reader, but not in Drupal.