Hi All,

I am attempting to use the core Aggregator module to pull in a news feed from Google News. I have created the feed and it pulls items no problem, but it makes no attempt to interpret the HTML in the feed. Rather, it just prints it out verbatim so I get a page full of ugly gibberish.

Something like this:

font style="font-size:85%;font-family:arial,sans-serif"brdiv style="padding-top:0.8em;"img alt="" height="1" width="1"/divdiv class=lhtable border=0 align=right cellspacing=0 cellpadding=0cellpadding=3 style="font-size:100%;font-family:arial,sans-serif"trtd width=80 align=center style="padding-left:6px;" valign=topa href="http://news.google.com/news/url?sa=Tct=us/1-0i-0fd=Rurl=http://seattlepi.nwsource.com/national/1104ap_as_kashmir_elections.html%3Fsource%3Dmypicid=1280449413ei=C3xFSejQDpfcMZHOpYcPusg=AFQjCNE0eD4RX0pd614zD4V6jHEyH2XbHA"img src=http://news.google.com/news?imgefp=Dpb4Ilnio_8Jimgurl=seattlepi.nwsource.com/dayart/aponline/9839.101India-Britain.sff.jpg width=77 height=80 alt="" border=1brfont size=-2Seattle Post Intelligencer/font/a/td/tr/tablea href="http://news.google.com/news/url?sa=Tct=us/1-0-0fd=Rurl=http://www.hindu.com/2008/12/15/stories/2008121558931200.htmcid=1280449413ei=C3xFSejQDpfcMZHOpYcPusg=AFQjCNHKUs2BqBMe93k3YmIDjEYRc2un4A"b“Pakistan must ensure its soil is not used for terrorist activities”/b/abrfont size=-1bfont color=#6f6f6fHindunbsp;-/font nobr22 minutes ago/nobr/b/fontbrfont size=-1Paramilitary personnel stand guard in Srinagar on Sunday. Curfew was clamped in the wake of Prime Minister Manmohan Singh’s visit to Kashmir./fontbrfont size=-1a href="http://news.google.com/news/url?sa=Tct=us/1-0-1fd=Rurl=http://www.upi.com/Top_News/2008/12/14/Brown_urges_Zardari_to_break_terror_links/UPI-24221229276820

Does anyone know how to stop this and get readable text?

Thanks

Jon

Comments

Road Runner

I am having trouble with the Google news feed too: gibberish, and when I click on a headline I get a page redirect that doesn't work.

I quit messing around with it after a while and use Feedburner now.

jonwatson

Hi,

I tried feeding this Google News feed through Feedburner as well, but with the same result. Is that what you mean when you say you use Feedburner now?

Thanks

Road Runner

I went to feedburner.com and got their feeds instead of pulling directly from Google. Go to my site http://www.digitalmania-online.com and check the news at the bottom of the home page. They are all from Feedburner.

jonwatson

Turns out this isn't just a Google News issue. I've tried feeds from various sites and they all show up as gibberish when I use the categories taxonomy to look at them.

I can't be the only one experiencing this. Anyone?

Arktronic

This is actually a libxml2 bug: http://bugs.php.net/bug.php?id=45996
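To see why the output looks the way it does, here is a rough Python sketch. It does not touch libxml2 or PHP at all; it only imitates the symptom. Judging by the sample in the original post, the escaped characters in the feed (&lt;, &gt;, &amp;) seem to be dropped rather than decoded, which is why the tags lose their angle brackets and the URLs lose their ampersands. The feed item and the function names below are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical feed item: the HTML in <description> is entity-escaped,
# which is how feeds commonly embed markup.
FEED_ITEM = (
    "<item><description>"
    "&lt;b&gt;Headline&lt;/b&gt;&lt;br&gt;a &amp; b"
    "</description></item>"
)


def parse_description(xml_text):
    """Healthy parser: entity references decode to the characters they stand for."""
    return ET.fromstring(xml_text).findtext("description")


def simulate_dropped_entities(xml_text):
    """Imitate the symptom (assumption: entities are discarded, not decoded).

    This deletes the predefined entity references before parsing. It is a
    simulation of the visible effect only, not the actual libxml2 code path.
    """
    stripped = re.sub(r"&(lt|gt|amp|quot|apos);", "", xml_text)
    return ET.fromstring(stripped).findtext("description")


if __name__ == "__main__":
    print("expected:", parse_description(FEED_ITEM))
    print("symptom: ", simulate_dropped_entities(FEED_ITEM))
```

The second line printed has the same shape as the gibberish above: tag names and attributes are still there, but the brackets and ampersands are gone, so nothing downstream can render it as HTML.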