Hello, I help manage a site which pulls in a lot of RSS feeds (http://owli.org/). It's been brought to my attention that some of the feeds aren't updating. At /admin/content/aggregator, when I click "update items" on an overdue feed, sometimes the error is very straightforward, such as: The feed from Canada seems to be broken, because of error "404 Not Found". When I check the link, it is indeed an obsolete page.

However, sometimes I get errors that I don't know how to deal with:

"The feed from IDRC-Africa seems to be broken, because of error "not well-formed (invalid token)" on line 8."

"The feed from OWL South Korea seems to be broken, because of error "junk after document element" on line 10."
"The feed from OWL Ghana seems to be broken, because of error "junk after document element" on line 23."
"The feed from OWL Malta seems to be broken, because of error "junk after document element" on line 24."
"The feed from OWL Indonesia seems to be broken, because of error "junk after document element" on line 25."
"The feed from Owli Thailand News seems to be broken, because of error "junk after document element" on line 55."
"The feed from OWL Australia seems to be broken, because of error "junk after document element" on line 89."
"The feed from OWL India seems to be broken, because of error "junk after document element" on line 152."

"The feed from OWL U.S. seems to be broken, because of error "mismatched tag" on line 109."

I included the repeated error messages with their different line numbers in case that is helpful. Many other feeds fail with the same error and line number, so the list above covers every distinct error we're encountering.
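The error strings quoted above ("not well-formed (invalid token)", "junk after document element", "mismatched tag") match the messages of the Expat XML parser, so one way to investigate is to run the raw feed body through Expat outside Drupal and look at the line it reports. The following is a minimal Python sketch under that assumption, not the aggregator's actual code; the function name check_feed_xml and the sample documents are made up for illustration.

```python
import xml.parsers.expat


def check_feed_xml(data: bytes):
    """Return None if data parses as XML, else (message, line, column).

    Hypothetical helper for illustration; pass it the raw bytes of a feed
    that was saved to disk or fetched by some other means.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this chunk as the final (here: only) piece of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (message, err.lineno, err.offset)
    return None


if __name__ == "__main__":
    # Made-up samples, one per error type mentioned in the thread.
    samples = {
        "valid": b"<rss><channel></channel></rss>",
        "trailing content": b"<rss></rss>\n<p>extra</p>",
        "unclosed element": b"<a><b></a>",
        "bare ampersand": b"<a>a & b</a>",
    }
    for label, body in samples.items():
        print(label, "->", check_feed_xml(body))
```

In this sketch, content after the closing root tag produces "junk after document element", which is typically what happens when something (for example HTML or a debug message) is appended after the feed itself. Comparing the reported line number with the raw response may show what was appended.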

Thanks for any assistance you can provide!

Jim

Comments

ainigma32’s picture

Status: Active » Postponed (maintainer needs more info)

What do you see if you access the links of the feeds that are failing through a web browser?

- Arie

jimmb’s picture

Hi Arie,

Thanks for the quick response! Here are specific examples of the 3 main problems...

On the "IDRC-Africa" feed, the link is:
feed://www.idrc.ca/ev_en.php?ID=8554_201&ID2=DO_RSS
This looks like a normal page to me

On the "OWL India" feed, the link is:
feed://www.owli.org/taxonomy/term/32/0/feed
Also looks normal, far as I can tell

Lastly, the "OWL U.S." feed link is:
http://www.owli.org/taxonomy/term/48/feed
Seems normal as well.

I'll be happy to give you a temporary admin account if you want to log into the site....

Best,

Jim

ainigma32’s picture

I don't think admin access will be necessary but thanks for the vote of confidence :-)

When you look at http://www.owli.org/taxonomy/term/48/feed you see a normal web page. If you look at http://www.owli.org/taxonomy/term/48/feed/feed you will see the format that the aggregator understands.

Unless I'm mistaken the problem for this particular feed will be resolved if you set the feed url to the second one mentioned above.

The other two feeds look OK (in Firefox, anyway), so those problems might have been temporary.

What happens if you refresh those two manually?

- Arie

jimmb’s picture

Hi Arie,

Thanks for pointing out the U.S. feed was incorrect (I should have noticed that). I updated the feed URL, and then clicked "update items" on /admin/content/aggregator. That returned this message:

The feed from OWL U.S. seems to be broken, because of error " invalid schema feed".
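The "invalid schema feed" message suggests the aggregator was handed a URL whose scheme is feed rather than http, like the feed:// links quoted earlier in this thread. Assuming that is the cause (the thread does not confirm it), the URL can be rewritten to plain http before it is saved. A minimal Python sketch; normalize_feed_url is a hypothetical helper, not part of Drupal.

```python
def normalize_feed_url(url: str) -> str:
    """Rewrite feed-scheme URLs to something an HTTP client understands.

    Hypothetical helper for illustration. Handles two common spellings:
      feed://host/path        -> http://host/path
      feed:https://host/path  -> https://host/path
    Anything else is returned unchanged (apart from surrounding whitespace).
    """
    url = url.strip()
    lowered = url.lower()
    if lowered.startswith("feed://"):
        # Assumption: the feed is served over plain http.
        return "http://" + url[len("feed://"):]
    if lowered.startswith("feed:"):
        # The remainder is already a full URL with its own scheme.
        return url[len("feed:"):]
    return url


if __name__ == "__main__":
    for candidate in (
        "feed://www.owli.org/taxonomy/term/32/0/feed",
        "feed://www.idrc.ca/ev_en.php?ID=8554_201&ID2=DO_RSS",
        "http://www.owli.org/taxonomy/term/48/feed",
    ):
        print(normalize_feed_url(candidate))
```

The same change can be made by hand: replace the leading feed:// with http:// in the feed's URL field.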

For the other two examples, I'm not sure what you mean by refreshing manually. Is that the same as "update items", clearing the browser's cache and refreshing, or something else?

Regards,

Jim

ainigma32’s picture

Status: Postponed (maintainer needs more info) » Active

Yes that's what I meant by refreshing.
I tried using the feeds in my D5 test install and it looks like the url for Africa works but the other two generate errors:
The feed from US seems to be broken, because of error "Invalid document end" on line 177.

India is similar.

It looks like the aggregator can't or won't parse the XML it receives, which is strange because the feeds validate on http://validator.w3.org/feed/ and are actually created by Drupal itself (!)

On a side note: are you really using Drupal 5.2? If you are, you really need to update.

- Arie

dave reid’s picture

The 6.x core Aggregator is notorious for being very finicky with feeds. Your best bet is probably to use FeedAPI or a similar module instead of the core Aggregator module.

jimmb’s picture

Interesting, and good to know. As Arie noted, this site is still on Drupal 5.2. What are the odds, do you think, that updating to the newest version (5.15) will solve the issue? Otherwise, perhaps we should switch to FeedAPI....

Thanks,

Jim

dave reid’s picture

I think the parsing flexibility is only in Drupal 7.x, so you're most likely better off using an alternate aggregator module.

ainigma32’s picture

Status: Active » Postponed (maintainer needs more info)

@jimmb: Did you ever manage to tackle this?

- Arie

zoia’s picture

Hi Arie
I have a similar problem in D 6.10. The feed URL worked at first and for a long time, but now it shows me the message: The feed from eTwinning seems to be broken, because of error "-145 Connection timed out".
Cron ran successfully. The block shows the feed items correctly, but they can't be updated anymore. The feed URL is http://schooltwinning.wordpress.com/feed/

tryitonce’s picture

I had two aggregator feed problems in Drupal 6.10:

1. Drupal truncated the feed URL I had copied from the BBC website (from a custom search) and reported a broken feed.
and (relevant here)
2. The feed from XYZ News seems to be broken, because of error "not well-formed (invalid token)" on line 11.

I solved the first one by adding the feed link to Firefox (as a Live Bookmark); then I opened the properties of that Live Bookmark and copied the now slightly changed Feed Location into Drupal's feed aggregator. It worked.

So I tried the same for the second problem, and presto: the reformatting by Firefox also solved the problem discussed here, at least in my case. It's not the most elegant way of doing it (a simple freeware converter would be nice), but it has worked so far.

Here is an example of how it worked for problem 2:
from the RSS feed I copied this (Copy Link Location - via right-clicking):
http://www.bignewsnetwork.com/?rss=09092dd8aa85375c
and this is what Firefox changed it to:
http://www.bignewsnetwork.com/index.php?rss=09092dd8aa85375c

The problem under 1. was lots of "%2520f", etc., extending the length of the feed URL beyond what I could paste into Drupal.
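A sequence like "%2520" is usually a space that has been percent-encoded twice ("%20" with its "%" encoded again as "%25"), which is one way a copied URL grows longer than it needs to be. As a rough alternative to the Firefox round trip, the extra layer can be collapsed in a few lines. This is a minimal Python sketch; collapse_double_encoding is a hypothetical helper, and the example.com URL is made up.

```python
import re

# "%25" followed by two hex digits, e.g. "%2520" (a double-encoded space).
_DOUBLE_ENCODED = re.compile(r"%25([0-9A-Fa-f]{2})")


def collapse_double_encoding(url: str, max_rounds: int = 5) -> str:
    """Turn "%25XX" back into "%XX" until nothing changes.

    Hypothetical helper for illustration. Caveat: a URL that legitimately
    contains a literal percent sign followed by two hex digits would be
    altered too, so check the result before using it.
    """
    for _ in range(max_rounds):
        collapsed = _DOUBLE_ENCODED.sub(r"%\1", url)
        if collapsed == url:
            break
        url = collapsed
    return url


if __name__ == "__main__":
    # Made-up URL; the real BBC search URL is not given in the thread.
    print(collapse_double_encoding("http://example.com/search?q=owl%2520news"))
```

The result stays singly encoded (the space remains "%20"), so it is still a valid URL to paste into the feed form.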

Good luck

dpearcefl’s picture

Status: Postponed (maintainer needs more info) » Closed (won't fix)

Considering the lack of activity on this issue and that Drupal v5 is no longer supported with fixes or patches, I am going to close this ticket.