Hi, I have an aggregation site with 18 feeds. If I refresh them individually, everything works great. If I let cron.php refresh them, I get pretty consistent parse/format errors. Any ideas on what might be going on?

Comments

ahwayakchih’s picture

Can you post at least part of the errors here? And maybe the URL of one of the problematic feeds?

fivesticks’s picture

OK here's what's in the Log (just a few lines--there are more)

aggregator2 2005-11-16 21:00 Syndicated content from Where's My Plan?. John Sherck view details
error aggregator2 2005-11-16 21:00 Failed to validate aggregator2-feed for Where's My Plan? John Sherck view details
error aggregator2 2005-11-16 21:00 Errors in entry The bookstore and the a new (or maybe old) l John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Daily Life Update from Where's My P John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Jimi Plays Berkeley (and the '60s) from John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Sunday View -- Morning Breakfast from < John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Update from Sick Bay (not to be confused wit John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry The lesson is: Our God is vengeful! O spitef John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Thursday Think 'n' Share XX from Wh John Sherck details
aggregator2 2005-11-16 21:00 Syndicated content from Stu's Blog. KenyonBlogs.org view details
error aggregator2 2005-11-16 21:00 Failed to validate aggregator2-feed for Stu's Blog: KenyonBlogs.org view details
error aggregator2 2005-11-16 21:00 Errors in entry Give me a B! Give me an L! Give me an O! KenyonBlogs.org details
cron 2005-11-16 20:00 Cron run completed Anonymous details

Note that these feeds succeed when I process them individually.

The feed URLs are:

http://wheresmyplan.blog-city.com/index.rss?format=atom

http://blogsbystu.blogspot.com/atom.xml

Thanks for taking a look through it!

ahwayakchih’s picture

I've just tested both feeds and they both worked OK...

Are you sure everything is set up properly? Did you use an older version of aggregator2 before?
Maybe you didn't update the database table layout?

fivesticks’s picture

Thanks for your help on this--I appreciate it.

The problem isn't with the individual feeds. If I refresh them individually, everything works great. However, when I refresh them en masse using cron.php, that's when I get the errors. I installed a2 only about 6 weeks ago, and since the individual refreshes work, I'm thinking the SQL is fine.

My first thought for troubleshooting: since it only fails when it's updating the whole group, some variable may be holding a stale value that makes the parser think it's looking at one thing when it's actually looking at another, at which point it throws the exception.

Any ideas?

ahwayakchih’s picture

I tried running cron.php and it worked fine :(

Hmm... the only thing I can think of is the parsing functions (aggregator2_parse_xml, aggregator2_element_start, aggregator2_element_end, aggregator2_element_data). They use global variables. But that shouldn't have any side effects, because scripts are run in a "separate space" and in a single thread. So one script (as in: one visit to the site) shouldn't use another's data. Global variables are "re-initialized" before each parse, so they should be "clean".
What version of PHP do you use? Maybe it's another PHP5 thing?

fivesticks’s picture

I'm on 4.3.8, so it's not a 5 issue. I haven't looked through the code at all yet. Is there an easy way to refactor so the actual refreshes are handled by an object and that object is completely destroyed and recreated between blogs? That would remove any possibility of stale data sitting around.

ahwayakchih’s picture

Unfortunately, as far as I know, there is none. The Expat-based API doesn't allow passing more info to the parsing functions. The only way is to use a class, but Drupal doesn't use classes anywhere, and I don't want to break the "rules".
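
For illustration only, here is a rough sketch of the "one object per feed" idea discussed above. It is written in Python with its standard Expat binding, not in aggregator2's PHP, and the names FeedTitleCollector and parse_feeds are made up for this example. It assumes we only care about collecting <title> text; the point is just that the handler state lives on an object that is thrown away after each feed, so nothing can carry over to the next one.

```python
import xml.parsers.expat


class FeedTitleCollector:
    """Hypothetical helper: collects <title> text for ONE feed.

    All parser state lives on the instance instead of in globals.
    """

    def __init__(self):
        self.titles = []
        self._in_title = False
        self._buffer = []

    def _start(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def _end(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False

    def _data(self, text):
        # Expat may deliver character data in several chunks.
        if self._in_title:
            self._buffer.append(text)

    def parse(self, xml_text):
        parser = xml.parsers.expat.ParserCreate()
        # Bound methods carry the per-feed state with them.
        parser.StartElementHandler = self._start
        parser.EndElementHandler = self._end
        parser.CharacterDataHandler = self._data
        parser.Parse(xml_text, True)
        return self.titles


def parse_feeds(feeds):
    """Parse each feed with a fresh collector so no state is shared."""
    return {
        name: FeedTitleCollector().parse(xml_text)
        for name, xml_text in feeds.items()
    }
```

Calling parse_feeds with a dict of feed name to XML text returns a dict of feed name to the titles found in that feed. Whether the same structure fits aggregator2 without breaking Drupal's coding conventions is a separate question.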

systmc’s picture

I'm having the same problem. When searching Drupal.org for help, I came across other posts reporting the same problem before coming across this bug entry. I started with a clean install of Drupal and Aggregator2, added about a dozen feeds, and within a couple days I started seeing "Failed to validate aggregator2-feed for [...]" errors. If I manually refreshed the feeds, it'd work. Now, I'm getting errors with *every* feed when Aggregator2 auto-updates - it just doesn't work anymore. If I want new feed items, I must manually refresh every feed. Not sure what's going on, but several people are having this problem. Aggregator2 is great even without auto-update working, but for my site this is a show-stopper.

kees’s picture

Version: master » 4.6.x-1.x-dev
Priority: Normal » Critical

Like the other posters, I have the same issue. Loading cron.php manually from Firefox works like a charm; all the feeds are refreshed and things work fine. However, when I try the same from cron using curl or wget, I get "Failed to parse RSS feed : invalid schema ."

I really have no explanation, other than a possible timeout that we run into somewhere. This problem happens consistently and is fully reproducible.

kees’s picture

I narrowed down the problem; it has nothing to do with the browser used, but everything to do with the fact that whenever I tried with Mozilla, I was logged in to my site. It appears that the fake login that the module attempts to perform in its _cron function does not work. I'll post back when I have found a solution.

kees’s picture

I'll leave closing this bug to the maintainer, but it's really not a bug. Once I put in some tracing code, I figured out what is going on. The only thing that went wrong was that the feed source could not be viewed by anonymous users. I use node_privacybyrole, and as soon as I checked the View permission for Anonymous users, everything worked like a charm.
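
For anyone who wants to check for the same cause, here is a small sketch (Python, standard library only) that requests pages without any session cookie, the way cron started from curl or wget would. The function names are made up for this example, and it assumes that the site answers with something other than HTTP 200 (typically 403) when an anonymous visitor is denied access; verify that on your own site before relying on it.

```python
import urllib.error
import urllib.request


def anonymous_status(url, opener=None):
    """Return the HTTP status seen by a request that sends no cookies."""
    # A plain opener has no cookie handler, so no login session is sent.
    opener = opener or urllib.request.build_opener()
    try:
        with opener.open(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep the code.
        return err.code


def blocked_for_anonymous(urls, status_of=anonymous_status):
    """List the URLs that do not return 200 to an anonymous visitor."""
    return [url for url in urls if status_of(url) != 200]
```

Pass it the addresses of your feed source nodes. Any URL it returns is one that an anonymous cron run probably cannot view either.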