Refresh multiple blogs fails on cron run

fivesticks - November 16, 2005 - 03:39
Project: Aggregator2
Version: 4.6.x-1.x-dev
Component: Code
Category: bug report
Priority: critical
Assigned: Unassigned
Status: active
Description

Hi, I have an aggregation site with 18 feeds. If I refresh them individually, everything works great. If I let cron.php refresh them, I get pretty consistent parse/format errors. Any ideas on what might be going on?

#1

ahwayakchih - November 16, 2005 - 15:44

Can you post at least part of the errors here? And maybe the URL of one of the problematic feeds?

#2

fivesticks - November 17, 2005 - 03:15

OK, here's what's in the log (just a few lines; there are more):

aggregator2 2005-11-16 21:00 Syndicated content from Where's My Plan?. John Sherck view details
error aggregator2 2005-11-16 21:00 Failed to validate aggregator2-feed for Where's My Plan? John Sherck view details
error aggregator2 2005-11-16 21:00 Errors in entry The bookstore and the a new (or maybe old) l John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Daily Life Update from Where's My P John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Jimi Plays Berkeley (and the '60s) from John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Sunday View -- Morning Breakfast from < John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Update from Sick Bay (not to be confused wit John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry The lesson is: Our God is vengeful! O spitef John Sherck details
error aggregator2 2005-11-16 21:00 Errors in entry Thursday Think 'n' Share XX from Wh John Sherck details
aggregator2 2005-11-16 21:00 Syndicated content from Stu's Blog. KenyonBlogs.org view details
error aggregator2 2005-11-16 21:00 Failed to validate aggregator2-feed for Stu's Blog: KenyonBlogs.org view details
error aggregator2 2005-11-16 21:00 Errors in entry Give me a B! Give me an L! Give me an O! KenyonBlogs.org details
cron 2005-11-16 20:00 Cron run completed Anonymous details

Note that these feeds succeed when I process them individually.

The feed URLs are:

http://wheresmyplan.blog-city.com/index.rss?format=atom

http://blogsbystu.blogspot.com/atom.xml

Thanks for taking a look through it!

#3

ahwayakchih - November 17, 2005 - 17:07

I've just tested both feeds and they both worked OK...

Are you sure everything is set up properly? Did you use an older version of aggregator2 before?
Maybe you didn't update the database table layout?

#4

fivesticks - November 18, 2005 - 14:29

Thanks for your help on this--I appreciate it.

The problem isn't with the individual feeds. If I refresh them individually, everything works great. However, when I refresh them en masse using cron.php, that's when I get the errors. I installed a2 only about 6 weeks ago, and since the individual refreshes work, I'm thinking the SQL is fine.

My first thought for troubleshooting: since it only fails when updating the whole group, some variable may be holding a stale value that makes the parser think it's looking at one thing when it's actually looking at another, at which point it throws the exception.

Any ideas?

#5

ahwayakchih - November 18, 2005 - 17:28

I tried running cron.php and it worked fine :(

Hmm... the only thing I can think of is the parsing functions (aggregator2_parse_xml, aggregator2_element_start, aggregator2_element_end, aggregator2_element_data). They use global variables. But that shouldn't have any side effects, because scripts run in a "separate space" and in a single thread, so one script (as in: one visit to the site) shouldn't use another's data. The global variables are "re-initialized" before each parse, so they should be "clean".
What version of PHP do you use? Maybe it's another issue with PHP5?
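
To make the "global variables, re-initialized before each parse" idea concrete, here is a minimal sketch in Python, whose standard library wraps the same Expat parser. It is not aggregator2's code: every name in it (parse_titles, _reset_state, the reset flag) is hypothetical, and it only collects title elements. The reset=False path shows what a stale-state bug would look like if the re-initialization were ever skipped between feeds.

```python
import xml.parsers.expat

# Module-level state shared by the handlers, mirroring the
# "global variables" approach. All names here are hypothetical.
_titles = []
_in_title = False
_buffer = []


def _reset_state():
    """Re-initialize the shared state before a parse."""
    global _titles, _in_title, _buffer
    _titles = []
    _in_title = False
    _buffer = []


def _element_start(name, attrs):
    global _in_title
    if name == "title":
        _in_title = True
        _buffer.clear()


def _element_end(name):
    global _in_title
    if name == "title" and _in_title:
        _titles.append("".join(_buffer))
        _in_title = False


def _element_data(data):
    # Expat may deliver text in several chunks, so collect them.
    if _in_title:
        _buffer.append(data)


def parse_titles(xml_text, reset=True):
    """Parse one feed and return the titles seen so far in shared state."""
    if reset:
        _reset_state()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = _element_start
    parser.EndElementHandler = _element_end
    parser.CharacterDataHandler = _element_data
    parser.Parse(xml_text, True)
    return list(_titles)


if __name__ == "__main__":
    print(parse_titles("<feed><title>A</title></feed>"))
    print(parse_titles("<feed><title>B</title></feed>"))
    # Skipping the reset lets the previous feed's data leak into this one.
    print(parse_titles("<feed><title>C</title></feed>", reset=False))
```

As long as the reset runs before every feed, parsing many feeds in one request behaves the same as parsing them one at a time, which is the reasoning given above.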

#6

fivesticks - November 19, 2005 - 03:17

I'm on 4.3.8, so it's not a 5 issue. I haven't looked through the code at all yet. Is there an easy way to refactor so the actual refreshes are handled by an object and that object is completely destroyed and refreshed between blogs? That would remove any possibility of stale data sitting around.

#7

ahwayakchih - November 26, 2005 - 22:38

> I'm on 4.3.8, so it's not a 5 issue. I haven't looked through the code at all yet. Is there an easy way to refactor so the actual refreshes are handled by an object and that object is completely destroyed and refreshed between blogs? That would remove any possibility of stale data sitting around.

Unfortunately, as far as I know, there is none. The Expat-based API doesn't allow passing extra information to the parsing functions. The only way is to use a class, but Drupal doesn't use classes anywhere, and I don't want to break the "rules".
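
For comparison, here is a rough sketch of the class-based alternative, again in Python because its standard library wraps Expat. The names TitleCollector and parse_titles_isolated are made up for illustration and are not part of aggregator2. Each parse gets a fresh object, so nothing can carry over from one feed to the next. PHP's XML extension has xml_set_object() for binding handlers to an object in a similar way.

```python
import xml.parsers.expat


class TitleCollector:
    """Holds the parse state for exactly one feed (hypothetical helper)."""

    def __init__(self):
        self.titles = []
        self._in_title = False
        self._buffer = []

    def start(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def end(self, name):
        if name == "title" and self._in_title:
            self.titles.append("".join(self._buffer))
            self._in_title = False

    def data(self, text):
        # Expat may deliver text in several chunks, so collect them.
        if self._in_title:
            self._buffer.append(text)


def parse_titles_isolated(xml_text):
    """Parse one feed with a fresh collector; state dies with the object."""
    collector = TitleCollector()
    parser = xml.parsers.expat.ParserCreate()
    # Bound methods carry the per-feed state, so no globals are needed.
    parser.StartElementHandler = collector.start
    parser.EndElementHandler = collector.end
    parser.CharacterDataHandler = collector.data
    parser.Parse(xml_text, True)
    return collector.titles


if __name__ == "__main__":
    print(parse_titles_isolated("<feed><title>A</title></feed>"))
    print(parse_titles_isolated("<feed><title>B</title></feed>"))
```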

#8

systmc - March 12, 2006 - 16:22

I'm having the same problem. When searching Drupal.org for help, I came across other posts reporting the same problem before coming across this bug entry. I started with a clean install of Drupal and Aggregator2, added about a dozen feeds, and within a couple days I started seeing "Failed to validate aggregator2-feed for [...]" errors. If I manually refreshed the feeds, it'd work. Now, I'm getting errors with *every* feed when Aggregator2 auto-updates - it just doesn't work anymore. If I want new feed items, I must manually refresh every feed. Not sure what's going on, but several people are having this problem. Aggregator2 is great even without auto-update working, but for my site this is a show-stopper.

#9

kees - March 18, 2006 - 19:57
Version: HEAD » 4.6.x-1.x-dev
Priority: normal » critical

Like the other posters, I have the same issue. Loading cron.php manually from Firefox works like a charm; all the feeds are refreshed and things work fine. However, when I try the same from cron using curl or wget, I get "Failed to parse RSS feed : invalid schema ."

I really have no explanation, other than a possible timeout that we run into somewhere. This problem happens consistently and is fully reproducible.

#10

kees - March 19, 2006 - 13:25

I narrowed down the problem; it has nothing to do with the browser used, but everything to do with the fact that whenever I tried with Mozilla, I was logged in to my site. It appears that the fake login that the module attempts to perform in its _cron function does not work. I'll post back when I have found a solution.

#11

kees - March 19, 2006 - 13:59

I'll leave closing this bug for the maintainer, but it's really not a bug. Once I put in some tracing code, I figured out what is going on. The only thing that went wrong was that the feed source could not be viewed by anonymous users. I use node_privacybyrole, and as soon as I checked the View permission for Anonymous users, everything worked like a charm.
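
The diagnosis can be pictured with a toy model, sketched in Python. This is not aggregator2 or node_privacybyrole code: the role names, can_view, and refresh_feed are all invented for illustration. The assumption is simply that a refresh succeeds only if the requesting user may view the feed's source node, that cron.php called from curl or wget runs as the anonymous user, and that a browser session runs as whoever is logged in.

```python
ANONYMOUS = "anonymous user"          # hypothetical role names
AUTHENTICATED = "authenticated user"


def can_view(user_roles, node_view_roles):
    """Return True if at least one of the user's roles may view the node."""
    return not set(user_roles).isdisjoint(node_view_roles)


def refresh_feed(user_roles, node_view_roles):
    """Toy refresh: fails when the feed's source node is not viewable."""
    if not can_view(user_roles, node_view_roles):
        return "error"
    return "ok"


if __name__ == "__main__":
    restricted = {AUTHENTICATED}              # View not granted to anonymous
    opened_up = {AUTHENTICATED, ANONYMOUS}    # View granted to anonymous

    # cron.php fetched by curl/wget: no session, so anonymous.
    print(refresh_feed({ANONYMOUS}, restricted))
    # cron.php loaded in a logged-in browser.
    print(refresh_feed({AUTHENTICATED}, restricted))
    # After granting View to anonymous users.
    print(refresh_feed({ANONYMOUS}, opened_up))
```

Under this model the same feed fails from cron and succeeds from a logged-in browser, which matches the symptoms reported earlier in the thread.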

 
 
