Core aggregator dies too gracefully! If an RSS feed contains invalid markup, the aggregator reports "There are new items in......" but fails to import any items, and the feed listing shows '0 items'. Shouldn't it at least return an error telling the user that the import failed? This has happened to me on various feeds, but this is the latest:

http://www.trulia.com/rss2/for_sale/Foreman,AR;Ashdown,AR;Hope,AR;Nashvi...

I've narrowed it down to an incorrect 'pubDate' element, in the form of:

<pubDate>Fri, 27 Jun 2008</pubDate> *invalid form

If I remove the 'pubDate', it loads fine. However, if I include the following example of a valid date format from FeedValidator.org, it fails too.

<pubDate>Wed, 02 Oct 2002 08:00:00 EST</pubDate> *unknown error

Finally, this 'pubDate' from a Google RSS feed works:

<pubDate>Sun, 13 Jul 2008 07:49:48 GMT</pubDate> *acceptable

I can't determine an appreciable difference between the latter two. Any ideas?
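
One way to compare the three values is to run them through an RFC 822 style date parser and look at the resulting timestamps. The sketch below uses Python's standard library, not Drupal's own parsing code, so it only illustrates how the strings differ; the helper name `to_epoch` is made up for this example.

```python
from email.utils import parsedate_tz, mktime_tz

SAMPLES = [
    "Fri, 27 Jun 2008",               # date only, no time or zone
    "Wed, 02 Oct 2002 08:00:00 EST",  # FeedValidator.org example
    "Sun, 13 Jul 2008 07:49:48 GMT",  # from the Google feed
]

def to_epoch(pub_date):
    """Return a Unix timestamp, or None if the value cannot be parsed."""
    parsed = parsedate_tz(pub_date)
    return None if parsed is None else mktime_tz(parsed)

for sample in SAMPLES:
    # The first sample comes back as None with this parser; the other
    # two parse, but their timestamps are almost six years apart.
    print(sample, "->", to_epoch(sample))
```

With this parser, the latter two differ in the timezone name and, more noticeably, in how old the date is.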

Comments

bradnana’s picture

BTW, on a hunch I decided to check other aggregators.

1. FeedAPI seems to replicate the behavior of the core aggregator. It fails with no error (too gracefully) on the same invalid 'pubDate' elements.

2. Aggregation (not core Aggregator) apparently forgives the incorrect element and loads the feed properly.

bradnana’s picture

OK, so now I realize the problem is not that 'pubDate' is in an invalid format. Instead, it seems the 'pubDate' in the feed specifies a date older than the limit for discarding items, which is set on the settings page of the aggregator module (content/aggregator/settings). But is this the intended behavior? The settings page certainly does not read that way. It says:

Discard items older than:
Options: 1 hour, 3 hours, 6 hours, 9 hours, 12 hours, 1 day, 2 days, 3 days, 1 week, 2 weeks, 4 weeks, 8 weeks, 16 weeks
The length of time to retain feed items before discarding. (Requires a correctly configured cron maintenance task.)

The assumption the user is going to make is that all new items in the feed will get loaded regardless of their 'pubDate', and that aggregator will either read the GUID or assign something in the item as the GUID, then discard the item once that GUID reaches a certain age in the system.

I'm new to feeds. Is this normal behavior?
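
To make the behavior described above concrete, here is a minimal sketch of an age check applied to an item's published date. It is not the aggregator module's code, only a model of what I am seeing. The function name `keep_item` and the import time are assumptions for illustration; the 16 week window is the largest option on the settings page.

```python
from datetime import datetime, timedelta, timezone

def keep_item(published, now, discard_after):
    """Return True if the item is still inside the discard window."""
    return now - published <= discard_after

# Hypothetical import time, chosen only for this example.
now = datetime(2008, 7, 14, tzinfo=timezone.utc)
# Largest "Discard items older than" option listed on the settings page.
window = timedelta(weeks=16)

items = {
    # 08:00:00 EST expressed in UTC
    "FeedValidator example": datetime(2002, 10, 2, 13, 0, tzinfo=timezone.utc),
    "Google feed item": datetime(2008, 7, 13, 7, 49, 48, tzinfo=timezone.utc),
}

for title, published in items.items():
    status = "kept" if keep_item(published, now, window) else "discarded"
    print(title, "->", status)
```

Under these assumptions the 2002 item falls outside even the longest window, while the 2008 item is kept, which matches which feeds loaded for me.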

swe3tdave’s picture

Version: 6.3 » 6.4
Category: support » bug

I have the same problem with this feed: http://astuces-ubuntu.blogspot.com/feeds/posts/default

This is an example of the date format for that feed:
<published>2008-08-18T09:52:00.004-04:00</published><updated>2008-08-18T10:06:02.659-04:00</updated>
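
For comparison, these Atom timestamps are ISO 8601 style (RFC 3339) rather than the RFC 822 style used by RSS 'pubDate'. The sketch below uses Python's standard library (3.7 or later) to show the difference; it says nothing about how aggregator itself parses dates, and the helper name `parse_atom` is made up for this example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_tz

def parse_atom(value):
    """Parse an Atom <published>/<updated> value into an aware datetime."""
    return datetime.fromisoformat(value)

published = "2008-08-18T09:52:00.004-04:00"

# An RFC 822 style parser does not recognize this string at all.
print(parsedate_tz(published))  # None

# An ISO 8601 parser reads it, including the fraction and the UTC offset.
dt = parse_atom(published)
print(dt.astimezone(timezone.utc).isoformat())
# 2008-08-18T13:52:00.004000+00:00
```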

I think I should mention that I never had any problem with the old planet.module from 5.x. The dates were reported correctly. Maybe someone should look at it, I don't know...

Personally, I think that discarding items should be an option. And I don't see why there should be a separate module for planets. Maybe this could be a feature request, but because there is at least the date element problem, I am marking this as a bug report.

dpearcefl’s picture

Status: Active » Postponed (maintainer needs more info)

Does this issue exist in current D6?

dpearcefl’s picture

Status: Postponed (maintainer needs more info) » Active

Status: Active » Closed (outdated)

Automatically closed because Drupal 6 is no longer supported. If the issue verifiably applies to later versions, please reopen with details and update the version.