Strange Aggregator behaviour with Blogspot feeds
teledyn - March 5, 2007 - 14:55
| Project: | Drupal |
| Version: | 6.3 |
| Component: | aggregator.module |
| Category: | bug report |
| Priority: | critical |
| Assigned: | Unassigned |
| Status: | duplicate |
Description
All the other news feeds migrated perfectly from 4.x to the DRUPAL-5 aggregator, and I have not seen this happen with other feeds imported into the new system. But we have one new feed, from the BlogSpot site http://thewee3.blogspot.com/atom.xml, where the feed itself is fine and the new aggregator reports reading the stories and shows the headlines in the feed block, yet all the stories have the same URL (the first story's link).
We have deleted the feed and re-added it, and get the same results.

#1
I hope you haven't mixed up between the aggregation module and another one because you mentioned aggregator in your post. The aggregation (not aggregator) module currently supports RSS only. ATOM and RDF are being considered, but there's no deadline set since I'm pretty short on time.
You can easily add handling for ATOM and/or RDF, and if you'd like advanced customizations then you can contact me directly as well. Please check the readme file for details.
#2
oops, sorry, maybe I did choose the wrong one; I didn't realize there were two until you pointed it out -- here's what I have in the .info file:
$ more aggregator.info
; $Id: aggregator.info,v 1.3 2006/11/21 20:55:33 dries Exp $
name = Aggregator
description = "Aggregates syndicated content (RSS, RDF, and Atom feeds)."
package = Core - optional
version = VERSION
I don't have the contrib module installed, just this one from the core DRUPAL-5-1 sources.
is it possible to move this bug to the other project's issues?
#3
Don't know if this is Aggregator2 or Aggregator Node; the core DRUPAL-5-1 module just calls itself "Aggregator" but the Issues Tracker has no such module and neither of the Aggregator modules has a Version listing for 5.1
#4
I believe you should report the bug in Drupal's bug tracking system; if the bug does exist, it is considered a core bug.
Choose "Support" tab on drupal's site then check under "Bug Reports" on how to do that.
#5
Ah ... so Issues is not for Bugs. Got it. Thanks.
#6
What gives??? I did just as you said, and it leads me right back here to the Issues pages!!
#7
Sorry for that, seems I got you mixed up. Issues are for bugs, Aggregator module is a core module, so you won't find it listed in the module's section. Instead, it's considered a core module.
Core modules are considered "Drupal". This includes all the modules that come with a default Drupal installation, whether enabled by default or not.
So you simply needed to change the project from the "Project" drop-down from "aggregator2" to "Drupal". And change the component drop-down to "aggregator.module". I'll do that right now, but just wanted to clear that up :-)
#8
Thanks. I have now confirmed this bug on another 5.1 installation; in 4-7 the blogspot atom feeds were simply ignored (zero entries) but in the 5-1 the results appear to be unpredictable, sometimes null stories, sometimes all stories assigned to the wrong URL (url of the first story?)
There is a workaround: Blogspot feeds have an RSS 2 option. Use the alternate feeds/posts/default?alt=rss and the feed will integrate properly.
#9
I guess this is solved now. closing...
#10
I'm still seeing this bizarre behavior with Atom feeds from a Blogger blog. I can see the posts come through just fine, but the urls are all messed up -- all point to the same one (I think the one for the first post).
I tried the workaround suggested by teledyn in #8, but then all I get from the feed is the titles and post times w/o any content. I looked at the xml file at the default?alt=rss url for the blog and it looks fine -- it shows the first paragraph of each post.
#11
I am having the same experiences as #10. With the RSS feed, you only get the titles. With the atom feed, you get the wrong links to the blogspot posts but you get teasers.
#12
I'm having similar behaviour... Usually all that'll appear through aggregator is the URL of an old article, but sometimes the title and teaser text for the most recent articles show up with the old article URLs. I'm getting this both through the original feed and through FeedBurner's version of it.
#13
Did anyone test/confirm this problem with the "aggregation" module?
If so, does it happen with the same URL in the issue description or a different one (please point out the troublesome URL if the problem is also valid in the aggregation module).
Thanks
#14
I'm seeing this behavior too. When I add the atom.xml feed to News Aggregator, it correctly picks up title information about the latest post, but the link it provides is to the oldest post on the feed. I don't know if Blogger's implementation of Atom is non-standard or what, but it doesn't seem to happen with other platforms (Wordpress blog atom feeds, for instance, seem to work fine).
#15
I'm experiencing the same thing with Drupal 4.7.1 (and I tested with the same results in 5.1) on one of my client websites.
For reference here is the feed with inaccurate info - note how all of the links go to the same page which is the bottom feed entry:
http://www.waeagles.com/?q=aggregator/sources/2
That feed references a feedburner link which appears to work perfectly:
http://feeds.feedburner.com/eagleforum/JLEj
The original feed is available at the following URL, and it has the same problem:
http://www.eagleforum.org/blog/atom.xml
On the site I have a second aggregation set up with that one at:
http://www.waeagles.com/?q=aggregator/sources/1
Any thoughts? I'm going to take a look at the code, but I'd love to see a patch for this!
#16
Sorry, I'll leave this with 5.2, but recognize that it appears to be an issue with older versions too.
#17
Okay, I found a workaround using FeedBurner. The problem is somehow related to the fact that with Atom the "link" field for each entry isn't picked up in the parsing. I didn't look into exactly why this is, but because aggregator works fine with RSS I switched my FeedBurner setup over to that. Here is how:
That did the trick for me!
#18
yes, this is a bummer.
I opted to go for: /feeds/posts/default?alt=rss at the end of my URL,
so, at least the user gets to the right article,
rather than just print out the Content & a wrong link.
Hope someone works out how to fix this (without having to rely on feedburner)...
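For anyone who wants to script the workaround, here is a minimal sketch of how the RSS URL can be derived from a blog's base address. The helper name sketch_blogspot_rss_url() is made up for illustration; only the /feeds/posts/default?alt=rss suffix comes from this thread.

```php
<?php
// Hypothetical helper, not part of Drupal or aggregator.module.
// Builds the RSS 2.0 feed URL for a Blogspot blog by appending the
// suffix mentioned in this thread to the blog's base URL.
function sketch_blogspot_rss_url($blog_url) {
  // Drop any trailing slash so the result never contains "//feeds".
  return rtrim($blog_url, '/') . '/feeds/posts/default?alt=rss';
}

echo sketch_blogspot_rss_url('http://thewee3.blogspot.com/'), "\n";
```

The resulting URL is what you would paste into the aggregator's feed URL field in place of atom.xml.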
#19
I can confirm that this bug with the Atom format persists in the 5.7 version. The FeedBurner and ?alt=rss workarounds are good, but I think this is a problem that developers should work on.
#20
I have found that the error occurs in the "LINK" case handling of the aggregator_element_start() function (line 607).
When you add
$items[$item]['LINK'] = $attributes['HREF'];
to the "else" statement, too, the bug is gone. I'm absolutely not sure what side effects this may cause, due to a lack of test cases. Can anyone have a look at this?
#21
Sorry, can you explain with more extended code?
case 'LINK':
  if ($attributes['REL'] == 'alternate') {
    if ($element == 'ITEM') {
      $items[$item]['LINK'] = $attributes['HREF'];
    }
    else {
      $channel['LINK'] = $attributes['HREF'];
    }
  }
  break;
This is the part you are talking about. What change do I have to make?
Thanks and sorry
#22
@RahDick Yep, I think that's the issue too, though the broader issue is why that if-statement ends up in the else branch.
I too was running into this issue, and I'm not sure why Blogger's Atom feeds seem to have it as opposed to any other Atom feed. The one irregularity I did find is that Blogger seems to use single quotes in their feed, rather than double quotes, but I'm no XML genius so I don't know if that's what's bunging stuff up.
case 'LINK':
  if ($attributes['REL'] == 'alternate') {
    if ($element == 'ITEM') {
      $items[$item]['LINK'] = $attributes['HREF'];
    }
    else {
      $items[$item]['LINK'] = $attributes['HREF']; // ++++++
      $channel['LINK'] = $attributes['HREF'];
    }
  }
  break;
But that seems like a hack because the issue is really that if-statement. Why isn't the parser correctly reading $element == 'ITEM'?
Also, there is some weird flow to database insertions:
1) It seems that when you first create the feed in aggregator and update it for the first time, no new items are added, but the feed's link is set to the last item's link (probably because of that wonky if-statement above). But the feed's link is added to the feed *after* it tries to insert the items (or rather, it's never re-retrieved). This is a problem because the add-item logic says "we have to have a link; if we don't, use the feed's link", but it doesn't have the feed link yet (even the improperly set one), so no items are inserted.
2) But if you then click "delete items" on the aggregator admin screen (even though there are no items; this is needed because the aggregator sends a request that says "I checked on this date, is there anything newer? ... Nope, OK, never mind", so nothing new will be pulled since it barfed on parsing last time),
3) and now refresh the feed, it *will* add the items!!! The links, though, will all be set to the feed's link (which is the last item's link from the first time the feed was refreshed, so still wrong).
Whew. So that's where I'm at.
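To make the question about the if-statement concrete, here is a small standalone sketch. It is NOT the aggregator.module code, and everything in it (the function name sketch_parse_atom_links(), the $state array, the sample feed) is made up for illustration. It rests on one unverified assumption: that the check fails because the parser keeps its context in a single $element variable, and some other tag inside the entry overwrites that variable before the link tag is reached. The sketch avoids that by tracking "inside an entry" in its own flag.

```php
<?php
// Illustrative sketch only; NOT the aggregator.module code. All names here
// (sketch_parse_atom_links, $state, the sample feed) are hypothetical.
function sketch_parse_atom_links($xml) {
  $state = array(
    'in_entry' => FALSE,  // TRUE between an <entry>/<item> start and end tag
    'item'     => -1,     // index of the entry currently being read
    'items'    => array(),
    'channel'  => array(),
  );

  $start = function ($parser, $name, $attributes) use (&$state) {
    if ($name == 'ENTRY' || $name == 'ITEM') {
      $state['in_entry'] = TRUE;
      $state['item']++;
      $state['items'][$state['item']] = array();
    }
    elseif ($name == 'LINK'
        && isset($attributes['REL'], $attributes['HREF'])
        && $attributes['REL'] == 'alternate') {
      if ($state['in_entry']) {
        $state['items'][$state['item']]['LINK'] = $attributes['HREF'];
      }
      else {
        $state['channel']['LINK'] = $attributes['HREF'];
      }
    }
    // Other tags (CONTENT, SUMMARY, ...) deliberately leave in_entry alone.
  };

  $end = function ($parser, $name) use (&$state) {
    if ($name == 'ENTRY' || $name == 'ITEM') {
      $state['in_entry'] = FALSE;
    }
  };

  // Case folding is on by default, so tag and attribute names arrive
  // upper-cased ('LINK', 'REL', 'HREF'), as in the snippet quoted above.
  $parser = xml_parser_create();
  xml_set_element_handler($parser, $start, $end);
  xml_parse($parser, $xml, TRUE);

  return $state;
}

// Made-up sample: single-quoted attributes, and <link> placed after <content>.
$feed = "<?xml version='1.0' encoding='UTF-8'?>"
  . "<feed xmlns='http://www.w3.org/2005/Atom'>"
  . "<link rel='alternate' type='text/html' href='http://example.com/'/>"
  . "<entry><title>First</title><content type='html'>one</content>"
  . "<link rel='alternate' type='text/html' href='http://example.com/first.html'/></entry>"
  . "<entry><title>Second</title><content type='html'>two</content>"
  . "<link rel='alternate' type='text/html' href='http://example.com/second.html'/></entry>"
  . "</feed>";

$result = sketch_parse_atom_links($feed);
foreach ($result['items'] as $entry) {
  echo $entry['LINK'], "\n";
}
```

With this sample, each entry keeps its own link and the feed-level link stays separate, and the single quotes parse without trouble. Whether the real module fails for the reason assumed here would still need checking against the actual aggregator_element_start() source.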
#23
Is there any action on this front? Will a fix be included in the next version of drupal? This bug is a year old!
#24
I'm experiencing odd issues with BlogSpot feeds myself. In the log it said new items were found for one of them, but no items are downloaded at all.
#25
I can confirm.
This issue still exists in Drupal 6.3.
It's not just with blogspot.com though, but also with other ATOM-feeds I've tried.
#26
Duplicate of #130344: Headline links do not get parsed in some ATOM feeds
#27
The article above still did not solve the issue.
In my case, I will try the feedburner way out.
love and light
#28
Has no one resolved this? I cannot aggregate Blogspot feeds correctly for the life of me. I've even tried alt=rss and it doesn't really do much. Basically all brackets are stripped out of tags, leaving HTML behind. It's not an encoding issue, is it?
#29
I rather expect this bug is not going to be fixed. For this and other similarly embarrassing reasons, I've long since ceased using Drupal.