Support for Drupal 7 is ending on 5 January 2025—it’s time to migrate to Drupal 10! Learn about the many benefits of Drupal 10 and find migration tools in our resource center.
For background, please see: http://drupal.org/node/61427
I don't know much about RSS but I think the aggregator should be using the guid of the feed item as the unique identifier and not the url of the feed item.
Comment | File | Size | Author |
---|---|---|---|
#39 | aggregator_guid_update.patch.txt | 595 bytes | sillygwailo |
#33 | aggregator-guid.patch | 469 bytes | bozo-1 |
#25 | aggregator.diff.txt | 676 bytes | drumm |
#23 | aggregator_dup_4.6.patch | 3.6 KB | david-cognite |
#20 | database.diff.txt | 1.54 KB | drumm |
Comments
Comment #1
Steve Dondley CreditAttribution: Steve Dondley commentedOK, here's a first stab at a fix.
Comment #2
Steve Dondley CreditAttribution: Steve Dondley commentedOne other thing:
In addition to applying the patch, you have to create a new field in the aggregator_item table called "guid". It's a varchar with a length of 255.
Comment #3
Zen CreditAttribution: Zen commentedNo idea about the issue, but if this requires a DB change, the patch should incorporate the changes to database.mysql (4.0 and 4.1), database.pgsql and updates.inc.
Thanks,
-K
Comment #4
Dries CreditAttribution: Dries commentedThis patch makes sense to me, but requires a database update.
Comment #5
pavlos CreditAttribution: pavlos commentedHello,
I have applied the patch, but when used in conjunction with the node aggregator module duplicate nodes still get created. Unfortunately my programming experience is not that great in order to modify the aggregator.module.
I applied the patch by hand, line by line and added the guid column in the aggregator_item table.
Temporarily I tried a sort of hack by trying to select distinct node.title from node, but the node.nid is not selected by the query...
The other thing that just came to me is to add a unique index on the link of each feed... Will give it a try...
Any more insight on this would be excellent...
Comment #6
pavlos CreditAttribution: pavlos commentedWhich unfortunately does not work....
Comment #7
Steve Dondley CreditAttribution: Steve Dondley commentedPavlos, I need some clarification. When you say you are getting duplicates with the "node aggregator" module, are you talking about the third party module? If so, whoever is in charge of that module would need to patch that module.
Comment #8
steve22 CreditAttribution: steve22 commentedThe patch stops producing duplicate items for google news feed. But for RSS feeds from other sites, it produces incorrect URL link of the feed. URL links of all feed items are converted to link to the homepage of the feed.
e.g. http://www.mysite.com/news/item/1 becomes http://www.mysite.com
Comment #9
Steve Dondley CreditAttribution: Steve Dondley commentedsteve22,
The patch was written for 4.7. I'm setting the version back to 4.7.
Comment #10
elvis2 CreditAttribution: elvis2 commentedJust wanted to add my two cents worth.
Why not just check if the title (or first few characters) exist before doing the insert. This way you eliminate any duplicate titles and from different sources. This could be a feature in the settings>aggregator area where one can tick this from a checkbox option. Here is the logic:
Before an insert do a quick query
SELECT (*) FROM aggregator_item AS ai
WHERE left(ai.title, 10) != left($edit['title'], 10) AND ai.fid != $edit['fid']
{ do insert }
this looks for a match from the rss title upto 10 characters against what the table ai.title has AND to make sure the fid's are not the same. if both are true then do the insert.
This is rough logic here but I just wanted to express the idea.
Comment #11
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedrestoring status
Comment #12
Steve Dondley CreditAttribution: Steve Dondley commentedHere is the long-awaited database update code.
Comment #13
beginner CreditAttribution: beginner commentedIf it's a bug, shouldn't a patch be provided for cvs first?
Comment #14
Steve Dondley CreditAttribution: Steve Dondley commentedThe patch will likely apply cleanly to both 4.7 and cvs.
Comment #15
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedThis was committed to HEAD
http://cvs.drupal.org/viewcvs/drupal/drupal/modules/aggregator.module?r1...
http://cvs.drupal.org/viewcvs/drupal/drupal/database/updates.inc?r1=1.23...
I can't commit it to 4.7 unless
http://drupal.org/node/64212 is resolved.
Comment #16
bozo-1 CreditAttribution: bozo-1 commentedI'm having the exact same problem as steve22, running 4.7.2 : the patch works just fine as far as Google News entries are concerned, but some other feeds no longer work correctly, redirecting us to the site's homepage instead.
One such example is Toolinux, for which the RSS feed can be found here.
Comment #17
bozo-1 CreditAttribution: bozo-1 commentedNailed it! An else if statement got lost in the battle.
The modified code looks like this around line 856 (second chunk in the patch) :
Which means basically that, as long as there is no $guid, we're going to overwrite $link with $feed['link'];
Replacing it with the following code works like a charm :
Steve, can you submit another patch fixing this?
Comment #18
Steve Dondley CreditAttribution: Steve Dondley commentedThanks for you work on this. I'll post a patch later today. If not, certainly by the end of the coming weekend.
Comment #19
bozo-1 CreditAttribution: bozo-1 commentedMmh, it's weird, I'm still having issues with a couple feeds, the aggregator is pulling some old content every now and then. Did anybody else encounter this kind of trouble?
Comment #20
drummYou need to change the database files too.
Comment #21
bozo-1 CreditAttribution: bozo-1 commentedThe guid entry has already been created in the database. Actually, my latest problem only seems to concern RSS feeds which do not have statements.
Comment #22
drummThe missing column is for clean installs of Drupal HEAD.
Comment #23
david-cognite CreditAttribution: david-cognite commentedIncase anyone is interested, I was able to apply this fix the version or aggregator.module I'm using with 4.6.
The supplied patch file obviously didn't work, but working through it manually matching up the lines (and those from bozo) and making the required changes by hand did. Attached is the patch for the 4.6 version.
Note, I've not tested this thoroughly, but it works for me.
Cheers.
David.
Comment #24
Dries CreditAttribution: Dries commentedNeeds to be re-rolled now the install system is in. :)
Comment #25
drummUpdated.
Comment #26
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedcvs issue. The original issue was more of a feature so I won't backport to 4.7.
Comment #27
Dries CreditAttribution: Dries commentedLooks good to me.
Comment #28
Steve Dondley CreditAttribution: Steve Dondley commentedDoesn't comment 17 still need to be addressed? I think I (or someone) need to roll a new patch. Have not had an opportunity and won't for a few more days still.
Comment #29
alexd-1 CreditAttribution: alexd-1 commentedCouple of issues:
1. Will these fixes work for 4.7 - If not, how difficult to adapt?
2. Will the fix work for both RSS and Atom feeds?
I am using 4.7 and having two problems with aggregator and google news.
If I use RSS feeds I get a duplicate entries: 1 headline only with #39; instead of apostrophes 2nd headline and summary of same story correct: eg:
Follow Tony Blair#39;s Example And Send Zaoui Home - Scoop.co.nz (press release)
Google News Feed - Wed, 12/07/2006 - 5:12pm
Follow Tony Blair's Example And Send Zaoui Home
Scoop.co.nz (press release), New Zealand - 11 Jul 2006
... allow Britain to deport Algerians back to Algeria who have links to extremist organisations, New Zealand should now follow suit and send Ahmed Zaoui home under ...
Categories: Latest News
If I use atom feeds, just get the first headline, no duplication
Comment #30
drummCommitted my patch.
If #17 does need to be addressed, then reopen, hopefully with a patch.
Comment #31
bozo-1 CreditAttribution: bozo-1 commentedIn repy to #29 :
I have had the same problem. The check_plain() function is applied to the plain text between and tags, resulting in some unnecessary (at least, I guess so) sanity checks.
A simple fix is to replace the following line (line 1304 in 4.7.2 aggregator.module) :
$output .= '<a href="'. check_url($item->link) .'">'. check_plain($item->title) ."</a>\n";
with :
$output .= '<a href="'. check_url($item->link) .'" target=\"_blank\">'. $item->title ."</a>\n";
and (line 1320) :
$output = '<a href="'. check_url($item->link) .'">'. check_plain($item->title) .'</a> <span class="age">'. t('%age old', array('%age' => format_interval(time() - $item->timestamp))) .'</span>';
with :
$output = '<a href="'. check_url($item->link) .'">'. $item->title .'</a> <span class="age">'. t('%age old', array('%age' => format_interval(time() - $item->timestamp))) .'</span>';
In reply to #30
Just pulled the CVS version, the issue I raised in #17 is still present. I'm out of town tomorrow, so I'll try to roll a patch for both issues on Tuesday.
Comment #32
drummPlease open a new issue for that; this one was about adding guids.
Comment #33
bozo-1 CreditAttribution: bozo-1 commentedHere's a simple patch for the issue raised in #17.
Comment #34
bozo-1 CreditAttribution: bozo-1 commentedSwitching status back to "patch". The patch applies to the CVS version.
Comment #35
drummCommitted to HEAD.
Comment #36
(not verified) CreditAttribution: commentedComment #37
sillygwailoDid this get into 4.7? I see the change before 4.7.3 but it doesn't seem to be in 4.7 i.e. it got tagged as MAIN but not HEAD? That said (and correct me if I'm wrong), but it looks like it's going to be in for 5.0.
Comment #38
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedno, didn't get into HEAd. If you deem this important, roll a patch from the two patches that went in. The issue with the update system has been resolved. Marking active.
Comment #39
sillygwailoSteve Dondley's patch at the top for aggregator.module still applies to 4.7. The updates.inc patch doesn't, however, so I attached one, very slightly modified (183 instead of 186).
Comment #40
sillygwailoComment #41
drummIf this is for 4.7 set the version correctly. If this is for HEAD, write a patch to the correct file.
Comment #42
sillygwailoAs I believe this is fixed in 5.0, this issue is for 4.7.
Comment #43
magico CreditAttribution: magico commentedGiving this some time, if someone wants to provide a patch that backports from 5 to 4.7.5
Comment #44
killes@www.drop.org CreditAttribution: killes@www.drop.org commentedI don't think we still need this.