Hi,

I installed the module and everything works, but I get some duplicate articles from the feed URL: the same news item appears twice.

Thanks

Comments

alex_b

Hi toma,

Can you check whether the two articles lead to the same original source article? Could you post the original URLs of the two articles?

alex

toma

Hi,

Thanks for your reply.

The two articles came from the same source; you can see this on my test website (in French).
The source feed:
http://www.blogelle.com/beaute-femme
Source URL: http://www.beaute-femme.org/news/rss.php
Last checked: 1 minute 8 seconds ago
Time until next refresh: 2 hours 58 minutes remaining

Example of duplicate articles:
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain-0
When I leech data, I receive this:
10 item(s) added, 0 duplicate(s) found.
Errors



    * user warning: Duplicate entry 'credit-a-la-consommation-ce-qu-il-faut-savoir' for key 2 query: INSERT INTO url_alias (src, dst) VALUES ('node/984', 'credit-a-la-consommation-ce-qu-il-faut-savoir') in /home/web/annoncemaroc/includes/database.mysql.inc on line 121.
    * user warning: Duplicate entry 'beaute-femme' for key 2 query: INSERT INTO url_alias (src, dst) VALUES ('node/963', 'beaute-femme') in /home/web/annoncemaroc/includes/database.mysql.inc on line 121.


You can see it here: www.blogelle.com
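The `Duplicate entry ... for key 2` warnings mean the INSERT into url_alias hit a unique index on the alias value in `dst` (for example 'beaute-femme'), i.e. that alias already existed. The `-0` suffix on the second maison-en-bois path is consistent with an alias generator that appends a counter when an alias is taken. A minimal sketch of that check-then-suffix approach, assuming the existing aliases are available as a set; `make_unique_alias` is a hypothetical helper, not pathauto's actual code.

```python
from __future__ import annotations


def make_unique_alias(alias: str, existing: set[str], separator: str = "-") -> str:
    """Return `alias`, or `alias-0`, `alias-1`, ... if it is already taken.

    Hypothetical helper: `existing` stands in for the dst values already
    stored in url_alias. Checking before inserting avoids the
    "Duplicate entry ... for key 2" error from the unique index.
    """
    if alias not in existing:
        return alias
    counter = 0
    # Try alias-0, alias-1, ... until a free one is found.
    while f"{alias}{separator}{counter}" in existing:
        counter += 1
    return f"{alias}{separator}{counter}"


taken = {"la-maison-en-bois-habitat-de-demain"}
print(make_unique_alias("la-maison-en-bois-habitat-de-demain", taken))
# la-maison-en-bois-habitat-de-demain-0
```

Note that a unique alias only hides the symptom: both nodes still exist, which is exactly how a second `-0` page for the same article can appear.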

alex_b

Thanks toma,

The error occurs when a duplicate entry is inserted into url_alias. What's strange is that before that error you should have gotten a similar message from leech itself. I checked
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain-0 and saw that BOTH point to

http://www.beaute-femme.org/blog-femme/maison-en-bois-52, the same URL. Leech really should catch that.
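A minimal sketch of the kind of check described here, assuming feed items are identified by their original source URL. The names `filter_new_items` and the `url` field are hypothetical, not leech's actual code. The sketch compares each incoming item both with URLs already stored and with URLs seen earlier in the same fetch; a check against stored items alone would let two copies of one article through if both arrive in the same batch.

```python
from __future__ import annotations

from typing import Iterable


def filter_new_items(items: Iterable[dict], stored_urls: set[str]) -> list[dict]:
    """Return only the items whose source URL has not been seen yet.

    Hypothetical helper: `stored_urls` stands in for the URLs already
    saved in the database (e.g. a leech_news_item-style table).
    """
    seen = set(stored_urls)  # copy so the caller's set is not mutated
    new_items = []
    for item in items:
        url = item["url"].strip()
        if url in seen:
            # Duplicate of a stored item or of an earlier item in this batch.
            continue
        seen.add(url)
        new_items.append(item)
    return new_items


feed = [
    {"title": "La maison en bois", "url": "http://www.beaute-femme.org/blog-femme/maison-en-bois-52"},
    {"title": "La maison en bois", "url": "http://www.beaute-femme.org/blog-femme/maison-en-bois-52"},
]
print(len(filter_new_items(feed, set())))  # 1
```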

Can you post the entries for the two articles (the first two URLs above) from the leech_news_item table? You should be able to identify them by their node ID.

Thank you, alex

toma

Hi,
thanks for your help. Here are the table rows for the two nodes:
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain (id 986)
http://www.blogelle.com/la-maison-en-bois-habitat-de-demain-0 (id 987)

(986, 963, 'http://www.beaute-femme.org/blog-femme/maison-en-bois-52', '', '80bda4cd1d19d9e90077d494c9f42a40', '', '', ''),
(987, 963, 'http://www.beaute-femme.org/blog-femme/maison-en-bois-52', '', '80bda4cd1d19d9e90077d494c9f42a40', '', '', ''),
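Rows like these, where the same link and hash were stored twice, can be listed with a `GROUP BY ... HAVING COUNT(*) > 1` query. The sketch below runs such a query with Python's built-in `sqlite3` on an in-memory copy of the two rows above. The column names (`nid`, `feed_nid`, `link`, `hash`) are hypothetical, since the real leech_news_item schema is not shown in this thread, and the empty columns are left out.

```python
import sqlite3

# In-memory stand-in for leech_news_item; column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE leech_news_item (nid INTEGER, feed_nid INTEGER, link TEXT, hash TEXT)")
conn.executemany(
    "INSERT INTO leech_news_item VALUES (?, ?, ?, ?)",
    [
        (986, 963, "http://www.beaute-femme.org/blog-femme/maison-en-bois-52", "80bda4cd1d19d9e90077d494c9f42a40"),
        (987, 963, "http://www.beaute-femme.org/blog-femme/maison-en-bois-52", "80bda4cd1d19d9e90077d494c9f42a40"),
    ],
)

# List every (feed, link) pair stored more than once, with the lowest
# and highest node IDs involved.
duplicates = conn.execute(
    """
    SELECT feed_nid, link, COUNT(*) AS copies, MIN(nid), MAX(nid)
    FROM leech_news_item
    GROUP BY feed_nid, link
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicates)
```

The query itself uses only standard `GROUP BY`/`HAVING` and aggregate functions, so it should carry over to the real MySQL table once the column names are adjusted.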

funana

Title: get duplicate content » duplicate content on first leech
Version: 4.7.x-1.2 » 4.7.x-1.3

When I leech a feed for the first time, there are often a lot of duplicate entries. If I manually delete the duplicate items and leech again (or let cron leech them), no duplicates are produced.
I don't know if this is a coincidence, but it seems to happen mainly with Atom feeds. I'm not sure, though.

funana

Title: duplicate content on first leech » duplicate content

Oops, sorry for changing the title. I've changed it back to "duplicate content".

alex_b

I cannot reproduce your error. Look here:

http://leechgroups.devseed.org/leech/feed/1122 - leech 4.7-dev (identical dupe handling), PHP 4.4.4/MySQL 4.1.20

What happens if you turn off the pathauto module? Try to rule out any other negative interactions with other modules (ideally by testing on a fresh installation).

This is a peculiar error, and I would like to get to the bottom of it. Thanks for staying on it.

toma

I deleted all previous entries, disabled the pathauto module, and leeched the data:

    * Error while using the Yahoo Terms service. Please check the server internet connection and check if cURL php extension is installed.
    * 10 item(s) added, 0 duplicate(s) found.

Only 3 nodes were added! I tried other feeds and they work fine: the Yahoo Terms service also works, and there is no duplicate content.

aron novak

Status: Active » Postponed (maintainer needs more info)

Have you found that leech isn't compatible with the pathauto module? Does leech work as expected when pathauto is disabled?

alex_b

Status: Postponed (maintainer needs more info) » Closed (duplicate)

The feed item duplication issue is fixed: http://drupal.org/node/135333