This issue is specific to PubMed publications but could be generalized to any system that assigns unique identifiers to publications. PubMed sometimes changes the data in the XML output for a given PMID. This produces a different MD5 and therefore a duplicate node for the same PMID. Is there a reason biblio_pm does not rely solely on the PMID to detect duplicates? Is there any point in importing the same PMID multiple times and creating duplicate nodes?

Attachment (comment #4): 1034500_4_pubmed_duplicate.patch, 916 bytes, by scor

Comments

rjerome

You are right, the PMID should be used. I guess the only real use for the checksum would be to detect whether the record for a given PMID has changed at the source and should therefore be updated in Biblio. To be honest, I don't know if this ever happens.

Ron.
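A minimal sketch of the approach described above, written in Python rather than the module's PHP: the PMID is the duplicate key, and the MD5 of the fetched XML is used only to flag whether the source record changed. The `store` dict and the `md5_of`/`classify_import` functions are hypothetical illustrations, not Biblio's actual code or schema.

```python
import hashlib


def md5_of(xml: str) -> str:
    # Checksum of the raw XML returned for a PMID.
    return hashlib.md5(xml.encode("utf-8")).hexdigest()


def classify_import(store: dict, pmid: str, xml: str) -> str:
    """Classify an incoming PubMed record.

    `store` is a hypothetical stand-in for Biblio's table:
    it maps PMID -> MD5 of the XML seen at the last import.
    Returns "new", "unchanged" or "changed".
    """
    digest = md5_of(xml)
    previous = store.get(pmid)
    if previous is None:
        store[pmid] = digest
        return "new"        # no node for this PMID yet: import it
    if previous == digest:
        return "unchanged"  # same PMID, same XML: a true duplicate
    # Same PMID, but the XML changed at the source. The stored checksum
    # is left alone; updating it is up to whatever applies the change.
    return "changed"


store = {}
print(classify_import(store, "12345", "<PubmedArticle>v1</PubmedArticle>"))  # new
print(classify_import(store, "12345", "<PubmedArticle>v1</PubmedArticle>"))  # unchanged
print(classify_import(store, "12345", "<PubmedArticle>v2</PubmedArticle>"))  # changed
```

Keyed on the MD5 alone, the third call would look like a brand-new publication, which is exactly how the duplicates described in the original report arise.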

scor

PubMed does change its XML, for example when it updates the publication dates of an ahead-of-print publication, or when it adds MeSH terms, which usually arrive a few weeks or months later. I'm not sure an update should be performed automatically when someone tries to reimport the same publication: what if you have made changes to the node in the meantime? Anyway, this could be made optional.

rjerome

I was just working on this very issue and was thinking that we could create a new revision of the current node. That way you would at least get any new changes, and you could revert if desired.

An option could also be added that lets the admin ignore incoming changes to existing PMIDs entirely.

Ron.
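The two policies discussed above (store an incoming change as a new revision, or ignore it) can be sketched roughly as follows. This is a Python illustration under stated assumptions, not Biblio's implementation: `Node`, `revisions`, `revert_to`, `reimport` and the `policy` values are all hypothetical names. Reverting is modelled as appending a copy of an older revision, so no history is lost.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    pmid: str
    # Each revision holds the XML it was built from; the last one is current.
    revisions: list = field(default_factory=list)

    @property
    def current(self) -> str:
        return self.revisions[-1]

    def revert_to(self, index: int) -> None:
        # Reverting appends a copy of an older revision as the newest one.
        self.revisions.append(self.revisions[index])


def reimport(node: Node, incoming_xml: str, policy: str = "revision") -> bool:
    """Apply a reimport of node.pmid. Returns True if the node changed."""
    if incoming_xml == node.current:
        return False                          # nothing new at the source
    if policy == "ignore":
        return False                          # admin chose to ignore changes
    if policy == "revision":
        node.revisions.append(incoming_xml)   # previous version kept for revert
        return True
    raise ValueError(f"unknown policy: {policy!r}")
```

With the "revision" policy, any edits made locally since the last import would survive in an earlier revision. That addresses the concern about overwriting changes, at the cost of the admin having to merge them back by hand.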

scor

Status: Active » Needs work
Attachment (new): 916 bytes

Simple patch to start off. I've left biblio_pm_check_md5() in place. Although it isn't currently used, it might be useful in the future.

scor

Title: Avoid duplicate based on the PMID » Offer the option to update a node when reimporting a pubmed article
Category: bug » feature

This was committed a while back, so duplicate detection is now done on the PMID instead of the MD5. That's good: I think it's best not to import a new node when a node already exists for the publication. This is now a feature request.

rjerome

Did you mean options other than those presented in the "PubMed" section of the 'admin/config/biblio' page?

scor

Status: Needs work » Fixed

Oh, I didn't know this issue had been fixed.

rjerome

I guess I forgot to update the issue when I committed those changes, sorry about that.

Ron.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

Pillhuhn

Version: 7.x-1.x-dev » 6.x-2.0-rc1

I just found this thread while trying to update a Biblio entry. I had previously added several publications using their PubMed PMIDs. Those publications were published online ahead of print and have since been updated on PubMed with journal volume, issue and page numbers, but I can't find an easy way to update them on my site.

I could do it manually, but I think there should be an easier solution. If I try to add the PMID again, I get a message that the entry already exists. From this thread, I thought there would be an option to re-add the same publication so that the existing entry gets updated.

Is there a way to do so?

I am on Drupal 6.25 with Biblio version 6.x-2.0-rc1

rjerome

Currently, it only works in 7.x; I'll backport it to 6.x.

Pillhuhn

Great! Thanks for the quick reply. Any idea about the time frame?

rjerome

Just an update to let you know that this feature is now in the 6.x-2.x-dev branch. As a bonus, I added a "cron" update capability: you can turn on a cron process that periodically checks all your PubMed-imported entries and updates them automatically if the source record (on PubMed) has changed. You will find the new cron-related settings in the "PubMed" section of the 'admin/settings/biblio' page.

FYI, I've also added the same "cron" update capability to the 7.x branch.

Ron.
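A periodic check like the cron update described above could work roughly like this. It is a Python sketch, not Biblio's code. For each PubMed-imported entry, it fetches the current record, compares checksums, and updates the entry only when something changed. Several names are assumptions for illustration: `fetch_xml` is a hypothetical callable standing in for a request to PubMed, the `entries` layout is hypothetical, and `batch_size` is an assumed way of limiting work per cron run, not a documented Biblio setting.

```python
import hashlib
import time
from typing import Callable, Dict, List, Optional


def md5_of(xml: str) -> str:
    return hashlib.md5(xml.encode("utf-8")).hexdigest()


def cron_update(entries: Dict[str, dict],
                fetch_xml: Callable[[str], str],
                batch_size: int = 50,
                now: Optional[float] = None) -> List[str]:
    """Check up to `batch_size` entries, least recently checked first.

    `entries` maps PMID -> {"md5": str, "xml": str, "last_checked": float}.
    Returns the PMIDs whose stored record was updated.
    """
    now = time.time() if now is None else now
    updated = []
    # Least recently checked first, so successive runs cycle through all entries.
    queue = sorted(entries, key=lambda pmid: entries[pmid]["last_checked"])
    for pmid in queue[:batch_size]:
        entry = entries[pmid]
        xml = fetch_xml(pmid)
        digest = md5_of(xml)
        if digest != entry["md5"]:
            # The source changed: store the new XML and checksum.
            entry["xml"], entry["md5"] = xml, digest
            updated.append(pmid)
        entry["last_checked"] = now
    return updated
```

Limiting each run to a batch keeps a single cron invocation from hammering PubMed when a site has many imported entries. Ordering by last check time makes successive runs eventually cover them all.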