Closed (fixed)
Project:
Bibliography Module
Version:
6.x-2.x-dev
Component:
Code
Priority:
Normal
Category:
Feature request
Assigned:
Unassigned
Reporter:
Created:
24 Mar 2011 at 22:39 UTC
Updated:
8 Nov 2011 at 23:01 UTC
A notable percentage of PubMed abstracts contain more than one paragraph. This module's PubMed import truncates the abstract to just the first paragraph. (It also strips out text preceding a colon, such as "BACKGROUND: ".)
Is it possible to import the complete abstract?
Comments
Comment #1
rjerome commented
There is nothing in the code that deliberately does this, so could you give me the PMID of an article on which this is happening so I can debug it?
Ron.
Comment #2
mroswell commented
Here are a few:
16401813
16467234
http://www.ncbi.nlm.nih.gov/pubmed/16401813?dopt=Abstract
http://www.ncbi.nlm.nih.gov/pubmed/16467234?dopt=Abstract
I've been copying and pasting the full abstracts in where they're missing, so I don't know for sure whether these were "originally" missing.
Comment #3
rjerome commented
OK, I've found and fixed (in 6.x-2.x) the problem with multi-part abstracts.
http://drupalcode.org/project/biblio.git/commit/2458da6
Ron.
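For readers wondering what a "multi-part abstract" looks like: PubMed XML can split a structured abstract into several AbstractText elements, each with a Label attribute such as BACKGROUND or METHODS. The Python sketch below is not the code from the commit above. It only illustrates, under the assumption that the truncation came from reading a single AbstractText element, how all parts could be joined with their labels kept. The sample XML text and the function name full_abstract are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like a PubMed structured abstract (text is invented).
SAMPLE = """
<Abstract>
  <AbstractText Label="BACKGROUND">First part.</AbstractText>
  <AbstractText Label="METHODS">Second part.</AbstractText>
  <AbstractText Label="RESULTS">Third part.</AbstractText>
</Abstract>
"""


def full_abstract(abstract_xml: str) -> str:
    """Join every AbstractText element, keeping its Label as a prefix."""
    root = ET.fromstring(abstract_xml.strip())
    parts = []
    for node in root.iter("AbstractText"):
        # itertext() also collects text nested in inline markup such as <i> or <sup>.
        text = "".join(node.itertext()).strip()
        label = node.get("Label")
        parts.append(f"{label}: {text}" if label else text)
    # Separate the parts with a blank line so each one stays its own paragraph.
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(full_abstract(SAMPLE))
```

Reading only the first AbstractText element of this sample would return just "First part.", which matches both symptoms in the original report: the later paragraphs and the "BACKGROUND: " prefix go missing.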
Comment #4
mroswell commented
Yahoo! Thanks. I've just made a localhost version of my site (should've had one all along). I'll upgrade the module there, satisfy myself that it works, and then upgrade my live site. Sincere thanks.
Comment #5
mroswell commented
Wow. In my two-year Drupal hiatus (I've been maintaining Drupal sites, but not developing them), I didn't know that Drupal switched over to git. Cool.
BTW, your module is the reason I'm back on Drupal!
Comment #6
rjerome commented
Yah, git's fairly new (about 6 weeks) and I still have a love/hate relationship with it :-)
Comment #7
mroswell commented
Sounds a lot like my relationship with Drupal. :)
Comment #9
mcookson commented
Hi there, I'm getting errors importing Endnote XML data into the 6.x-2.x-dev version of biblio and wonder if there is a spelling mix-up between 'biblio_sort_title' and 'biblio_short_title'. Thanks, Mike (PS I'm back on Drupal after a long hiatus just so I can use the biblio module!!).
Comment #10
rjerome commented
It would be best if I could get a copy of the XML file so I can see what the error is.
I've just recently added a new "biblio_sort_title" column.
Ron.
Comment #11
rjerome commented
That reminds me... Did you recently install the latest -dev? Did you run update.php after installing? Did you check the status page?
Comment #12
mcookson commented
Comment #13
mcookson commented
Hi Ron. Sorry not to reply sooner. I've run update.php and checked the status (all seems OK) and even done a complete reinstall with the latest -dev version. I still can't seem to fix the problem with batch imports. I keep getting a 'biblio_sort_title' related error (below). I've attached the XML file for you to look over, but I can't seem to get other Endnote XML data to upload either. It's as if the new field hasn't been incorporated into the SQL db, but update.php stated "The following queries were executed| biblio module Update #6034| * Biblio Sort Titles were updated". Any thoughts? Thanks, Mike
XML (attached).
Comment #14
rjerome commented
Hmm, I'm a bit baffled by this. I've tested it on a number of different systems and have not run into this error. Have you checked the database to see if the "biblio_sort_title" column exists in the biblio table?
Comment #15
mcookson commented
Ron, a check of the DB showed that it hadn't created several new variables. I think I've identified the problem: I selected the 6034 -dev version for my update, but this appears to execute only the 6034 script modifications to biblio.install (which migrate biblio records into the new 'biblio_sort_title' field) and not the other portions of the script (i.e. not modifications 6033, 6032... 60XX). When I ran an update using -dev ver. 6033, it performed all updates from 6000 - 6033. I then ran 6034 as a separate update. I'm still having issues with importing multi-paragraph abstracts but will try various import formats and variable mappings and get back to you on this if necessary. Thanks again, Mike
Comment #16
rjerome commented
Hi Mike,
Typically, you shouldn't have to select an update number from the drop-downs on the update page. When you open that page, it shows the update (version) it will start from, as opposed to the latest version available. So if you opened that page and it said 6032 for the biblio module (and the highest version is 6034), it would run updates 6032, 6033 and 6034. These drop-downs have been removed in Drupal 7.x because they were causing more trouble than they were worth.
This issue was originally dealing with multi-part abstracts originating from PubMed imports. If you are using some source other than PubMed, then it probably is another issue and you should open a new issue for it.
Ron.
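The update-selection behaviour described in the first paragraph of this comment can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the thread (every update numbered at or above the selected starting point runs, in order), not Drupal's actual update.php code. The function name updates_to_run and the list of available update numbers are hypothetical.

```python
def updates_to_run(available, start_from):
    """Return the update numbers that run, in order, when starting at start_from."""
    return sorted(n for n in available if n >= start_from)


# Hypothetical set of update numbers defined in a module's .install file.
available = [6030, 6031, 6032, 6033, 6034]

if __name__ == "__main__":
    # Leaving the drop-down on its default (the first pending update) runs them all.
    print(updates_to_run(available, 6032))
    # Manually picking the highest number skips every earlier pending update.
    print(updates_to_run(available, 6034))
```

With these assumed numbers, starting from 6032 runs 6032, 6033 and 6034, while manually selecting 6034 runs only 6034. That is consistent with what Mike saw in comment #15, where the earlier schema changes were never applied.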
Comment #17
mcookson commented
Ron,
Just a follow-up on this issue. It appears the changes to accommodate multi-part abstracts only apply to PubMed imports. (I can't get import to recognise CR/paragraph markers using Endnote Tagged or XML datasets - just end up with one long paragraph for an abstract.) Can the PubMed functionality be extended to other file types? Thanks.
Mike
Comment #18
rjerome commented
Hi Mike,
That's a whole other issue. I would be willing to bet that the CR/LFs are still there but are essentially invisible when displayed in HTML format. I suspect this is more of a filter issue or, more correctly, a matter of filters not being applied to the content of those text areas. I'll look into that.
Ron.
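As background to the comment above: HTML rendering collapses runs of whitespace, so a bare CR/LF inside stored text does not produce a visible paragraph break. The Python sketch below shows the general idea of a line-break filter that turns newlines into paragraph markup. It is a simplified illustration only, not Drupal's filter implementation and not code from the biblio module. The function name newlines_to_paragraphs is made up.

```python
import html


def newlines_to_paragraphs(text: str) -> str:
    """Wrap each non-empty line of plain text in <p> tags, escaping HTML characters."""
    # Normalize Windows (CR LF) and old Mac (CR) line endings to LF first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = [p.strip() for p in normalized.split("\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


if __name__ == "__main__":
    stored = "First paragraph of the abstract.\r\nSecond paragraph of the abstract."
    print(newlines_to_paragraphs(stored))
```

If the imported abstract still contains its line breaks, applying a conversion of this kind at display time would restore the visible paragraphs without changing the stored data.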
Comment #19
ar-jan commented
Original issue was fixed.