A notable percentage of PubMed abstracts contain more than one paragraph. This module's PubMed import keeps only the first paragraph of the abstract. (It also strips out text preceding a colon, such as "BACKGROUND: ".)

Is it possible to enable the complete abstract to import?

Comments

rjerome’s picture

There is nothing in the code that deliberately does this. Could you give me the PMID of an article where this happens so I can debug it?

Ron.

mroswell’s picture

Here are a few:
16401813
16467234

http://www.ncbi.nlm.nih.gov/pubmed/16401813?dopt=Abstract
http://www.ncbi.nlm.nih.gov/pubmed/16467234?dopt=Abstract

I've been copying and pasting the full abstracts in where they're missing, so I don't know for sure whether these were "originally" missing.

rjerome’s picture

Version: 6.x-1.15 » 6.x-2.x-dev
Status: Active » Fixed

OK, I've found and fixed (in 6.x-2.x) the problem with multi-part abstracts.

http://drupalcode.org/project/biblio.git/commit/2458da6
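For readers following along, here is a minimal sketch of what handling a multi-part abstract can look like. It is not the code from the commit above (the module is written in PHP); it only assumes that PubMed XML delivers a structured abstract as several `AbstractText` elements, each with an optional `Label` attribute. The sample XML and the function name `join_abstract` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like a structured PubMed abstract.
SAMPLE = """<Abstract>
  <AbstractText Label="BACKGROUND">First part.</AbstractText>
  <AbstractText Label="METHODS">Second part.</AbstractText>
</Abstract>"""


def join_abstract(abstract_xml):
    """Join every AbstractText part, keeping its label as a prefix."""
    root = ET.fromstring(abstract_xml)
    parts = []
    for node in root.findall("AbstractText"):
        # itertext() also collects text inside nested inline markup.
        text = "".join(node.itertext()).strip()
        label = node.get("Label")
        parts.append(f"{label}: {text}" if label else text)
    # Separate the parts with a blank line so each stays its own paragraph.
    return "\n\n".join(parts)


full_abstract = join_abstract(SAMPLE)
print(full_abstract)
```

Reading only the first `AbstractText` element would reproduce the symptom reported at the top of this issue: one paragraph and no label.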

Ron.

mroswell’s picture

Yahoo! Thanks. I've just made a localhost version of my site (should've had one all along). I'll upgrade the module there, satisfy myself that it works, and then upgrade my live site. Sincere thanks.

mroswell’s picture

Wow. In my two-year Drupal hiatus (I've been maintaining Drupal sites, but not developing them), I didn't know that Drupal switched over to git. Cool.

BTW, your module is the reason I'm back on Drupal!

rjerome’s picture

Yah, git's fairly new (about 6 weeks) and I still have a love/hate relationship with it :-)

mroswell’s picture

Sounds a lot like my relationship with Drupal. :)

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.

mcookson’s picture

Hi there, I'm getting errors importing Endnote XML data into the 6.x-2.x-dev version of biblio and wonder if there is a spelling error between 'biblio_sort_title' and 'biblio_short_title'. Thanks, Mike (PS I'm back on Drupal after a long hiatus just so I can use the biblio module!!).

rjerome’s picture

It would be best if I could get a copy of the XML file so I can see what the error is.

I've just recently added a new "biblio_sort_title" column.

Ron.

rjerome’s picture

Status: Closed (fixed) » Active

That reminds me... Did you recently install the latest -dev? Did you run update.php after installing? Did you check the status page?

mcookson’s picture

Hi Ron. Sorry not to reply sooner. I've run update.php and checked the status (all seems ok) and even done a complete reinstall with the latest -dev version. I still can't seem to fix the problem with batch imports. I keep getting a 'biblio_sort_title' related error (below). I've attached the XML file for you to look over, but I can't seem to get other Endnote XML data to upload either. It's like the new field hasn't been incorporated into the SQL db, but update.php stated "The following queries were executed| biblio module Update #6034| * Biblio Sort Titles were updated". Any thoughts? Thanks, Mike

user warning: Unknown column 'biblio_sort_title' in 'field list' query: INSERT INTO biblio (nid, vid, biblio_type, biblio_sort_title, biblio_secondary_title, biblio_year, biblio_date, biblio_lang, biblio_abst_e, biblio_full_text, biblio_call_number, biblio_citekey, biblio_coins, biblio_label, biblio_md5) VALUES (54983, 54983, 105, 'Letter Thanks NBPOL', 'The National', 2009, 'Wednesday, 20 May 2009', 'eng', 'Thanks, NBPOLON behalf of my family, relatives and the people of Kove in West New Britain province, I would like to thank Nick Thompson, the chief executive officer of New Britain Palm Oil Limited for acknowledging the visions of my late brother Bernard Vogae for oil palm development in the inland Kove area. The project was initiated by WNB provincial government under the name Kulu-Dagi and Inland Kove project when Vogae was the governor. His dream of a major agriculture development in the Kove area has now become a reality after it was shelved in 2000 following his death. The project was officially launched on May 12 and I call on the people of Kove to capitalise on this opportunity to improve their living standard. Steven Keu, Kimbe Copyright © 2008 Pacific Star Limited - The NATIONAL. All rights reserved.', 0, '5-20', '54983', '', 'H', '6336e46551f0fcea8ef19cd9088c92c1') in /home/pngweb.org/domains/dev.pngweb.org/public_html/includes/common.inc on line 3538.

XML (attached).

rjerome’s picture

Hmm, I'm a bit baffled by this. I've tested it on a number of different systems and not run into this error. Have you checked the database to see if the "biblio_sort_title" column exists in the biblio table?

mcookson’s picture

Ron, A check of the DB showed that it hadn't created several new variables. I think I've identified the problem - I selected the 6034 -dev version for my update, but this appears to only execute the 6034 script modifications to biblio.install (which migrate biblio records into the new 'biblio_sort_title' field) but not other portions of the script (i.e. not modifications 6033, 6032... 60XX ). When I ran an update using -dev ver. 6033, it performed all updates from 6000 - 6033. I then ran 6034 as a separate update. I'm still having issues with importing multi-paragraph abstracts but will try various import formats and variable mapping and get back to you on this if necessary. Thanks again, Mike

rjerome’s picture

Hi Mike,

Typically, you shouldn't have to select an update number from the drop-downs on the update page. When you open that page, it shows the update (version) it will start from, not the latest version available. So if you opened that page and it said 6032 for the biblio module (and the highest version is 6034), it would run updates 6032, 6033 and 6034. These drop-downs have been removed in 7.x of Drupal because they were causing more trouble than they were worth.
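The selection behaviour described above can be modelled in a few lines. This is only a sketch of the rule as stated in this thread, not Drupal's update.php code; the function name `updates_to_run` is hypothetical.

```python
def updates_to_run(start, available):
    """Return the updates that run when `start` is the chosen starting point.

    Everything numbered `start` or higher runs, in ascending order;
    anything below `start` is skipped.
    """
    return sorted(n for n in available if n >= start)


available = [6032, 6033, 6034]

# Leaving the drop-down at its default (the first pending update).
print(updates_to_run(6032, available))

# Manually selecting the newest update skips the earlier ones.
print(updates_to_run(6034, available))
```

The second call shows why picking 6034 by hand left the earlier schema changes unapplied.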

This issue was originally dealing with multi-part abstracts originating from PubMed imports. If you are using some source other than PubMed, then it probably is another issue and you should open a new issue for it.

Ron.

mcookson’s picture

Ron,
Just a follow-up on this issue. It appears the changes to accommodate multi-part abstracts only apply to PubMed imports. (I can't get import to recognise CR/paragraph markers using Endnote Tagged or XML datasets - just end up with one long paragraph for an abstract.) Can the PubMed functionality be extended to other file types? Thanks.
Mike

rjerome’s picture

Hi Mike,

That's a whole other issue. I would be willing to bet that the cr/lf's are still there but are essentially invisible when displayed in html format. I suspect this is more of a filter issue or more correctly, filters not being applied to the content of those text areas. I'll look into that.
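To illustrate the suspicion above: a browser collapses raw CR/LF characters into ordinary whitespace, so paragraph breaks only become visible once something converts them to markup. The sketch below shows that kind of conversion under the assumption that a blank line separates paragraphs. It is not Drupal's line break filter, and `line_breaks_to_html` is a hypothetical name.

```python
import html
import re


def line_breaks_to_html(text):
    """Turn plain-text line breaks into <p> and <br /> markup."""
    # Normalise Windows (CR LF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A run of two or more newlines separates paragraphs.
    paragraphs = [p.strip() for p in re.split(r"\n{2,}", text) if p.strip()]
    return "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br />") + "</p>"
        for p in paragraphs
    )


raw = "First paragraph.\r\n\r\nSecond paragraph,\r\nsecond line."
rendered = line_breaks_to_html(raw)
print(rendered)
```

If the stored abstract still contains the line breaks, running it through a filter like this at display time would restore the paragraphs without changing the import.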

Ron.

ar-jan’s picture

Status: Active » Closed (fixed)

Original issue was fixed.