I've made a few customizations to the ris_parser.inc file found in Biblio 6.x-1.14.

At some point, one or more of these might be worth adding to the official file, but for now I thought I would share them here in case someone else is looking for a similar customization. I'm not very experienced with PHP or regular expressions, so this code may well be poorly written, and it could significantly affect the performance of the Biblio import process.

1. Clean up gremlins
We produce our RIS file by exporting from Reference Manager 9 or 10, and we've found that the exported file contains a number of hidden, non-printable characters. When imported into Biblio, these end up as invalid characters and display as question marks or other replacement symbols, depending on the visitor's browser. To fix this, I've inserted a preg_replace call that removes these characters from each line before the script parses out the various tags. The regular expression is a negated character class: it matches any character outside the allowed set (carriage return, line feed, tab, printable ASCII \x20-\x7E, and \xA0-\xFF) and replaces it with nothing, i.e. deletes it.
Insert the following line between lines 37-38, and between lines 40-41:

        $value = preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', '', $value); // Remove non-visible characters
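To try the pattern outside Drupal, it can be wrapped in a small standalone function. This is only a sketch: the function name strip_gremlins() is hypothetical and not part of Biblio. Because the pattern has no /u modifier, PCRE works byte by byte, so this assumes a single-byte encoding such as Latin-1/Windows-1252. In Windows-1252 the 0x80-0x9F range holds characters like curly quotes and the euro sign, so those are deleted too, and on UTF-8 input the pattern would strip some continuation bytes and corrupt multi-byte characters.

```php
<?php
// Hypothetical helper wrapping the one-line fix from ris_parser.inc.
// Assumes single-byte (Latin-1 / Windows-1252) input: no /u modifier,
// so the class below is matched against individual bytes.
function strip_gremlins($value) {
  // Keep CR, LF, tab, printable ASCII (0x20-0x7E) and 0xA0-0xFF;
  // delete every other byte (C0 controls, DEL, and 0x80-0x9F).
  return preg_replace('/[^\r\n\t\x20-\x7E\xA0-\xFF]/', '', $value);
}

// Made-up title with a stray NUL byte and a form feed (0x0C) embedded in it.
$dirty = "Heart\x00 failure\x0C in adults";
echo strip_gremlins($dirty), "\n";  // Heart failure in adults
```

A Latin-1 character such as é (0xE9) passes through unchanged, while a Windows-1252 curly quote (0x93) is removed rather than converted.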

2. Convert PM to PubMed URL
Later versions of Reference Manager have a built-in way of handling PubMed IDs: during some import processes, PMID values are automatically inserted into the UR (URL) field as "PM:...", and inside Reference Manager these PM values are clickable links that take you to the corresponding PubMed page. When Reference Manager exports this field, however, it exports the values without the URL, so you end up with a UR field that looks like this: "PM:18248701". The following code replaces the leading "PM:" with the current PubMed URL prefix, "http://www.ncbi.nlm.nih.gov/pubmed/". Note that the pattern also strips anything that appears before "PM:" in the field, so it will not work if the UR field in your RIS file contains more than one URL (which the RIS specification allows): any URL preceding the PM: entry would be discarded.
To use this, insert the following line between lines 118-119 (just before the biblio_url value is set):

          $value = preg_replace('/^.*PM:/','http://www.ncbi.nlm.nih.gov/pubmed/',$value); // Replace PubMed ID values with valid URLs
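The sketch below compares the one-liner above with an alternative that is not part of the original patch: instead of deleting everything up to "PM:", it rewrites only each "PM:<digits>" token and leaves the rest of the field alone. The function names are hypothetical, and the multi-URL input (a URL followed by "; PM:...") is a made-up example; I have not checked which separator Reference Manager actually uses between URLs.

```php
<?php
// PubMed URL prefix taken from the post above.
const PUBMED_BASE = 'http://www.ncbi.nlm.nih.gov/pubmed/';

// The original one-liner: greedy ^.*PM: drops everything up to the
// last "PM:" in the field, including any URL that came before it.
function pm_to_url_greedy($value) {
  return preg_replace('/^.*PM:/', PUBMED_BASE, $value);
}

// Hypothetical alternative: replace only "PM:<digits>" tokens, keeping
// any other text in the field as it was.
function pm_to_url_tokens($value) {
  return preg_replace('/\bPM:\s*(\d+)/', PUBMED_BASE . '$1', $value);
}

$field = 'http://example.org/a; PM:18248701';  // made-up multi-URL field
echo pm_to_url_greedy($field), "\n";  // http://www.ncbi.nlm.nih.gov/pubmed/18248701
echo pm_to_url_tokens($field), "\n";  // http://example.org/a; http://www.ncbi.nlm.nih.gov/pubmed/18248701
```

Even with the token version, a field holding several URLs still ends up as one string in biblio_url, so splitting them would need separate handling.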

3. Insert JA (abbreviated Journal title) into biblio_alternate_title
This extra code captures the abbreviated version of the journal title. Reference Manager has a Periodicals term list for managing journal titles. When it exports a record, it looks up the journal in that term list and populates the "JF" field with the full title and the "JA" field with the abbreviated title. Since Biblio doesn't have its own term lists, you need both values if you want to display the full journal title in some contexts and the abbreviated version in others. This is the same code suggested in the issue "RIS Import omits Journal Abbreviation field (JA)". All it does is add a new case for the "JA" tag that inserts the contents of that field into the biblio_alternate_title field.

        case 'JA' :
          $node['biblio_alternate_title'] .= $value;
          break;
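To see the new case in context, here is a minimal standalone stand-in for the tag switch in ris_parser.inc. It is a sketch, not Biblio's actual parser: parse_ris_record() is a hypothetical name, only the JA case is implemented, and it assumes the standard RIS line layout "TAG  - value" (two-character tag, two spaces, hyphen, space). The sample record is made up.

```php
<?php
// Hypothetical stand-in for the switch on RIS tags in ris_parser.inc.
function parse_ris_record($text) {
  // Initialise the key so the first .= does not raise an undefined-index notice.
  $node = array('biblio_alternate_title' => '');
  foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
    // Standard RIS layout: "TAG  - value"; skip blank or malformed lines.
    if (!preg_match('/^([A-Z][A-Z0-9])  - ?(.*)$/', $line, $m)) {
      continue;
    }
    $tag = $m[1];
    $value = trim($m[2]);
    switch ($tag) {
      case 'JA' :
        $node['biblio_alternate_title'] .= $value;
        break;
      // ...the other tags handled by ris_parser.inc would go here...
    }
  }
  return $node;
}

$record = "TY  - JOUR\nJF  - Journal of Example Studies\nJA  - J Ex Stud\nER  - \n";
$node = parse_ris_record($record);
echo $node['biblio_alternate_title'], "\n";  // J Ex Stud
```

Because the case uses .= (append), a record with two JA lines would have both values run together with no separator; plain assignment would keep only the last one instead.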

I would welcome any suggestions for improving any of this code or finding a better place to insert it. Also, I'd be interested in hearing if other users of Reference Manager are running into the same issues when they export into RIS format. It may be that different versions of Reference Manager behave differently.

Phil.

Comments

pkiff

Status: Active » Fixed

These issues have all now been resolved or dealt with in more recent versions of biblio.

A version of the clean up gremlins has been added:
http://drupal.org/node/1780414

PubMed PMID/PMCID references are now handled in a separate table, though users of some versions of Reference Manager may still have reason to convert these values from the URL field.

JA has been added to the list of automatically recognized fields.

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.