MARC import issues
| Project: | Millennium Integration |
| Version: | 5.x-1.2 |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | janusman |
| Status: | closed |
I have noticed 3 separate issues with the marc import:
1. Under Taxonomy: MARC import mappings in the settings, the description for Author says "This maps MARC 100s, 110s, 700s, 710s (all subfields merged together as text) to this vocabulary." In fact, I have found that it takes only the |a subfield. This is a real problem for government publications. For example, this document:
http://search.cambridgelibraries.ca/record=i1047937&searchscope=0
Where the author should be "Ontario. Legislative Assembly. Standing Committee on Social Development." ends up with an author of just "Ontario."
2. Something strange is happening when there is more than one author. For example, with this record:
http://search.cambridgelibraries.ca/record=i1050532&searchscope=0
the illustrator ends up listed as "Series Editor".
3. I have noticed that if the import process hits a string of deleted item records, it basically gets stuck until you can give it an existing item record to start with again. For example, if you have it set to import 50 records at a time and the next 50 records it is looking at don't exist, it's never going to make it any further until you tell it to start at another existing item record.
Not sure if the above is clear or not...thanks for all your good work on this module!
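To illustrate issue 3, here is a minimal sketch of a batch step that cannot stall on deleted records. It is not the module's code: the function name, the $fetch callback, and the returned array shape are all hypothetical. The only point is that the start position moves past the whole range even when nothing in it exists.

```php
<?php
// Hypothetical batch step (not the module's actual code).
// $fetch is an assumed callback returning a record, or NULL for a deleted one.
function import_batch_sketch($start, $batch_size, callable $fetch) {
  $imported = array();
  for ($n = $start; $n < $start + $batch_size; $n++) {
    $record = $fetch($n);
    if ($record !== NULL) {
      $imported[] = $record;
    }
  }
  // Advance past the whole range, even if every record in it was missing.
  return array('imported' => $imported, 'next_start' => $start + $batch_size);
}

// Usage: records 100-149 are all "deleted", yet the cursor still moves on.
$result = import_batch_sketch(100, 50, function ($n) { return NULL; });
echo $result['next_start'], "\n";
```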

#1
4. To add to #2 above, that problem happens when there is a 100 and a 700 field. If there is no 100 and 2 or more 700 fields, the import seems to work properly. But there is a problem with trailing diacritics in this case. For example, please see this record:
http://search.cambridgelibraries.ca/record=i1049708&searchscope=0
Once imported, the author Régimbald, André becomes Régimbald, Andr and the author St-Amand, Néré becomes St-Amand, Nér.
#2
5. Apostrophes going into the taxonomy end up getting escaped so for example in this record:
http://search.cambridgelibraries.ca/record=i1062050&searchscope=0
Children's secrets becomes Children\'s secrets.
#3
For starters: can you send me a screengrab or copied text from the resulting imported node in #2?
Thanks for the issue report... I'd appreciate it if each were filed separately, though =) (I will probably end up splitting these into different issues)
No. 1 and 3 look easy to fix.
No. 2: it seems I'm not correctly interpreting what kind of material that record is in order to properly select which "biblio type" to put it in. Also, there is a possibility that the Biblio module has NO support for an "illustrator" field, which would mean that I would have to figure out a lot of cases like these beforehand to make the module as easy to "plug-and-play" as possible... or choose to let libraries configure the heck out of it to cover these cases... I will look at it =)
No. 4 and 5 are related to code that cleans up strings as fetched from the MARC record, and before importing taxonomy...
#4
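Fix for #4: use PCRE's \p{L} together with the /u modifier (UTF-8 mode) so that letters with diacritics are recognized as letters:
function millennium_trim_marc_value($value) {
  // Strip trailing characters that are not letters, digits, or one of " ) ? ! -
  $newvalue = preg_replace('/[^\p{L}0-9")\?!-]+ *$/u', "", $value);
  // Strip leading characters that are not letters, digits, or one of " ( -
  $newvalue = preg_replace('/^ *[^\p{L}0-9"(-]+/u', "", $newvalue);
  return $newvalue;
}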
Via: http://www.phpwact.org/php/i18n/utf-8
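A small sketch of why /u and \p{L} matter here. The byte-oriented "before" pattern is an assumption about what the old code may have looked like, not a quote from the module, and both function names are made up for illustration. The source file is assumed to be saved as UTF-8.

```php
<?php
// Sketch only: function names and the byte-oriented pattern are assumptions.

// Without /u, PCRE works on bytes, so both bytes of a UTF-8 "é" count as
// non-letters and are stripped from the end of the string.
function trim_trailing_bytes_sketch($value) {
  return preg_replace('/[^a-zA-Z0-9")\?!-]+ *$/', '', $value);
}

// With /u and \p{L}, "é" is one code point and is classed as a letter,
// so only the stray trailing punctuation is removed.
function trim_trailing_unicode_sketch($value) {
  return preg_replace('/[^\p{L}0-9")\?!-]+ *$/u', '', $value);
}

echo trim_trailing_bytes_sketch('Régimbald, André'), "\n";    // Régimbald, Andr
echo trim_trailing_unicode_sketch('Régimbald, André,'), "\n"; // Régimbald, André
```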
#5
Fix for (5), apostrophes being escaped in taxonomy terms:
db_query("INSERT INTO {term_data} (tid, name, vid) VALUES (%d, '%s', %d)", $tid, str_replace("'","\'", $term), $vid);
becomes
db_query("INSERT INTO {term_data} (tid, name, vid) VALUES (%d, '%s', %d)", $tid, $term, $vid);
... forgot Drupal handles all the quote escaping itself =)
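The double escaping can be modelled without a database. In this sketch addslashes() stands in for the escaping applied to '%s' arguments, and stripslashes() stands in for the database decoding the quoted literal. Both are simplifications chosen for illustration, and the function name is hypothetical.

```php
<?php
// Model only: addslashes()/stripslashes() stand in for the database layer.
function stored_term_sketch($term, $pre_escape) {
  if ($pre_escape) {
    // The manual escaping that the fix removes.
    $term = str_replace("'", "\\'", $term);
  }
  $literal = addslashes($term);   // escaping applied on the way in
  return stripslashes($literal);  // what ends up stored
}

echo stored_term_sketch("Children's secrets", TRUE), "\n";  // Children\'s secrets
echo stored_term_sketch("Children's secrets", FALSE), "\n"; // Children's secrets
```

With the manual str_replace() in place, the backslash itself gets escaped and survives into the stored value, which matches the behaviour reported in (5).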
#6
I sent you that screengrab.
Your fixes for issues 4 and 5 work great!
#7
Biblio maps the secondary author to "series editor"; this is not really the Biblio module's fault as it is meant for manual keying of records... but it makes things harder for us doing batch imports =) I will look at how the MARC module does this (it's supposed to also import MARC into biblio type nodes).
#8
Fixes for (4) and (5) are now in DEV, which BTW seems pretty stable by now =)
Still attempting to fix (1), (2) and (3).
#9
Regarding #2 (additional authors listed as Series Editor), I found that this label was in a record in the table biblio_type_details and was able to change it to "Other Authors". The other side of this issue remains that the authors imported into that field are not being mapped to taxonomy terms.
#10
Fixed problem (1) in DEV.
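For reference, a minimal sketch of what "all subfields merged together as text" could look like. This is not the module's implementation: the function name and the list-of-pairs input format are assumptions. The subfield values are split from the author string reported in (1), and the |a / |b codes are assumed.

```php
<?php
// Hypothetical helper: each subfield is an array(code, value) pair, kept as
// a list because a code such as |b can repeat within one field.
function merge_marc_subfields_sketch(array $subfields) {
  $parts = array();
  foreach ($subfields as $subfield) {
    $text = trim($subfield[1]);
    if ($text !== '') {
      $parts[] = $text;
    }
  }
  // Join every subfield value, in record order, into one string.
  return implode(' ', $parts);
}

$author = merge_marc_subfields_sketch(array(
  array('a', 'Ontario.'),
  array('b', 'Legislative Assembly.'),
  array('b', 'Standing Committee on Social Development.'),
));
echo $author, "\n";
```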
#11
#1 fixed: some subfields other than |a are now taken into account when translating into taxonomy terms.
#2 fixed: now all authors and contributors (translators, illustrators, etc.) are appended in the "Author" field.
While I know it is desirable to have separate fields, keep in mind that the biblio module (what we are building on) is meant to harbor a limited set of fields meant to cite material, not fully describe it as in a MARC record. Hence this (crass?) oversimplification.
I am transferring #3 to another issue, to finally close this issue! =)
#12
Automatically closed -- issue fixed for two weeks with no activity.