When importing titles containing non-English (accented) characters, records aren't processed correctly. During automatic crawls, empty nodes with no content type are created. When attempting to view one of these nodes, the following error message is displayed:

warning: Invalid argument supplied for foreach() in [base path]/modules/cck/content.module on line 1284.

During manual imports, the same titles generate an error message in Watchdog:

Batch import error: no record number given in row: array ( 'id' => '4', 'session' => '128850235873', 'data' => 'a:5:{s:10:"bib_recnum";s:8:"b1020552";s:5:"title";s:73:"XLIe Congr', )

The full title of the above record is "XLIe Congrès international de droit financier et fiscal : Bruxelles, 1987", so the title is cut off at the last character before the non-English "è" (see http://lucy.lls.edu/record=b1020552~S0). In a manual import, no node is created for the item.
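The byte count in the Watchdog entry fits an encoding mismatch: PHP's `s:73` is a byte length, and the full title is 73 bytes in ISO-8859-1 but 74 bytes in UTF-8. The Python sketch below (used only for illustration; the module itself is PHP) shows that those ISO-8859-1 bytes stop being valid UTF-8 exactly at the "è", so any step that keeps only the valid UTF-8 prefix yields "XLIe Congr". This is a hypothesis about the mechanism, not a trace of the module's code, and `valid_utf8_prefix` is a hypothetical helper.

```python
TITLE = "XLIe Congrès international de droit financier et fiscal : Bruxelles, 1987"


def valid_utf8_prefix(raw: bytes) -> str:
    """Hypothetical helper: decode raw bytes as UTF-8, keeping only the
    part before the first invalid byte sequence."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte that is not valid UTF-8.
        return raw[: err.start].decode("utf-8")


latin1_bytes = TITLE.encode("iso-8859-1")  # "è" is the single byte 0xE8
utf8_bytes = TITLE.encode("utf-8")         # "è" is the two bytes 0xC3 0xA8

print(len(latin1_bytes), len(utf8_bytes))  # 73 74
print(valid_utf8_prefix(latin1_bytes))     # XLIe Congr
```

The same title encoded as UTF-8 decodes in full, which is why the problem only shows up when the OPAC serves ISO-8859-1 data.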

This may tie into #300315: Correctly handle diacritics for UTF-8 webopacs.

Comment | File | Size | Author
#2 | millennium-958084-2-diacritics.patch | 712 bytes | janusman

Comments

janusman

Assigned: Unassigned » janusman

Thanks for reporting; looking into it.

janusman

Status: Active » Needs review
Status | File | Size
new | millennium-958084-2-diacritics.patch | 712 bytes

The problem is the UTF-8 conversion handling, plus (it seems) some odd PHP behavior where the title just won't be accepted as-is: it disappears completely when transferred from an array value into a "flat" string variable. Anyway, the attached patch will probably fix your problem.

This only happens with (older?) III OPACs that don't report the encoding in their HTTP headers. To fix this, I am assuming the encoding is ISO-8859-1 whenever the OPAC does not report one.
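The fallback described above (use the charset from the HTTP headers when present, otherwise assume ISO-8859-1) can be sketched as follows. This is a minimal Python illustration, not the patch itself (the module is PHP); `DEFAULT_OPAC_CHARSET`, `detect_charset`, and `decode_opac_page` are hypothetical names, and the handling of unrecognized charset names is an added assumption.

```python
from email.message import Message
from typing import Optional

# Hypothetical constant: the charset assumed when the OPAC reports none.
DEFAULT_OPAC_CHARSET = "iso-8859-1"


def detect_charset(content_type: Optional[str]) -> str:
    """Return the charset named in a Content-Type header, or the fallback."""
    if not content_type:
        return DEFAULT_OPAC_CHARSET
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset() or DEFAULT_OPAC_CHARSET


def decode_opac_page(body: bytes, content_type: Optional[str]) -> str:
    """Hypothetical helper: decode a fetched OPAC page into text."""
    charset = detect_charset(content_type)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # Assumption: an unknown charset name is treated like a missing one.
        return body.decode(DEFAULT_OPAC_CHARSET)


print(decode_opac_page("Congrès".encode("iso-8859-1"), "text/html"))  # Congrès
```

Once the page is decoded to text, the title can be re-encoded with `text.encode("utf-8")` before storage, so "è" is kept as the two bytes 0xC3 0xA8 instead of a lone 0xE8.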

I tested it on these records from different OPACs:

Please report back if this fixes things for you =)

tomboone

Worked for me. Thanks for the quick turnaround on this. I'd already decided to have our Systems Librarian check the encoding settings in our OPAC, but regardless of how that plays out, this seems to have fixed the problem.

janusman

Status: Needs review » Fixed

Fixed in the -dev releases of the 5.x-2.x and 6.x-2.x branches. Thanks!

Status: Fixed » Closed (fixed)

Automatically closed -- issue fixed for 2 weeks with no activity.