Closed (fixed)
Project:
Bibliography Module
Version:
7.x-1.x-dev
Component:
Code
Priority:
Normal
Category:
Bug report
Assigned:
Unassigned
Reporter:
Created:
19 Oct 2010 at 17:24 UTC
Updated:
3 Nov 2010 at 02:20 UTC
Comments
Comment #1
rjerome commented
Hmm, I can't reproduce that (see image attached). What style are you using?
Comment #2
scor commented
I'm using the out-of-the-box tabular style; browsing to either the node full view or the listing shows the bug. See attached screenshot. I was able to reproduce it on another machine running a different config (one is localhost on Mac OS, the server is Debian).
Comment #3
rjerome commented
When I asked about style I was referring to "AMA, APA, MLA", etc. It may be related to that choice.
Comment #4
scor commented
It's just a vanilla installation of the latest Drupal 7 and the latest biblio-7.x-1.x-dev without any settings changed. Turns out the default style is CSE, but the problem is the same with AMA and APA too. The weird initial is introduced by theme_biblio_format_authors() at the line:
I have no other modules running on this site.
Comment #5
rjerome commented
Ahh, this is probably related to the PCRE library on your webserver; I ran into this before.
You will see in biblio_theme.inc at line 403 a function called _biblio_get_regex_patterns(), which tests the PCRE library and decides whether to use it or not.
You might have to change line 341 and add "Ć" to it.
Ron.
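For readers following along, the kind of PCRE capability test described above can be sketched roughly as follows. This is not the module's actual code: the function name pcre_supports_unicode_properties() is hypothetical, and the probe pattern is only one possible choice.

```php
<?php
// Hypothetical helper, not taken from biblio_theme.inc: reports whether the
// PCRE library behind preg_* handles UTF-8 mode and Unicode properties.
function pcre_supports_unicode_properties() {
  // With the /u modifier, preg_match() returns 1 on a match, 0 on no match,
  // and FALSE (plus a warning) if the pattern cannot be compiled.
  // The @ operator suppresses that warning so the probe stays quiet.
  // "\xC3\x89" is the UTF-8 encoding of U+00C9 (É), an uppercase letter.
  return @preg_match('/^\p{Lu}\p{Ll}$/u', "\xC3\x89t") === 1;
}

// PCRE_VERSION is the same string that phpinfo() shows as the library version.
echo 'PCRE ', PCRE_VERSION, ': ';
echo pcre_supports_unicode_properties()
  ? "Unicode properties available\n"
  : "Unicode properties not available\n";
```

A caller could use the boolean result to choose between Unicode-aware patterns and a hand-maintained list of accented characters, which is the kind of decision the comment above describes.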
Comment #6
scor commented
Line 341 does not get executed; my localhost uses _biblio_get_utf8_regex().
Comment #7
rjerome commented
This appears to be a PHP 5.3 issue. I just tried it on another machine running 5.3 and now I'm seeing the same behavior you see.
I'll have to dig a bit further...
Comment #8
scor commented
The thing is, I run PHP 5.2.11 :)
Here is what my phpinfo says about PCRE:
pcre
PCRE (Perl Compatible Regular Expressions) Support: enabled
PCRE Library Version: 7.9 2009-04-11
Directive | Local Value | Master Value
pcre.backtrack_limit | 100000 | 100000
pcre.recursion_limit | 100000 | 100000
Comment #9
rjerome commented
OK, I tracked it down... It has more to do with the way the characters in the XML file were encoded than with PCRE itself. The é character was encoded as two Unicode code points, "U+0065 U+0301" (a plain "e" followed by a combining acute accent), as opposed to the single Unicode character "U+00E9". The "stand-alone" accent character was messing up the expression because it was neither an uppercase nor a lowercase letter.
Bottom line, adding "\p{M}" to line 378 like this
$lower = "\p{Ll}\p{M}";
fixes the problem. See http://www.regular-expressions.info/unicode.html for all the gory details.
Ron.
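The effect of that change can be reproduced in isolation with a small sketch. The name "René", the helper is_capitalized_word(), and the surrounding pattern are made up for illustration; only the two values of the lowercase class come from the comment above.

```php
<?php
// The same name in two encodings, written as raw UTF-8 bytes so the
// difference stays visible regardless of how this text is normalized.
$precomposed = "Ren\xC3\xA9";   // ends in U+00E9 (single code point)
$decomposed  = "Rene\xCC\x81";  // ends in U+0065 U+0301 (e + combining acute)

$lower_before = '\p{Ll}';        // lowercase letters only
$lower_after  = '\p{Ll}\p{M}';   // lowercase letters plus combining marks

// Hypothetical helper: one uppercase letter followed by characters from
// the given "lowercase" class. Not the pattern the module actually uses.
function is_capitalized_word($word, $lower) {
  return preg_match('/^\p{Lu}[' . $lower . ']+$/u', $word) === 1;
}

var_dump(is_capitalized_word($precomposed, $lower_before)); // bool(true)
var_dump(is_capitalized_word($decomposed, $lower_before));  // bool(false)
var_dump(is_capitalized_word($decomposed, $lower_after));   // bool(true)
```

The combining accent U+0301 belongs to the mark category (M), not to Ll or Lu, which is why the decomposed form only matches once \p{M} is part of the class.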
Comment #10
scor commented
Great! A side effect of this is that importing another paper from the same person, like http://www.ncbi.nlm.nih.gov/pubmed/18954984, leads to the creation of a duplicate contributor with the same name. I guess that would be a separate issue though - if only PHP had Unicode normalization, it would help a lot.
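As a side note on normalization: PHP's optional intl extension provides a Normalizer class, and where it happens to be installed, a comparison key for contributor names could be sketched as below. Whether intl is available on a given server is an assumption, contributor_name_key() is a hypothetical helper, and nothing here reflects how the module actually stores or matches authors.

```php
<?php
// Hypothetical helper: returns a key for comparing contributor names.
// Assumes the optional intl extension; without it, or if normalization
// fails, the name is returned unchanged.
function contributor_name_key($name) {
  if (class_exists('Normalizer')) {
    // FORM_C composes "e" + U+0301 into the single code point U+00E9.
    $normalized = Normalizer::normalize($name, Normalizer::FORM_C);
    if ($normalized !== false) {
      return $normalized;
    }
  }
  return $name;
}

$precomposed = "Ren\xC3\xA9";   // U+00E9
$decomposed  = "Rene\xCC\x81";  // U+0065 U+0301

// TRUE when intl is loaded; FALSE when the fallback path is taken.
var_dump(contributor_name_key($precomposed) === contributor_name_key($decomposed));
```

Comparing keys rather than raw strings would let both spellings resolve to the same contributor, while the stored name itself stays untouched.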
Comment #11
rjerome commented
This is where the author "merge" function comes in handy, but strangely, I can't even import that last one you mention (18954984); it returns no data.
http://drupal.org/cvs?commit=438868
Comment #12
rjerome commented
Scratch that, the third time I tried 18954984 it worked; maybe a wonky network connection.