The function theme_biblio_format_authors() in biblio.theme.inc does not shorten UTF-8-encoded author first names correctly when they contain non-ASCII characters such as umlauts or accented letters.
I have a project where all data in our database is UTF-8 encoded, so I am going to change all those Latin-1 codes to their UTF-8 equivalents. I will probably have to add more characters.
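To show why Latin-1 patterns miss UTF-8 data, here is a minimal Python sketch (the thread's code is PHP; Python is used only for illustration). The byte class and helper names are hypothetical, not biblio's actual patterns. In ISO-8859-1, ü is the single byte 0xFC, while UTF-8 encodes it as the two bytes 0xC3 0xBC, so a byte-level class written for Latin-1 never sees it.

```python
import re

# Hypothetical Latin-1-era pattern: a raw byte class for the lowercase
# umlauts ä (0xE4), ö (0xF6) and ü (0xFC) as ISO-8859-1 encodes them.
LATIN1_UMLAUTS = re.compile(rb"[\xe4\xf6\xfc]")

# UTF-8-aware alternative: decode first, then match characters, not bytes.
UNICODE_UMLAUTS = re.compile("[äöü]")


def has_umlaut_latin1(data: bytes) -> bool:
    """Search raw bytes with the Latin-1 byte class."""
    return LATIN1_UMLAUTS.search(data) is not None


def has_umlaut_utf8(data: bytes) -> bool:
    """Decode UTF-8 bytes to text, then search with a character class."""
    return UNICODE_UMLAUTS.search(data.decode("utf-8")) is not None


name = "Jürgen"
print(has_umlaut_latin1(name.encode("latin-1")))  # True: ü is the single byte 0xFC
print(has_umlaut_latin1(name.encode("utf-8")))    # False: ü is 0xC3 0xBC in UTF-8
print(has_umlaut_utf8(name.encode("utf-8")))      # True
```

The same mismatch applies to any hand-written byte range: it must either be rewritten for UTF-8 or replaced by a character-level match on decoded text.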
Do you have any thoughts on how I can share this with upstream?
Comments
Comment #1
Stefan Freudenberg commented: I found out that this is on the todo list, so I volunteer. Before starting, I want to make sure not to duplicate effort. This is the first line of the function theme_biblio_page_number():
I cannot find those two files in biblio, and the global variables are not defined elsewhere. Is there any chance to get those two files?
Comment #2
rjerome commented: Hi Stefan,
I just checked in what I think will be a fix for this issue. As you may have guessed, I "borrowed" and adapted much of the style code from another package. In the process I inadvertently put in the Latin-1 regex patterns rather than the Unicode ones.
Unfortunately, some of these still aren't working, so I changed some of the code in theme_biblio_format_authors() to use drupal_substr() and str_replace() instead. If you can figure out why those regular expressions are still not working, that would be great, but I think the current workaround will suffice.
Ron.
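The difference between a byte-oriented cut (PHP's substr()) and a character-oriented one (drupal_substr()) can be sketched in Python as follows; the helper names are hypothetical. Taking the first byte of a UTF-8 name that starts with an accented capital keeps only half of a two-byte sequence, which is not valid UTF-8 on its own.

```python
def byte_initial(name: str) -> bytes:
    """Byte-oriented cut, analogous to PHP's substr($name, 0, 1) on UTF-8 data."""
    return name.encode("utf-8")[:1]


def char_initial(name: str) -> str:
    """Character-oriented cut, analogous to drupal_substr($name, 0, 1)."""
    return name[:1]


name = "Élodie"
print(char_initial(name))  # É
try:
    byte_initial(name).decode("utf-8")
except UnicodeDecodeError as err:
    # Only the lead byte 0xC3 of the two-byte sequence for "É" was kept.
    print("byte cut broke the character:", err.reason)
```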
Comment #3
Stefan Freudenberg commented: Hi Ron!
Using drupal_substr() to shorten the forenames returns only the first initial. I don't know why the regular expressions did not work for you, because I did almost the same thing and it worked for me. I'll write some unit tests for the function. Could you give me the names that caused the function to fail?
Stefan
Comment #4
rjerome commented: Actually, it failed on ALL forenames (special characters or not). Is that not the case on your end?
Comment #5
Stefan Freudenberg commented: No. I replaced the character classes with the Unicode properties (you did it even more accurately than I did), and I had no more problems shortening forenames. Our database already has several thousand authors and I haven't encountered any errors yet. I am going to try your version from CVS.
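A rough Python sketch of regex-based shortening, which, unlike a plain first-character cut, also handles hyphenated forenames. Python's standard re module has no \p{L} escape, so this assumes [^\W\d_] (a word character that is neither a digit nor an underscore) as an approximation of "any letter". The function name and exact rules are hypothetical, not the biblio module's implementation. Input is NFC-normalized first so that a base letter followed by a combining accent counts as one letter.

```python
import re
import unicodedata

# Assumption: stdlib `re` lacks \p{L}, so "word character that is not a
# digit or underscore" approximates "any Unicode letter".
LETTER = r"[^\W\d_]"

# One forename part: a letter followed by any further letters. Hyphens and
# spaces are not letters, so "Jean-Paul" splits into "Jean" and "Paul".
FORENAME_PART = re.compile(rf"({LETTER}){LETTER}*")


def abbreviate_forenames(forenames: str) -> str:
    """Replace each forename part with its initial followed by a period."""
    # Compose e.g. "O" + U+0308 into "Ö" so the accent is not left dangling.
    text = unicodedata.normalize("NFC", forenames)
    return FORENAME_PART.sub(lambda m: m.group(1) + ".", text)


print(abbreviate_forenames("Jürgen"))     # J.
print(abbreviate_forenames("Jean-Paul"))  # J.-P.
print(abbreviate_forenames("Émile Zoë"))  # É. Z.
```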
Comment #6
Stefan Freudenberg commented: I have tested your version using the regular expressions instead of drupal_substr(). It works for me for authors with and without non-ASCII characters in their forenames.
Comment #7
rjerome commented: Hmm, I'm left scratching my head, because in theory those regular expressions should work (and do, as you have proven), but in practice they do not on my setup :-( Now I need to find out what it is about my system that prevents them from working, because I'm sure someone else is going to encounter the same issue.
Comment #8
Stefan Freudenberg commented: Unicode character properties have been available since PHP 4.4.0 and 5.1.0: http://php.net/manual/en/regexp.reference.php#regexp.reference.unicode
It is also possible that your PCRE library (which backs PHP's preg functions) was compiled without UTF-8 support.
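One way to guard against a regex engine that lacks Unicode property support is to probe for it at runtime and fall back to a less precise class. The Python sketch below is hypothetical and only mirrors the idea: Python's stdlib re rejects \p{L} at compile time, much as a PCRE build without Unicode property support typically does, so the probe returns False there and the fallback class is used. The function name and fallback class are assumptions for illustration.

```python
import re


def supports_unicode_properties(engine=re) -> bool:
    """Probe whether a regex engine can compile a \\p{L} property escape.

    A build or engine without Unicode property support rejects \\p{...}
    when the pattern is compiled, so a failed compile means "unsupported".
    """
    try:
        engine.compile(r"\p{L}")
    except engine.error:
        return False
    return True


# Fall back to an approximate letter class when \p{L} is unavailable.
LETTER = r"\p{L}" if supports_unicode_properties() else r"[^\W\d_]"
LETTER_RE = re.compile(LETTER)

print(supports_unicode_properties())        # False with the stdlib re module
print(bool(LETTER_RE.fullmatch("ü")))       # True
```

Probing once at startup keeps the failure visible (and testable) instead of letting every pattern silently fail, which is what made this issue hard to track down.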
Comment #9
rjerome commented: I'm running CentOS 5.3, which bundles PHP 5.1.6. What version of PHP are you using?
Comment #10
rjerome commented: You were right; it turns out that RHEL, and therefore CentOS, doesn't build its PCRE libraries with Unicode property support enabled! (https://bugzilla.redhat.com/show_bug.cgi?id=457064)
Rebuilding with Unicode support enabled solved the problem on my end too.
Talk about a waste of a day!
Ron.