Shortening forename of authors containing UTF-8 special characters does not work [#433198]

Comment #1

stefan freudenberg commented 15 April 2009 at 14:28

Category: bug » task

I found out that this is on the to-do list, so I volunteer. Before starting, I want to make sure not to duplicate effort. This is the first line of the function theme_biblio_page_number():

global $alnum, $alpha, $cntrl, $dash, $digit, $graph, $lower, $print, $punct, $space, $upper, $word, $patternModifiers; // defined in 'transtab_unicode_charset.inc.php' and 'transtab_latin1_charset.inc.php'

I cannot find those two files in biblio and the global variables are not defined elsewhere. Is there any chance to get those two files?

Comment #2

rjerome commented 15 April 2009 at 18:05

Status: Active » Fixed

Hi Stefan,

I just checked in what I think will be a fix for this issue. As you may have guessed, I "borrowed" and adapted much of the style code from another package. In the process I inadvertently put the latin1 regex patterns in rather than the Unicode ones.

Unfortunately, some of these still aren't working, so I changed some of the code in theme_biblio_format_authors() to use drupal_substr() and str_replace() instead. If you can figure out why those regular expressions are still not working, that would be great, but I think the current workaround will suffice.

Ron.

Comment #3

stefan freudenberg commented 16 April 2009 at 12:32

Hi Ron!

Using drupal_substr() to shorten the forenames returns only the first initial. I don't know why the regular expressions did not work for you, because I did almost the same thing and it worked for me. I'll write some unit tests for the function. Could you give me the names that caused the function to fail?

Stefan

Comment #4

rjerome commented 16 April 2009 at 13:21

Actually, it failed on ALL forenames (special characters or not). Is that not the case on your end?

Comment #5

stefan freudenberg commented 16 April 2009 at 14:08

No. I replaced the character classes with the Unicode properties (you did it even more accurately than I did) and I had no more problems with shortening forenames. Our database already has several thousand authors and I haven't encountered any errors yet. I am going to try your version from CVS.

Comment #6

stefan freudenberg commented 16 April 2009 at 14:21

I have tested your version using the regular expressions instead of drupal_substr():

if (!empty($author['firstname'])) {
  if ($options['shortenGivenNames']) // if we're supposed to abbreviate given names
  {
    // within initials, reduce all full first names (-> defined by a starting uppercase character, followed by one or more lowercase characters)
    // to initials, i.e., only retain their first character
    $author['firstname'] = preg_replace("/([$upper])[$lower]+/$patternModifiers", "\\1", $author['firstname']);
    //$author['firstname'] = drupal_substr($author['firstname'], 0, 1);
  }
}

It works for me for authors with and without non-ASCII characters in their forenames.
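
For readers who want to try the idea outside of biblio, here is a minimal standalone sketch of the same technique. It assumes that $upper and $lower expand to the Unicode properties \p{Lu} and \p{Ll} and that $patternModifiers contains "u"; the actual definitions are not shown in this thread. The function name shorten_given_names() is hypothetical and does not exist in biblio, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper (not part of biblio): reduce each given name to its initial.
function shorten_given_names($firstname) {
  // \p{Lu} matches an uppercase letter, \p{Ll} a lowercase letter;
  // the "u" modifier makes PCRE treat pattern and subject as UTF-8.
  $result = preg_replace('/(\p{Lu})\p{Ll}+/u', '$1', $firstname);
  // preg_replace() returns NULL on error (e.g. the pattern fails to compile
  // or the subject is not valid UTF-8); fall back to the unmodified name.
  return $result === NULL ? $firstname : $result;
}

echo shorten_given_names('Ádám'), "\n";        // Á
echo shorten_given_names('Jean-Pierre'), "\n"; // J-P
echo shorten_given_names('Émile Zoë'), "\n";   // É Z
```

Unlike a plain substring of length 1, this keeps one initial per name part, so hyphenated and multi-part forenames are not collapsed to a single letter.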

Comment #7

rjerome commented 16 April 2009 at 14:28

Hmmm, I'm left scratching my head, because in theory those regular expressions should work (and do, as you have proven), but in practice on my setup they do not :-( Now I need to find out what it is about my system that is preventing them from working, because I'm sure someone else is going to encounter the same issue.

Comment #8

stefan freudenberg commented 16 April 2009 at 15:18

The Unicode character properties have been available since PHP versions 4.4.0 and 5.1.0: http://php.net/manual/en/regexp.reference.php#regexp.reference.unicode
It is also possible that your PCRE library was compiled without UTF-8 support.
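
A quick way to check this on a given system is to try compiling a pattern that needs both features. The following is only a sketch: pcre_supports_unicode_properties() is a hypothetical name, not a PHP or Drupal function. It relies on preg_match() returning FALSE when a pattern cannot be compiled.

```php
<?php
// Hypothetical diagnostic (not part of PHP, Drupal, or biblio).
function pcre_supports_unicode_properties() {
  // The pattern needs UTF-8 mode ("u") and Unicode property support (\p{Lu}).
  // If either was left out when PCRE was built, the pattern fails to compile
  // and preg_match() returns FALSE; "@" hides the resulting warning.
  return @preg_match('/^\p{Lu}$/u', 'Ä') === 1;
}

echo 'Unicode properties in PCRE: ',
  pcre_supports_unicode_properties() ? 'yes' : 'no', "\n";
```

If this prints "no", patterns built from Unicode properties will fail for every name, not only for names with special characters.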

Comment #9

rjerome commented 16 April 2009 at 15:43

I'm running CentOS 5.3, which bundles PHP 5.1.6. What version of PHP are you using?

Comment #10

rjerome commented 16 April 2009 at 16:48

You were right. It turns out that RHEL, and therefore CentOS, doesn't build its PCRE library with Unicode properties support enabled! (https://bugzilla.redhat.com/show_bug.cgi?id=457064)

Rebuilding with Unicode support enabled solved the problem on my end too.

Talk about a waste of a day!

Ron.

Comment #11

30 April 2009 at 16:50

Status: Fixed » Closed (fixed)
Issue tags: -utf-8

Automatically closed -- issue fixed for 2 weeks with no activity.

Comment #12

30 April 2009 at 16:50

Issue tags: +utf-8

Restoring issue tags, see #2125755: System messages removed all issue tags during D7 upgrade.
