Add support for sites lacking full unicode multibyte support

ianchan - September 30, 2008 - 22:42
Project:Millennium Integration
Version:5.x-1.4
Component:Code
Category:support request
Priority:minor
Assigned:janusman
Status:active
Description

Does this mean we have an outdated version of PCRE?

Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 6 in .... /all/modules/millennium/millennium.module on line 996

Is there any way around it?

#1

janusman - November 18, 2008 - 22:19
Status:active» postponed (maintainer needs more info)

Can you please give your full PHP version and compile options? They appear in the first part of the output of PHP's phpinfo() function...

#2

ianchan - November 19, 2008 - 04:11

You're right - there is a problem with our PCRE installation. pcretest -C shows that unicode properties support is not enabled. Is there any way to modify the script to work without unicode? I can guess the answer but thought I'd ask.
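pcretest -C reports on the command-line PCRE, which may not be the same copy PHP was built against (PHP often bundles its own). Below is a minimal sketch of checking from inside PHP instead. It assumes that a \p{L} pattern failing to compile is a reliable sign of missing Unicode property support. The function name millennium_pcre_has_unicode_properties() is hypothetical and is not part of the module.

```php
<?php
// Hypothetical helper (not part of the Millennium module): returns TRUE if the
// PCRE library PHP uses can compile a Unicode property escape such as \p{L}.
function millennium_pcre_has_unicode_properties() {
  static $supported = NULL;
  if ($supported === NULL) {
    // preg_match() returns FALSE (and raises a warning, suppressed here by @)
    // when the pattern fails to compile, e.g. when \p{L} is unsupported.
    $supported = (@preg_match('/\p{L}/u', 'a') === 1);
  }
  return $supported;
}

print millennium_pcre_has_unicode_properties()
  ? "PCRE Unicode properties: yes\n"
  : "PCRE Unicode properties: no\n";
```

Caching the result in a static variable means the probe pattern is compiled only once per request.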

#3

janusman - November 19, 2008 - 15:39
Assigned to:Anonymous» janusman
Status:postponed (maintainer needs more info)» needs review

Looked into this, and while you really really should have PCRE + Unicode support (Millennium and/or Drupal speak Unicode), you can PROBABLY use this replacement function to bypass your error.

Just replace the current function with this one... tell me if it works!

function millennium_trim_marc_value($value) {
  global $multibyte;

  $newvalue = $value;
  if ($multibyte == UNICODE_MULTIBYTE) {
    $newvalue = preg_replace('/[^\p{L}0-9")\?!-]+ *$/u', "", $newvalue); // \p{L} => handle extended charset
    $newvalue = preg_replace('/^ *[^\p{L}0-9"(-]+/u', "", $newvalue);
  } else {
    // Fallback to non-unicode handling
    $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
    $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  }
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}

#4

janusman - November 19, 2008 - 16:20
Title:PCRE Problems» Add support for sites lacking full unicode multibyte support
Status:needs review» fixed

I tested it a bit and it seems to work, at least on my full unicode and "new" PCRE installation. Since it seems code-safe (although perhaps not very wise to bypass this requirement), I went ahead and committed to 6.x-2.x-DEV. You can fetch that new version starting tonight (wait till Drupal.org gets a chance to repackage from CVS).

#5

ianchan - November 20, 2008 - 01:43

Thank you for implementing the fix. Unfortunately it did not work for us without some modification. However, the issue is with our system setup and not with the module.

FYI, here's my hack of #3 for our system:

function millennium_trim_marc_value($value) {
  $newvalue = $value;

  // Fallback to non-unicode handling
  $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
  $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);

  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}
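One plausible reason the #3 version still failed is that it branches on Drupal's $multibyte global. As far as I can tell from Drupal core's unicode.inc, that global records mbstring support, not whether PCRE understands \p{L}. So a site with mbstring but without PCRE Unicode properties would still take the \p{L} branch. Below is a sketch that branches on a direct PCRE probe instead. The inline probe is an assumption on my part, not code committed to the module.

```php
<?php
// Sketch: pick the trimming patterns by probing PCRE directly instead of
// checking $multibyte (which reflects mbstring, not PCRE property support).
function millennium_trim_marc_value($value) {
  static $unicode_props = NULL;
  if ($unicode_props === NULL) {
    // FALSE from preg_match() means the \p{L} pattern failed to compile.
    $unicode_props = (@preg_match('/\p{L}/u', 'a') === 1);
  }

  $newvalue = $value;
  if ($unicode_props) {
    // \p{L} => any Unicode letter
    $newvalue = preg_replace('/[^\p{L}0-9")\?!-]+ *$/u', "", $newvalue);
    $newvalue = preg_replace('/^ *[^\p{L}0-9"(-]+/u', "", $newvalue);
  } else {
    // ASCII-only fallback, same patterns as the hack above.
    $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
    $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  }
  // Strip the parentheses from a trailing "(...)" group, keeping its contents.
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}
```

For example, millennium_trim_marc_value('The Beatles /') returns 'The Beatles' on either branch.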

#6

janusman - November 20, 2008 - 14:36
Priority:normal» minor
Status:fixed» active

Thanks for checking this. I asked around in #drupal (IRC), and the suggestion was to reuse some code from the core Search module, which does not depend on PCRE's Unicode property support. As I understand it, it is nearly impossible to detect which options the installed PCRE library was built with (in your case, I assume, it lacks some options that mine and others' have). Search.module sidesteps this by matching against a long list of multibyte character values =)
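Below is a rough sketch of that idea applied to the trimming function. It replaces \p{L} with explicit \x{...} code-point ranges, which need PCRE's UTF-8 mode (/u) but not Unicode property support. The constant MILLENNIUM_PREG_CLASS_LETTERS and the function millennium_trim_marc_value_ranges() are hypothetical. The ranges cover only basic Latin and the Latin-1/Latin Extended blocks, far fewer than the lists in search.module.

```php
<?php
// Hypothetical constant: a SMALL subset of letter ranges, for illustration
// only. Scripts outside these ranges (Greek, Cyrillic, CJK, ...) are not covered.
define('MILLENNIUM_PREG_CLASS_LETTERS',
  'a-zA-Z' .
  '\x{00C0}-\x{00D6}' .  // Latin-1: À..Ö (skips × at U+00D7)
  '\x{00D8}-\x{00F6}' .  // Latin-1: Ø..ö (skips ÷ at U+00F7)
  '\x{00F8}-\x{024F}');  // ø..ÿ, Latin Extended-A and Extended-B

// Same trimming rules as millennium_trim_marc_value(), with \p{L} swapped
// for the explicit ranges above.
function millennium_trim_marc_value_ranges($value) {
  $letters = MILLENNIUM_PREG_CLASS_LETTERS;
  $newvalue = preg_replace('/[^' . $letters . '0-9")\?!-]+ *$/u', "", $value);
  $newvalue = preg_replace('/^ *[^' . $letters . '0-9"(-]+/u', "", $newvalue);
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}
```

With these ranges, a leading "Ö" counts as a letter and is kept, so millennium_trim_marc_value_ranges('Ölfarbe /') returns 'Ölfarbe'.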

Glad you could circumvent it for now. Again, though, I would not rely on that code being 100% correct, since it WILL drop characters with diacritics (as in "Motley Crüe" or "The effects of el Niño") when they appear at the beginning or end of a string.
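To make that caveat concrete, this small sketch isolates the two ASCII-only trimming patterns from the hack in #5 in a hypothetical helper, millennium_trim_ascii_fallback(), and runs them on a few strings. Without the /u modifier PCRE works byte by byte. Both UTF-8 bytes of an accented letter therefore fall outside [a-z0-9...] and get trimmed when they sit at either end of the string. The sketch assumes PHP's default (C locale) character tables.

```php
<?php
// Hypothetical helper: just the two ASCII-only trimming patterns from #5.
function millennium_trim_ascii_fallback($value) {
  $value = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $value);
  return preg_replace('/^ *[^a-z0-9"(-]+/i', "", $value);
}

print millennium_trim_ascii_fallback('Motley Crüe') . "\n"; // "Motley Crüe": interior ü kept
print millennium_trim_ascii_fallback('Ölfarbe') . "\n";     // "lfarbe": leading Ö dropped
print millennium_trim_ascii_fallback('Café') . "\n";        // "Caf": trailing é dropped
```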
