Add support for sites lacking full unicode multibyte support
ianchan - September 30, 2008 - 22:42
| Project: | Millennium Integration |
| Version: | 5.x-1.4 |
| Component: | Code |
| Category: | support request |
| Priority: | minor |
| Assigned: | janusman |
| Status: | active |
Description
Does this mean we have an outdated version of PCRE?
Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 6 in .... /all/modules/millennium/millennium.module on line 996
Is there any way around it?

#1
Can you please specify your full PHP version and compile options? It would be the first part of the output from the PHP phpinfo() function...
#2
You're right - there is a problem with our PCRE installation. pcretest -C shows that Unicode properties support is not enabled. Is there any way to modify the script to work without Unicode properties? I can guess the answer but thought I'd ask.
#3
Looked into this, and while you really really should have PCRE + Unicode support (Millennium and/or Drupal speak Unicode) you can PROBABLY use this replacement function to bypass your error.
Just replace the current function with this one... tell me if it works!
function millennium_trim_marc_value($value) {
  global $multibyte;
  $newvalue = $value;
  if ($multibyte == UNICODE_MULTIBYTE) {
    // \p{L} => handle extended charset
    $newvalue = preg_replace('/[^\p{L}0-9")\?!-]+ *$/u', "", $newvalue);
    $newvalue = preg_replace('/^ *[^\p{L}0-9"(-]+/u', "", $newvalue);
  } else {
    // Fallback to non-unicode handling
    $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
    $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  }
  // Strip a pair of parentheses that wraps the end of the value
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}
#4
I tested it a bit and it seems to work, at least on my full unicode and "new" PCRE installation. Since it seems code-safe (although perhaps not very wise to bypass this requirement) I went ahead and committed it to 6.x-2.x-DEV. You can fetch that new version starting tonight (wait till Drupal.org gets a chance to repackage from CVS).
#5
Thank you for implementing the fix. Unfortunately it did not work for us without some modification. However, the issue is with our system setup and not with the module.
FYI - here's my hack of #3 for our system:
function millennium_trim_marc_value($value) {
  global $multibyte;
  $newvalue = $value;
  // Fallback to non-unicode handling
  $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
  $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}
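A possible middle ground, instead of dropping the Unicode branch outright, might be to ask PCRE directly whether it accepts \p{L} and pick the character class from that. The sketch below is only an illustration: both function names are hypothetical (they are not part of the Millennium module or Drupal core), and it assumes that preg_match() returns FALSE when a pattern fails to compile, which is the documented behaviour.

```php
<?php
// Hypothetical helper: TRUE if this PHP's PCRE compiles \p{L} with /u.
function example_pcre_supports_unicode_properties() {
  static $supported = NULL;
  if ($supported === NULL) {
    // preg_match() returns FALSE on a compile error; @ hides the
    // "Compilation failed" warning shown in the original report.
    $supported = (@preg_match('/\p{L}/u', 'a') === 1);
  }
  return $supported;
}

// Hypothetical variant of the trim function that chooses its letter
// class from the probe instead of from $multibyte.
function example_trim_marc_value($value) {
  if (example_pcre_supports_unicode_properties()) {
    $letters = '\p{L}';
    $mod = 'u';
  } else {
    // ASCII-only fallback, same patterns as the hack above.
    $letters = 'a-z';
    $mod = 'i';
  }
  $newvalue = preg_replace('/[^' . $letters . '0-9")\?!-]+ *$/' . $mod, "", $value);
  $newvalue = preg_replace('/^ *[^' . $letters . '0-9"(-]+/' . $mod, "", $newvalue);
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}
```

The probe result is cached in a static variable so the test pattern is compiled only once per request. This only checks the one feature the module needs; it does not try to enumerate PCRE build options in general.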
#6
Thanks for checking this. I asked around in #drupal (irc) and the suggestion was to use some of the code from the core Search module, which does not depend on these PCRE options... as I understand it, it is nearly impossible to detect which options the installed PCRE library was built with (yours, I assume, is missing some options that mine and others' have). Search.module bypasses this by using a long list of multibyte values to check against =)
Glad you could circumvent it for now; again, I would not depend on that code being 100% correct, though, since it WILL drop some characters with diacritics in them (like "Motley Crüe", "The effects of el Niño"); well, at least at the beginning and end of strings.
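To make the caveat concrete, here is a small sketch of what the ASCII-only patterns do to accented characters. The function name is hypothetical and simply wraps the three preg_replace() calls from the hack in #5; the inputs are written with byte escapes so the example does not depend on the file's encoding (they are UTF-8 for "Über uns /", "café" and "Motley Crüe /"). As far as I can tell, a diacritic in the middle of a word survives, while one that is the very first or very last character gets cut off.

```php
<?php
// Hypothetical wrapper around the ASCII-only fallback patterns.
function example_trim_ascii_only($value) {
  $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $value);
  $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}

// Leading "Ü" (bytes C3 9C) is not in a-z, so it is stripped with the
// other leading "punctuation".
$leading = example_trim_ascii_only("\xC3\x9Cber uns /");

// Trailing "é" (bytes C3 A9) is stripped the same way.
$trailing = example_trim_ascii_only("caf\xC3\xA9");

// "ü" in the middle of a word is untouched; only the " /" goes.
$middle = example_trim_ascii_only("Motley Cr\xC3\xBCe /");

print $leading . "\n";   // ber uns
print $trailing . "\n";  // caf
print $middle . "\n";    // Motley Crüe
```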