
Add support for sites lacking full unicode multibyte support

Project: Millennium OPAC Integration
Version: 5.x-1.4
Component: Code
Category: support request
Priority: minor
Assigned: janusman
Status: closed (won't fix)

Issue Summary

Does this mean we have an outdated version of PCRE?

Compilation failed: PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X at offset 6 in .... /all/modules/millennium/millennium.module on line 996

Is there any way around it?

Comments

#1

Status: active » postponed (maintainer needs more info)

Can you please specify your full PHP version and compile options? It would be the first part of the output from the PHP phpinfo() function...

#2

You're right - there is a problem with our PCRE installation. pcretest -C shows that unicode properties support is not enabled. Is there any way to modify the script to work without unicode? I can guess the answer but thought I'd ask.
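The same check can also be made from PHP itself rather than with pcretest. What follows is a minimal sketch, not code from the module: the function name example_pcre_has_unicode_properties() is hypothetical, and it relies on preg_match() returning FALSE when a pattern fails to compile.

```php
<?php
// Hypothetical helper (not part of the Millennium module or Drupal core).
// Returns TRUE when the PCRE library behind preg_*() accepts \p{...}
// in UTF-8 mode, FALSE otherwise.
function example_pcre_has_unicode_properties() {
  // A pattern that fails to compile makes preg_match() return FALSE;
  // the @ operator suppresses the "Compilation failed" warning in that case.
  return @preg_match('/^\p{L}$/u', 'a') === 1;
}

echo example_pcre_has_unicode_properties()
  ? "PCRE accepts \\p{L}\n"
  : "PCRE rejects \\p{L}\n";
```

A result of FALSE would correspond to the "PCRE does not support \L, \l, \N, \P, \p, \U, \u, or \X" error quoted in the issue summary.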

#3

Assigned to: Anonymous » janusman
Status: postponed (maintainer needs more info) » needs review

Looked into this, and while you really really should have PCRE + Unicode support (Millennium and/or Drupal speak Unicode) you can PROBABLY use this replacement function to bypass your error.

Just replace the current function with this one... tell me if it works!

function millennium_trim_marc_value($value) {
  global $multibyte;

  $newvalue = $value;
  if ($multibyte == UNICODE_MULTIBYTE) {
    $newvalue = preg_replace('/[^\p{L}0-9")\?!-]+ *$/u', "", $newvalue); // \p{L} => handle extended charset
    $newvalue = preg_replace('/^ *[^\p{L}0-9"(-]+/u', "", $newvalue);
  } else {
    // Fallback to non-unicode handling
    $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
    $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  }
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}

#4

Title: PCRE Problems » Add support for sites lacking full unicode multibyte support
Status: needs review » fixed

I tested it a bit and it seems to work, at least on my full unicode and "new" PCRE installation. Since it seems code-safe (although perhaps not very wise to bypass this requirement) I went ahead and committed it to 6.x-2.x-DEV. You can fetch that new version starting tonight (wait till Drupal.org gets a chance to repackage from CVS).

#5

Thank you for implementing the fix. Unfortunately it did not work for us without some modification. However, the issue is with our system setup and not with the module.

FYI - here's my hack of #3 for our system:

function millennium_trim_marc_value($value) {
  $newvalue = $value;

  // Unicode branch removed; always use the non-unicode fallback
  $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $newvalue);
  $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);

  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}

#6

Priority: normal » minor
Status: fixed » active

Thanks for checking this. I asked around in #drupal (irc) and the suggestion was to use some of the code from the core Search module, which is independent of the PCRE library's Unicode property support... as I understand it, it is nearly impossible to detect which options the installed PCRE library was built with (yours, I assume, is missing some options that mine and others' have). Search.module bypasses this by using a long list of multibyte values to check against =)
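A rough sketch of what that approach could look like for this function, under stated assumptions: the name example_trim_marc_value_ranges() is hypothetical, and the single range U+00C0 to U+024F is an illustrative stand-in, far shorter than any list a real implementation would need. It replaces \p{L} with explicit code-point ranges, so it still requires PCRE built with UTF-8 support (for the /u modifier and \x{...}), but not Unicode property support.

```php
<?php
// Hypothetical variant, not the module's code: same trimming steps as in #3,
// but with explicit code-point ranges instead of \p{L}.
function example_trim_marc_value_ranges($value) {
  // ASCII letters and digits plus U+00C0..U+024F (accented Latin letters).
  // Assumption: this range is only illustrative; it also happens to
  // include the signs U+00D7 and U+00F7, which are not letters.
  $keep = 'a-zA-Z0-9\x{C0}-\x{24F}';

  // Strip trailing characters that are not kept letters/digits or ")?!-
  $newvalue = preg_replace('/[^' . $keep . '")?!-]+ *$/u', '', $value);
  // Strip leading characters that are not kept letters/digits or "(-
  $newvalue = preg_replace('/^ *[^' . $keep . '"(-]+/u', '', $newvalue);
  // Unwrap a value that ends in a parenthesized group, as in #3
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}

// "\xC3\x91and\xC3\xBA" is the UTF-8 byte sequence for "Ñandú".
echo example_trim_marc_value_ranges(". \xC3\x91and\xC3\xBA /"), "\n";
```

Note that preg_replace() with /u returns NULL when the subject is not valid UTF-8, so a caller would need to handle that case.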

Glad you could circumvent it for now; again, I would not depend on that code being 100% correct, though, since it WILL drop some characters with diacritics in them (like "Motley Crüe", "The effects of el Niño"); well, at least at the beginning and end of strings.
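To make the loss concrete, here is a small sketch that wraps the fallback patterns from #5 in a hypothetical function, example_ascii_trim(), and runs it on a value that starts and ends with an accented letter. Without the /u modifier the patterns work byte by byte, so the UTF-8 bytes of the accented letters are treated as unwanted characters.

```php
<?php
// Hypothetical wrapper around the non-unicode fallback patterns from #5.
function example_ascii_trim($value) {
  $newvalue = preg_replace('/[^a-z0-9")\?!-]+ *$/i', "", $value);
  $newvalue = preg_replace('/^ *[^a-z0-9"(-]+/i', "", $newvalue);
  $newvalue = preg_replace('/\((.*)\)$/', "\\1", $newvalue);
  return $newvalue;
}

// "\xC3\x91and\xC3\xBA" is the UTF-8 byte sequence for "Ñandú".
// The leading and trailing accented letters are stripped: prints "and".
echo example_ascii_trim("\xC3\x91and\xC3\xBA"), "\n";
```

Accented letters in the middle of a value are left alone, since both patterns are anchored to the start or end of the string.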

#7

Currently looks like this:

<?php
function millennium_trim_marc_value($value) {
  $newvalue = $value;
  $newvalue = trim($value);
  $newvalue = preg_replace('/[\.,\/:;]+$/', '', $newvalue);
  return $newvalue;
}
?>

Perhaps I need to take a page from #768040 ("truncate_utf8() only works for latin languages (and drupal_substr has a bug)") and strip trailing word-boundary and other characters, like so?

<?php
$newvalue = preg_replace('/[' . PREG_CLASS_UNICODE_WORD_BOUNDARY . ']+$/u', '', $newvalue);
?>

#8

Status: active » closed (won't fix)

Closing out 1.x branch.
