Incorrect UTF-8 filtering
skotch - June 3, 2008 - 12:37
| Project: | Wordfilter |
| Version: | 5.x-1.x-dev |
| Component: | Code |
| Category: | bug report |
| Priority: | normal |
| Assigned: | Unassigned |
| Status: | closed |
Description
Hello!
There are 2 problems with UTF-8 (non-ASCII text and replacements).
To handle UTF-8 strings, please correct the function wordfilter_filter_process():
1. Upper/lower case replacement (use '/iu' instead of '/i')
2. Standalone word replacement (use '[\W]' instead of '[^a-z0-9]')
Corrected code that I've tested:
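// Quote the word for use inside a '/'-delimited pattern.
$pattern = preg_quote($a, '/');
if ($word->standalone) {
  // Match the word only when it is surrounded by non-word characters.
  $text = preg_replace('/([\W])'. $pattern .'([\W])/iu', '$1'. $replacement .'$2', $text);
}
else {
  $text = preg_replace('/'. $pattern .'/iu', $replacement, $text);
}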
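For anyone who wants to check the two changes outside the module, here is a minimal standalone sketch. The helper name example_filter_word() and the sample strings are my own assumptions for illustration, not the module's actual code, and the standalone case assumes a PHP build where the /u modifier also makes \W Unicode-aware.

```php
<?php
// Hypothetical helper for illustration only; it mirrors the two branches above.
function example_filter_word($text, $word, $replacement, $standalone) {
  // Escape regex metacharacters, including the '/' delimiter.
  $quoted = preg_quote($word, '/');
  if ($standalone) {
    // Require a non-word character on both sides of the word.
    return preg_replace('/([\W])' . $quoted . '([\W])/iu', '$1' . $replacement . '$2', $text);
  }
  // Replace the word wherever it occurs, ignoring case.
  return preg_replace('/' . $quoted . '/iu', $replacement, $text);
}

// 1. Case-insensitive replacement of a non-ASCII word.
$anywhere = example_filter_word(' ПРИВЕТ мир ', 'привет', '***', FALSE);

// 2. Standalone replacement: a word followed by punctuation is replaced...
$standalone = example_filter_word(' Привет, мир ', 'привет', '***', TRUE);

// ...but the same letters inside a longer word are left alone.
$embedded = example_filter_word(' приветствие ', 'привет', '***', TRUE);

print $anywhere . "\n" . $standalone . "\n" . $embedded . "\n";
```

Note that the standalone pattern needs a character on each side of the word, so a word at the very start or end of the text is not matched unless the text is padded first.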
#1
Can you post some sample UTF-8 characters that you
are trying to work with?
I am a bit worried about what the PHP website says
about using the /u flag:
I think making it an option rather than the default
might be another approach, especially if the performance hit
mentioned above is large enough.
#2
OK, I've added this in as an option. Please try it out and post
the results of testing with the UTF-8 content you wish to
filter.
#3
Automatically closed -- issue fixed for two weeks with no activity.