Posted by Arto on October 15, 2006 at 3:54pm
| Project: | Accents |
| Version: | master |
| Component: | Code |
| Category: | feature request |
| Priority: | minor |
| Assigned: | Unassigned |
| Status: | closed (fixed) |
Issue Summary
This is not really a feature request, but rather a heads-up about a possibly simpler method for converting accented characters to their unaccented equivalents.
Specifically, I'm referring to the unaccent() function from http://bendiken.net/snippets/php:
<?php
function unaccent($text) {
  static $search, $replace;
  if (!$search) {
    $search = $replace = array();
    // Get the HTML entities table into an array
    $trans = get_html_translation_table(HTML_ENTITIES);
    // Go through the entity mappings one-by-one
    foreach ($trans as $literal => $entity) {
      // Make sure we don't process any other characters
      // such as fractions, quotes etc:
      if (ord($literal) >= 192) {
        // Get the accented form of the letter
        $search[] = $literal;
        // Get e.g. 'E' from the string '&Eacute;'
        $replace[] = $entity[1];
      }
    }
  }
  return str_replace($search, $replace, $text);
}
?>
Not sure if this method can be of use to you, but wanted to ensure awareness of its existence.
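Since ord() only looks at the first byte of each literal, the result of the function above depends on which encoding the entity table comes back in. Below is a minimal sketch of a variant, assuming PHP 5.3.4 or later (where get_html_translation_table() accepts an encoding argument). It pins the encoding explicitly and picks letters by entity name rather than by byte value. The name unaccent_sketch(), the $encoding parameter and the list of diacritic suffixes are my own illustrative choices and not part of the original snippet.

```php
<?php
// Hypothetical variant of unaccent(); the name and the suffix list are illustrative.
function unaccent_sketch($text, $encoding = 'UTF-8') {
  static $cache = array();
  if (!isset($cache[$encoding])) {
    $search = $replace = array();
    // Ask for the table in a known encoding instead of relying on the default
    $trans = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML401, $encoding);
    foreach ($trans as $literal => $entity) {
      // Keep only entities named "letter + diacritic", e.g. &eacute; or &Uuml;
      if (preg_match('/^&([A-Za-z])(?:acute|grave|circ|tilde|uml|ring|cedil|slash|caron);$/', $entity, $m)) {
        // The literal character in the requested encoding
        $search[] = $literal;
        // The base letter captured from the entity name
        $replace[] = $m[1];
      }
    }
    // Build the lookup lists once per encoding
    $cache[$encoding] = array($search, $replace);
  }
  list($search, $replace) = $cache[$encoding];
  return str_replace($search, $replace, $text);
}

// "Café crème" written with UTF-8 byte escapes
echo unaccent_sketch("Caf\xC3\xA9 cr\xC3\xA8me"), "\n"; // Cafe creme
// "Café" written as ISO-8859-1 bytes
echo unaccent_sketch("Caf\xE9", 'ISO-8859-1'), "\n"; // Cafe
```

Unlike the byte-value check, matching on the entity name skips symbols such as &times; and &divide;, but it also leaves characters like &szlig; and &AElig; untouched, so this is a trade-off rather than a drop-in replacement.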
Comments
#1
Quick note: get_html_translation_table() returns the literals in ISO-8859-1 encoding, which recently caused me some problems on a UTF-8 system. The fix is super easy, though: just replace
$search[] = $literal;
with
$search[] = utf8_encode($literal);
#2