Alternate unaccent function implementation

Project: Accents
Version: master
Component: Code
Category: feature request
Priority: minor
Assigned: Unassigned
Status: closed (fixed)

Issue Summary

This is not really a feature request, just a heads-up about a possibly simpler method for converting accented characters to their non-accented equivalents.

Specifically, I'm referring to the unaccent() function from http://bendiken.net/snippets/php:

<?php
function unaccent($text) {
  static $search, $replace;
  if (!$search) {
    $search = $replace = array();
    // Get the HTML entities table into an array
    $trans = get_html_translation_table(HTML_ENTITIES);
    // Go through the entity mappings one-by-one
    foreach ($trans as $literal => $entity) {
      // Make sure we don't process any other characters
      // such as fractions, quotes etc:
      if (ord($literal) >= 192) {
        // Get the accented form of the letter
        $search[] = $literal;
        // Get e.g. 'E' from the string '&Eacute'
        $replace[] = $entity[1];
      }
    }
  }
  return str_replace($search, $replace, $text);
}
?>

I'm not sure whether this method is of use to you, but I wanted to make sure you know it exists.
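
For readers who want to see the core trick in isolation: the function relies on the fact that the entity name of an accented letter starts with its base letter, so the character at index 1 of the entity string (right after the '&') is the replacement. The sketch below only illustrates that idea. The helper name entity_base_letter() is made up for this example and is not part of the snippet or of the Accents module, and the input is assumed to be UTF-8.

<?php
// Illustrative helper (hypothetical name): returns the character that the
// "second character of the entity" trick would substitute for one UTF-8
// character.
function entity_base_letter($char) {
  $entity = htmlentities($char, ENT_QUOTES, 'UTF-8');
  // No named entity (plain ASCII letter, digit, ...): leave it untouched.
  if (strlen($entity) < 3 || $entity[0] !== '&') {
    return $char;
  }
  // '&Eacute;'[1] is 'E', '&ccedil;'[1] is 'c', and so on.
  return $entity[1];
}

print entity_base_letter("\xC3\x89"); // É is '&Eacute;', prints E
print entity_base_letter("\xC3\xA7"); // ç is '&ccedil;', prints c
print entity_base_letter("\xC3\x97"); // × is '&times;', prints t
?>

The last call shows a limit of the approach: a few non-letters, such as the multiplication sign (code 215 in ISO-8859-1), also sit at or above 192 and so pass the ord() check in unaccent().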

Comments

#1

Quick note: get_html_translation_table() returns the literals in ISO-8859-1 encoding, which recently caused me some problems on a UTF-8 system. The fix is super easy, though. Just replace
$search[] = $literal;
with
$search[] = utf8_encode($literal);
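
A possible alternative to the utf8_encode() fix, sketched here under a few assumptions: since PHP 5.3.4, get_html_translation_table() accepts an encoding as its third argument, so the UTF-8 literals can be requested directly. From PHP 5.4 on, UTF-8 is already the default for that argument, and utf8_encode() is deprecated as of PHP 8.2. The function name unaccent_utf8() and the list of accent suffixes are choices made for this example, not taken from the original snippet or the module. Matching on the entity name replaces the ord() check, which is not meaningful for multi-byte literals.

<?php
// Hypothetical UTF-8 variant of unaccent(); assumes $text is valid UTF-8.
function unaccent_utf8($text) {
  static $map = NULL;
  if ($map === NULL) {
    $map = array();
    // Request UTF-8 literals explicitly rather than relying on the default.
    $trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8');
    foreach ($trans as $literal => $entity) {
      // Keep only entities of the form "&<letter><accent name>;",
      // e.g. '&Eacute;' or '&ntilde;'. Entities such as '&times;' or
      // '&nbsp;' do not match and are left alone.
      $pattern = '/^&([A-Za-z])(?:acute|grave|circ|tilde|uml|ring|cedil|slash|caron);$/';
      if (preg_match($pattern, $entity, $m)) {
        $map[$literal] = $m[1];
      }
    }
  }
  // strtr() with an array replaces each key with its value in one pass.
  return strtr($text, $map);
}

print unaccent_utf8("Caf\xC3\xA9"); // "Café" in UTF-8, prints Cafe
?>

Ligatures and letters without an accent-style entity name (for example '&AElig;' or '&szlig;') are deliberately left unchanged in this sketch.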

#2

Status: active » closed (fixed)