PostgreSQL and UTF-8

AlexisWilke - May 24, 2009 - 07:51
Project: Webmail Plus
Version: 6.x-1.16
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: active
Description

I finally got the imap_open() to work! Yoopi! 8-)

Now it loads the emails just fine, although it reloads them on every click on my site. Crazy. I will look into that later, but for now I have this one:

# warning: pg_query() [function.pg-query]: Query failed: ERROR: invalid byte sequence for encoding "UTF8": 0xa0 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". in /usr/clients/www/drupal/includes/database.pgsql.inc on line 139.
# user warning: in /usr/clients/www/drupal/sites/all/modules/webmail_plus/webmail_plus.module on line 1376.

I'm not too sure how to go about it. But it is very likely that the module currently ignores the encoding information of the email entirely and thus does not convert it to UTF-8 when necessary. That means we need to sanitize the data before we put it in the database. The system takes care of the bad characters (usually new-lines and quotes) in the resulting SQL, but not of the character encoding.
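To make the idea concrete, here is a minimal sketch of that kind of sanitizing, assuming the mbstring extension is available. The function name to_utf8_sketch() and its parameters are hypothetical and not part of Webmail Plus. It uses the charset the message declares when mbstring knows it, and otherwise falls back to ISO-8859-1, in which the 0xa0 byte from the error above is a non-breaking space.

```php
<?php
// Hypothetical helper, not part of Webmail Plus: return $text as valid UTF-8.
// $declaredCharset is whatever charset the message part claims (may be empty).
function to_utf8_sketch($text, $declaredCharset = '', $fallbackCharset = 'ISO-8859-1') {
  $supported = array_map('strtolower', mb_list_encodings());
  $charset = strtolower(trim((string) $declaredCharset));

  if ($charset === '' || !in_array($charset, $supported, true)) {
    // No usable declaration: keep text that is already valid UTF-8,
    // otherwise assume the fallback charset.
    if (mb_check_encoding($text, 'UTF-8')) {
      return $text;
    }
    $charset = strtolower($fallbackCharset);
  }

  if ($charset === 'utf-8') {
    // Declared as UTF-8: trust it only if the bytes really are valid.
    if (mb_check_encoding($text, 'UTF-8')) {
      return $text;
    }
    $charset = strtolower($fallbackCharset);
  }

  return mb_convert_encoding($text, 'UTF-8', $charset);
}

// A lone 0xA0 byte is invalid UTF-8; read as ISO-8859-1 it becomes
// U+00A0, which is encoded as the two bytes C2 A0.
$clean = to_utf8_sketch("caf\xA0");
```

The result would then be safe to hand to the database layer as far as encoding goes. Guessing ISO-8859-1 when nothing is declared is only an assumption; a different fallback may suit other mailboxes better.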

If you have an idea on how to fix this, let me know. At this time, I'm trying to get emails to appear. So far no luck even though I have nearly 1,500 in the database.

#1

AlexisWilke - May 25, 2009 - 23:02

Hi guys,

There is actually an answer on the PHP website: http://us3.php.net/manual/en/function.imap-mime-header-decode.php

Scroll down to this message:

s dot wiese at trabia dot md
18-Dec-2007 09:46

or just use the following code (which is a copy of what's available on the PHP website). Note that it first converts the subject into a set of text/encoding pairs, then transforms those into a single encoding (UTF-8 by default; for Drupal, you should use the encoding the user chose for the database).

<?php
// Return the encodings supported by mbstring, in lowercase.
function mb_list_lowerencodings() {
  return array_map('strtolower', mb_list_encodings());
}

// Receive a string with a mail header and return it
// decoded to the specified charset.
// If the charset specified in a piece of text from the header
// isn't supported by "mb", the "fallbackCharset" will be
// used to try to decode it.
function decodeMimeString($mimeStr, $inputCharset='utf-8', $targetCharset='utf-8', $fallbackCharset='iso-8859-1') {
  $encodings = mb_list_lowerencodings();
  $inputCharset = strtolower($inputCharset);
  $targetCharset = strtolower($targetCharset);
  $fallbackCharset = strtolower($fallbackCharset);

  $decodedStr = '';
  // imap_mime_header_decode() returns objects with "charset" and "text" properties.
  foreach (imap_mime_header_decode($mimeStr) as $part) {
    $charset = strtolower($part->charset);
    if (($charset == 'default' && $inputCharset == $targetCharset)
        || $charset == $targetCharset) {
      // Already in the target charset: append as is.
      $decodedStr .= $part->text;
    } else {
      // Convert from the part's charset, or from the fallback if "mb" doesn't know it.
      $decodedStr .= mb_convert_encoding(
        $part->text, $targetCharset,
        in_array($charset, $encodings) ? $charset : $fallbackCharset
      );
    }
  }
  return $decodedStr;
}
?>

 
 
