configurable encoding patch

fr - June 4, 2008 - 13:01
Project: vCard
Version: 5.x-1.1
Component: Code
Category: task
Priority: normal
Assigned: fr
Status: closed
Description

I noticed that Outlook does not display umlauts correctly because it lacks UTF-8 support.
This patch makes Drupal export an ISO-8859-1 encoded vCard, which works around the problem.

Attachment: vcard-utf8-outlook.patch (486 bytes)
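
A minimal sketch of the kind of conversion such a workaround performs, assuming PHP's iconv extension is available. The helper name vcard_to_latin1 is hypothetical and is not taken from the attached patch.

```php
<?php
// Hypothetical helper, not the code from the attached patch.
// Converts UTF-8 text to ISO-8859-1. For characters without an
// ISO-8859-1 equivalent, iconv() typically returns FALSE.
function vcard_to_latin1($utf8_text) {
  // @ suppresses the notice iconv() raises for unconvertible characters.
  return @iconv('UTF-8', 'ISO-8859-1', $utf8_text);
}

// "Müller" written with byte escapes so the example does not depend on
// the encoding of this source file.
$latin1 = vcard_to_latin1("M\xC3\xBCller");
echo bin2hex($latin1), "\n"; // hex dump of the converted bytes
```

After the conversion the umlaut occupies a single byte (0xFC) instead of the two bytes it needs in UTF-8.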

#1

sanduhrs - June 4, 2008 - 13:12

What about characters that are not defined in ISO-8859-1?

#2

sanduhrs - June 4, 2008 - 14:11
Category: bug report » task
Status: fixed » needs work

From my point of view this is a bug in Outlook rather than in vCard.
So what will happen when Korean characters like 의 계정 세부사항 are included in a vCard?
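
To make the concern concrete, here is a small sketch that checks whether a UTF-8 string can be represented in ISO-8859-1 at all. The helper name vcard_fits_latin1 is hypothetical; the check simply tests whether every code point lies in U+0000..U+00FF.

```php
<?php
// Hypothetical helper: TRUE if every character of the UTF-8 string lies
// in U+0000..U+00FF, the range ISO-8859-1 can represent.
// Invalid UTF-8 input makes preg_match() fail, which also yields FALSE.
function vcard_fits_latin1($utf8_text) {
  return preg_match('/^[\x{00}-\x{FF}]*$/u', $utf8_text) === 1;
}

var_dump(vcard_fits_latin1("M\xC3\xBCller"));      // German umlaut
var_dump(vcard_fits_latin1('의 계정 세부사항'));      // Korean text
```

A module could use such a check to decide whether a conversion would lose data before attempting it.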

#3

fr - June 4, 2008 - 14:56
Category: task » bug report
Status: needs work » fixed

Well, other characters won't work anymore. As I said, it's meant as a workaround.
Of course the bug is in Outlook. But imagine the situation: the customer uses Outlook and wants a solution.
I guess you don't want to tell him that he's the problem.

I suspect this mainly affects people outside of Europe, who would have to replace ISO-8859-1 with their local encoding.
Another, more generic solution could be to implement Quoted-Printable encoding. How about that?

#4

sanduhrs - June 5, 2008 - 10:43
Category: bug report » task
Status: fixed » needs work

Sorry, as long as we haven't come to an agreement, this issue is not fixed.
And it is not a bug just because Microsoft apparently is too stupid to handle encodings other than ISO-8859-1.

Drupal uses UTF-8 all over the place, so I won't include this patch as it is now.
However, I could imagine piping the $vcard->fetch(); call on line 198 through a themeable function, so anyone interested in limiting the character set could do so by overriding the function in the theme's template.php.
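
A rough sketch of how that could look. The hook name vcard_output and the theme prefix mytheme are assumptions, and demo_theme() is only a stand-in for Drupal's theme() dispatcher so that the example runs on its own.

```php
<?php
// Stand-in for Drupal's theme() dispatcher so the sketch runs on its own:
// it prefers a theme override (hypothetical "mytheme" prefix) and falls
// back to the module's default implementation.
function demo_theme($hook, $text) {
  $override = 'mytheme_' . $hook;
  $default = 'theme_' . $hook;
  return function_exists($override) ? $override($text) : $default($text);
}

// Default implementation the module would ship: pass UTF-8 through untouched.
function theme_vcard_output($vcard_text) {
  return $vcard_text;
}

// Override a site could place in its theme's template.php to limit the
// character set for Outlook users.
function mytheme_vcard_output($vcard_text) {
  $converted = @iconv('UTF-8', 'ISO-8859-1', $vcard_text);
  // Keep the UTF-8 original if the text cannot be converted.
  return $converted === FALSE ? $vcard_text : $converted;
}

// In the module this call would wrap the result of $vcard->fetch().
echo bin2hex(demo_theme('vcard_output', "FN:J\xC3\xBCrgen")), "\n";
```

Sites that do nothing keep the UTF-8 output; only sites that add the override get the restricted character set.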

#5

fr - June 4, 2008 - 15:32
Category: task » bug report

I never said it's a Drupal bug.

You did not answer my second question. I also suggested another, probably more generic, solution.

I think your approach would also be fine, but it has the disadvantage that it requires templating effort.

#6

sanduhrs - June 5, 2008 - 11:19
Category: bug report » task

Again, this is not a bug in vCard.
If you ask Google, this problem has been well known to MS for a long time.
And yes, you would have to tell your users that this is a bug in MS software.

--

According to [1], the only valid encodings since V3 of the vCard specification are 8BIT for strings and Binary for files.
Quoted-Printable is only available in the vCard specification up to V2.1.

So a possible solution is to provide additional settings on admin/settings/vcard where the admin may choose to generate V3 or V2.1 vCards.
If V2.1 is chosen, one may choose to encode in Quoted-Printable.
In vcard_fetch() we would need to add a charset param to each element and encode the contents in Quoted-Printable.

So, how to encode in Quoted-Printable?
There's no function in PHP that I know of.

[1] http://www.ietf.org/rfc/rfc2426.txt
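
The version switch proposed above could be sketched roughly as follows. This assumes PHP 5.3.0 or later, which ships a built-in quoted_printable_encode() (that release is newer than this thread). The helper name vcard_format_property is hypothetical, and value escaping and property-line folding are left out.

```php
<?php
// Hypothetical helper: formats one vCard property line for the chosen version.
// Relies on quoted_printable_encode(), built into PHP 5.3.0 and later.
function vcard_format_property($name, $value, $version = '3.0') {
  if ($version === '2.1') {
    // vCard 2.1: declare charset and encoding as property parameters.
    return $name . ';CHARSET=UTF-8;ENCODING=QUOTED-PRINTABLE:'
      . quoted_printable_encode($value);
  }
  // vCard 3.0: the value is written as-is (8-bit).
  return $name . ':' . $value;
}

echo vcard_format_property('FN', "J\xC3\xBCrgen", '2.1'), "\r\n";
echo bin2hex(vcard_format_property('FN', "J\xC3\xBCrgen", '3.0')), "\n";
```

With the 2.1 setting the line consists of ASCII only, because each non-ASCII byte is written as "=" followed by two hex digits.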

#7

fr - June 6, 2008 - 22:35

Yes, but UTF-8 uses more than 8 bits for such characters, so the vCard module seems to violate the RFC.
The characters have to be encoded in ASCII.
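
For reference, a small sketch of how UTF-8 stores a non-ASCII character: it takes several bytes, each of which is an 8-bit value outside the 7-bit ASCII range. The variable name is only illustrative.

```php
<?php
// "ü" (U+00FC) in UTF-8, written as byte escapes so the example does not
// depend on the encoding of this source file.
$u_umlaut = "\xC3\xBC";

echo strlen($u_umlaut), "\n";   // number of bytes, not characters
echo bin2hex($u_umlaut), "\n";  // the individual bytes in hex

// Each byte of a non-ASCII character has its high bit set (>= 0x80),
// so none of them is 7-bit ASCII.
foreach (str_split($u_umlaut) as $byte) {
  printf("0x%02X %s\n", ord($byte), ord($byte) > 127 ? '8-bit' : 'ASCII');
}
```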

Here is a function for Quoted-Printable encoding:

// PHP 5.3 and later ship a built-in quoted_printable_encode(); the guard
// avoids a "cannot redeclare" error there.
if (!function_exists('quoted_printable_encode')) {
    function quoted_printable_encode($input, $line_max = 76)
    {
        $lines = preg_split("/(?:\r\n|\r|\n)/", $input);
        $eol = "\r\n";
        $linebreak = "=0D=0A"; // line breaks inside the value are encoded
        $escape = "=";
        $output = "";
        $last = count($lines) - 1;

        foreach ($lines as $j => $line) {
            $linlen = strlen($line);
            $newline = "";
            for ($i = 0; $i < $linlen; $i++) {
                $c = $line[$i];
                $dec = ord($c);
                if (($dec == 32) && ($i == ($linlen - 1))) { // convert space at eol only
                    $c = "=20";
                }
                elseif (($dec == 61) || ($dec < 32) || ($dec > 126)) { // "=", control and 8-bit bytes; also encodes "\t", which is *not* required
                    $c = sprintf("%s%02X", $escape, $dec);
                }
                if ((strlen($newline) + strlen($c)) >= $line_max) { // CRLF is not counted
                    $output .= $newline . $escape . $eol; // soft line break
                    $newline = ""; // no indentation: leading spaces would end up in the decoded value
                }
                $newline .= $c;
            }
            $output .= $newline;
            if ($j < $last) {
                $output .= $linebreak;
            }
        }
        return $output; // no trim(): it would strip leading spaces from the value
    }
}

"Again, this is not a bug in vCard.
If you ask Google, this problem has been well known to MS for a long time."

Everybody knows that, and it's not the point.
The question is: how can you give users a way to work around it without modifying Drupal core?

"And yes, you would have to tell your users that this is a bug in MS software."

Sure, but that doesn't help them, since they just want something that simply works.
How about being less ignorant and providing a way to solve this issue so that everyone is happy?

"So a possible solution is to provide additional settings on admin/settings/vcard where the admin may choose to generate V3 or V2.1 vCards.
If V2.1 is chosen, one may choose to encode in Quoted-Printable.
In vcard_fetch() we would need to add a charset param to each element and encode the contents in Quoted-Printable."

Yes, this would be cool.

If you can agree with this, I will release a new patch soon that makes the vCard module's encoding configurable in the admin menu.

#8

sanduhrs - June 5, 2008 - 18:25

If you provide a valid solution, be sure it'll be incorporated.
But it mustn't reduce the available characters from 1,114,112 to 191, as proposed in your first post!
And there's nothing about religion in this.
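
For context, the two figures quoted above can be reproduced with a bit of arithmetic, assuming they refer to the Unicode code space and to the printable characters of ISO-8859-1:

```php
<?php
// Unicode: 17 planes of 65,536 (0x10000) code points each.
$unicode_code_points = 17 * 0x10000;

// ISO-8859-1: printable ASCII 0x20-0x7E plus the range 0xA0-0xFF.
$latin1_printable = (0x7E - 0x20 + 1) + (0xFF - 0xA0 + 1);

echo $unicode_code_points, ' vs. ', $latin1_printable, "\n";
```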

By the way, would you please file a bug against Microsoft Outlook [1], so that they know people want UTF-8 support in it?

[1] http://weblog.timaltman.com/archive/2006/03/22/reporting-bugs-microsoft

#9

fr - June 6, 2008 - 22:03
Title: Outlook UTF8 encoding problem (no umlauts) » configurable encoding patch
Status: needs work » fixed

I just uploaded a new patch that makes the encoding configurable and covers common charsets (using iconv).
It defaults to UTF-8, even though this is not RFC-compliant.
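
A minimal sketch of what a configurable conversion along these lines might look like. It is not the code from the attached patch: the function names and the charset list are assumptions, and in Drupal the chosen charset would presumably come from a setting read with variable_get().

```php
<?php
// Hypothetical list of charsets an admin could pick from.
function vcard_supported_charsets() {
  return array('UTF-8', 'ISO-8859-1', 'ISO-8859-15', 'WINDOWS-1252');
}

// Converts the UTF-8 vCard text to the configured charset.
// Unknown charsets and failed conversions fall back to the UTF-8 input.
function vcard_convert_encoding($vcard_text, $charset = 'UTF-8') {
  if ($charset === 'UTF-8' || !in_array($charset, vcard_supported_charsets(), TRUE)) {
    return $vcard_text;
  }
  // @ suppresses the notice iconv() raises for unconvertible characters.
  $converted = @iconv('UTF-8', $charset, $vcard_text);
  return $converted === FALSE ? $vcard_text : $converted;
}

echo bin2hex(vcard_convert_encoding("\xC3\xA4", 'ISO-8859-1')), "\n"; // "ä" converted
echo bin2hex(vcard_convert_encoding("\xC3\xA4")), "\n";               // default: unchanged
```

Defaulting to UTF-8 keeps the full character range unless an admin explicitly opts into a narrower charset.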

I haven't implemented Quoted-Printable encoding or the option to change the vCard version yet... stay tuned.

"By the way, would you please file a bug against Microsoft Outlook [1], so that they know people want UTF-8 support in it?"

Yes, I'll do that ;)

Attachment: vcard-encoding.patch (6.61 KB)

#10

Anonymous (not verified) - June 20, 2008 - 23:35
Status: fixed » closed

Automatically closed -- issue fixed for two weeks with no activity.

 
 
