I observed that a node title containing multi-byte UTF-8 characters is truncated incorrectly.

Further investigation revealed this:

- The database field for the node title is 128 characters long, which in this case effectively means 128 bytes.
- In our example (Khmer language), every character needs three bytes in UTF-8.
- Drupal does not check the length of the title before putting it into the database.

The result is that the string is truncated after 128 bytes, that is, in the middle of the 43rd Khmer character.
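
The arithmetic behind this can be checked with a small sketch. It is written in Python rather than Drupal's PHP, purely as an illustration; the sample title (the Khmer letter KA repeated 50 times) and all variable names are assumptions made for the example.

```python
# Illustration only: cutting a UTF-8 string at a fixed byte count.
# Assumption: the title is the Khmer letter KA (U+1780) repeated 50 times.
title = "\u1780" * 50
raw = title.encode("utf-8")

bytes_per_char = len("\u1780".encode("utf-8"))  # bytes used by one Khmer letter
cut = raw[:128]                                 # what a 128-byte field keeps

# How many whole characters survive, and how many stray bytes are left over.
complete_chars, leftover_bytes = divmod(len(cut), bytes_per_char)

# The leftover bytes are a fragment of the next character, so the
# truncated value is no longer valid UTF-8.
try:
    cut.decode("utf-8")
    is_valid = True
except UnicodeDecodeError:
    is_valid = False

print(bytes_per_char, complete_chars, leftover_bytes, is_valid)
```

Under these assumptions, 42 characters survive intact and the stored value ends with a two-byte fragment of the 43rd.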

Fortunately, this is easy to fix by changing one line in node.module. In the function node_validate(), replacing
$node->title = strip_tags($node->title);
with
$node->title = truncate_utf8(strip_tags($node->title), 128);

solves the problem here. It also makes sure that the title shown in the preview has the same length it will have when read back from the database. Previously, the preview showed the non-truncated title.
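
To show what a byte-limited truncation that respects character boundaries has to do, here is a minimal sketch in Python. It is not Drupal's truncate_utf8() implementation; the function name `truncate_utf8_bytes` is hypothetical and only illustrates the general technique of backing up to the start of a character.

```python
def truncate_utf8_bytes(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes
    without splitting a multi-byte character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    cut = max_bytes
    # UTF-8 continuation bytes look like 10xxxxxx. If the first byte we
    # would drop is a continuation byte, the cut falls inside a character,
    # so move the cut back until it sits on a character boundary.
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut].decode("utf-8")


# Assumed sample input: 50 Khmer letters, 3 bytes each.
khmer = "\u1780" * 50
shortened = truncate_utf8_bytes(khmer, 128)
print(len(shortened), len(shortened.encode("utf-8")))
```

With the assumed input, the result holds 42 complete characters in 126 bytes, so nothing invalid reaches the database.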

There are two more things to consider:

1. Is it correct that the accepted title length depends on the characters the user types? Right now a Latin title can have 128 characters, but a Khmer title only 42 complete ones. This seems wrong to me. The solution would be a function that limits a string to a number of UTF-8 characters rather than bytes, combined with extending the database field to 4 × 128 bytes, because in the worst case a UTF-8 character can occupy 4 bytes.

2. Are there more database fields with a limited length that hold user data and are not checked correctly before the data is submitted to the database?
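
The proposal in point 1 can be sketched as follows, again in Python and only as an illustration. The names `limit_chars`, `MAX_CHARS` and `MAX_BYTES_PER_CHAR` are hypothetical, and the sample strings are assumptions. Note that this counts Unicode code points, which is not always the same as what a user perceives as one character.

```python
MAX_CHARS = 128          # limit expressed in characters, as proposed above
MAX_BYTES_PER_CHAR = 4   # worst case for a single character in UTF-8


def limit_chars(text: str, max_chars: int = MAX_CHARS) -> str:
    # Slicing a Python str counts code points, not bytes.
    return text[:max_chars]


# Field size needed so that any 128-character title fits.
field_bytes = MAX_CHARS * MAX_BYTES_PER_CHAR

# Assumed samples: 1-byte, 3-byte and 4-byte characters, 200 of each.
samples = {
    "latin": "a" * 200,
    "khmer": "\u1780" * 200,
    "emoji": "\U0001F600" * 200,
}
for name, text in samples.items():
    limited = limit_chars(text)
    print(name, len(limited), len(limited.encode("utf-8")), "of", field_bytes)
```

Every sample is cut to 128 characters regardless of script, and even the 4-byte case fits in the 512-byte field.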

Jens

Comments

killes@www.drop.org

Version: » x.y.z
Priority: Critical » Normal

Still valid.

Jose Reyero

I think this is a complex issue, and the only real solution would be to define the MySQL fields as UTF-8, which for now presents some compatibility problems.

So, for now, I'd just update the node table.

ALTER TABLE `node` MODIFY COLUMN `title` VARCHAR(255) NOT NULL;

Steve Simms

MySQL fields are now UTF-8, right? Does that solve this problem?

magico

Status: Active » Fixed

Following smsimms's comment, this should be fixed by now.

Anonymous

Status: Fixed » Closed (fixed)