Character problem with nodes that have Greek titles

Chrys - July 16, 2009 - 19:55
Project: Drigg
Version: 6.x-1.x-dev
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: needs work
Description

I have created a Drigg website but I have a problem with Greek characters. Some titles come out scrambled (example). The weird part is that some nodes from the same source are scrambled and some are not. Also note that this feed item is not scrambled on the feed aggregator page.

Useful Information
Drupal version: 6.10
MySQL: 4.1.22
Collation: utf8_general_ci
PHP: 5.2.5
PHP memory limit: 32M

#1

drupallogic - July 17, 2009 - 00:14

Drigg has issues with non-English characters.

I hope Merc will fix it in the next release.

#2

Chrys - July 17, 2009 - 07:02

Well, this website, mysuba.ru, seems to handle non-English characters just fine. I contacted the owner of the website to ask whether he faced a similar problem.

It is difficult to debug this because some feed items come through fine and some come with question marks. Also, all the feed items on the aggregator page are OK ({aggregator_item} table). So I guess the problem is where the title is copied from the {aggregator_item} table to the {node} table.

I will continue searching for this bug. Kudos to merc and the other guys that work on this module. Very well structured and very well commented.

#3

cedricfontaine - July 20, 2009 - 18:37

Are the items all coming from the same form? Can you find a reason why some are OK? From a particular user? From a particular form? Grabbed from an RSS feed?

#4

Chrys - July 21, 2009 - 14:01

The items come from different feed sources. For example, this node is broken and this node is not broken, yet they come from the same feed source!
What I have discovered so far:
- Aggregator module: all the feed items in the {aggregator_item} table are OK, so the aggregator module is fine.
- Database: the aggregator_item table and the node table have the same collation (utf8_general_ci).

#5

Chrys - August 2, 2009 - 19:15

Still working on this issue. I added log information and the scrambled items appear just after this line in drigg/drigg_rss/drigg_rss.module:

$result = db_query("SELECT * FROM {aggregator_item} WHERE iid > %d", $last_iid);
while ($item = db_fetch_object($result)) {
...
log_debug("TITLE: $item->title");

The strange thing is that the titles in the aggregator_item table look fine when I view them with phpMyAdmin. So why do db_query() or db_fetch_object() (sometimes) fail to handle Greek characters and return item titles with question marks?

A broken link example is here
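
One possible explanation for the question marks, offered as an assumption rather than a diagnosis of this site: when the connection character set is a single-byte one such as latin1, MySQL replaces every character it cannot represent with '?'. The sketch below only imitates that substitution in plain PHP (it assumes the mbstring extension is available); simulate_latin1_connection() is a hypothetical helper, not part of Drigg or Drupal.

```php
<?php
// Hypothetical illustration, not Drigg code: shows how Greek text turns into
// question marks when UTF-8 data is converted to a Latin-1 character set.

/**
 * Convert a UTF-8 string to ISO-8859-1, roughly what a latin1 connection does.
 * Characters with no Latin-1 equivalent are replaced by '?'.
 */
function simulate_latin1_connection($utf8_text) {
  // Set the substitution character explicitly (63 is the code point of '?').
  mb_substitute_character(63);
  return mb_convert_encoding($utf8_text, 'ISO-8859-1', 'UTF-8');
}

$greek_title = 'ΑΘΗΝΑ';        // Greek letters only
$mixed_title = 'Drigg ΑΘΗΝΑ';  // Latin and Greek letters
$latin_title = 'Drigg feed';   // Latin letters only

print simulate_latin1_connection($greek_title) . "\n";  // ?????
print simulate_latin1_connection($mixed_title) . "\n";  // Drigg ?????
print simulate_latin1_connection($latin_title) . "\n";  // Drigg feed
```

Titles made only of Latin letters pass through untouched, so a charset mismatch is easy to miss until a Greek title shows up.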

#6

Chrys - August 13, 2009 - 13:14
Status: active » fixed

Fixed. I made two changes, so I am not sure which one did the trick :)

The changes I made were the following:

- Before querying the database I added these two commands:

db_query("SET CHARACTER SET 'utf8'");
db_query("SET NAMES 'utf8'");

So that I am sure that UTF-8 is used.

- Changed the collation from utf8_general_ci to utf8_unicode_ci.
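
To tell which of the two changes mattered, it may help to look at the character set variables that SET NAMES affects (character_set_client, character_set_connection and character_set_results). The following is only a sketch: find_non_utf8_charsets() is a hypothetical helper, and the sample values are made up for illustration, not taken from this site.

```php
<?php
// Hypothetical helper, not part of Drigg or Drupal.

/**
 * Return the connection character set variables that are not utf8.
 *
 * $variables maps MySQL variable names to values, as reported by
 * SHOW VARIABLES LIKE 'character_set_%'.
 */
function find_non_utf8_charsets(array $variables) {
  // The three variables that SET NAMES changes.
  $relevant = array(
    'character_set_client',
    'character_set_connection',
    'character_set_results',
  );
  $mismatches = array();
  foreach ($relevant as $name) {
    if (isset($variables[$name]) && $variables[$name] !== 'utf8') {
      $mismatches[$name] = $variables[$name];
    }
  }
  return $mismatches;
}

// On a bootstrapped Drupal 6 site the input could be collected roughly so:
//   $result = db_query("SHOW VARIABLES LIKE 'character_set_%%'");
//   while ($row = db_fetch_object($result)) {
//     $variables[$row->Variable_name] = $row->Value;
//   }

// Made-up sample values, for illustration only.
$sample = array(
  'character_set_client' => 'utf8',
  'character_set_connection' => 'utf8',
  'character_set_results' => 'latin1',
);
print_r(find_non_utf8_charsets($sample));
```

An empty result would suggest the connection was already using utf8, which would point toward the collation change instead.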

#8

ajayg - August 17, 2009 - 15:44
Status: fixed » needs work

I saw the issue marked as fixed, but I see neither a patch nor any code changes committed to CVS.
