We've created an attachment to one of our page views that uses the glossary mode. As you can see in the attached picture, a character is missing in the attachment. Sometimes the character shown is Å and sometimes it's just A. Any ideas about this issue?
| Comment | File | Size | Author |
|---|---|---|---|
| | Picture 1.png | 36.69 KB | snowball43 |
Comments
Comment #1
merlinofchaos commented
I believe the collation on the database makes A and Å appear to be the same character to SQL, even though they are not the same character to PHP. I'm not actually sure how to deal with issues like this.
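To illustrate the mismatch described above, here is a minimal sketch in Python. The `collation_key` helper is hypothetical: it only approximates an accent- and case-insensitive collation by stripping combining marks, and it is not MySQL's actual algorithm. The sample titles are made up.

```python
import unicodedata
from collections import Counter


def collation_key(text: str) -> str:
    """Rough stand-in for an accent- and case-insensitive collation (assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.upper()


# Hypothetical node titles: "Apple", "Åland", "Ärtsoppa", "Banana".
titles = ["Apple", "\u00c5land", "\u00c4rtsoppa", "Banana"]

# Application-side view: each first character is a distinct string.
first_chars = [title[0] for title in titles]

# Database-side view under the loose collation: A, Å and Ä share one key.
groups = Counter(collation_key(title[0]) for title in titles)

print(first_chars)
print(dict(groups))
```

Under this approximation the three A-like letters fall into a single group, so which one gets displayed for that group is effectively arbitrary.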
Comment #2
snowball43 commented
So the glossary mode is generated via SQL?
Comment #3
merlinofchaos commented
Yes, it uses a SQL SUBSTR command on the field, which I assume is the node.title field in this case.
Comment #4
karens commented
This is just a shot in the dark, totally untested, but I ran into an issue like this on translated dates, and it was the LENGTH that was the problem, not the SUBSTR. I took a quick look at the MySQL manual and it looks like SUBSTR is multi-byte safe but LENGTH is not: LENGTH counts bytes, while CHAR_LENGTH counts characters. You may be able to swap CHAR_LENGTH into the SQL in place of LENGTH to see if that fixes it.
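The byte-versus-character difference can be sketched in Python, assuming the text is stored as UTF-8. The encoded length plays the role of LENGTH and the string length plays the role of CHAR_LENGTH; the title is a made-up example.

```python
# Hypothetical title "Åland"; Å is U+00C5 and takes two bytes in UTF-8.
title = "\u00c5land"

byte_length = len(title.encode("utf-8"))  # analogous to LENGTH(): counts bytes
char_length = len(title)                  # analogous to CHAR_LENGTH(): counts characters

print(byte_length, char_length)  # prints "6 5"
```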
Comment #5
snowball43 commented
The problem appears to be that the GROUP BY doesn't recognize the difference between the characters. It might work if we group by something more specific, such as ORD.
I've tested this a little by copying the query presented in the preview section of the edit view screen, tweaking it, and running it from Navicat. I added a field to the SELECT that applies ORD to the field being grouped on, and changed the GROUP BY value to the name given to that ORD field.
Note: I'm using MySQL, and I'm not sure whether ORD is available in other SQL databases.
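The idea of grouping on a numeric character code can be sketched in Python. This is only an analogy: Python's `ord` returns the Unicode code point, whereas MySQL's ORD derives its value from the character's bytes, so the numbers differ even though both tell A and Å apart. The titles are hypothetical.

```python
from collections import defaultdict

# Hypothetical node titles: "Apple", "Åland", "Avocado", "Århus".
titles = ["Apple", "\u00c5land", "Avocado", "\u00c5rhus"]

# Group on the numeric code of the first character instead of the character
# itself, so a loose collation cannot merge A and Å.
groups = defaultdict(list)
for title in titles:
    groups[ord(title[0])].append(title)

# Turn the numeric keys back into letters for display.
glossary = {chr(code): sorted(names) for code, names in sorted(groups.items())}

print(glossary)
```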
Comment #6
wojtha commented
I had the same issue with Czech letters (ĚŠČŘŽÝÁÍÉŮ etc.). There is a solution: you need to change the comparison function (collation) of the table column in your database.
In MySQL and Drupal, the comparison function of text columns is set to "utf8_general_ci" by default. When I changed it to our language-specific function, "utf8_czech_ci", all (present) variants of the letters appeared. The binary comparison function, utf8_bin, may be a universal solution, but you lose language-specific sorting.
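The sorting cost of a binary comparison can be sketched in Python by ordering strings on their UTF-8 bytes, which is roughly what a binary collation does. The titles are made up; in Czech alphabetical order Č comes right after C, but byte order pushes it past Z.

```python
# Hypothetical titles: "Zebra", "Čaj", "Citron", "Dort".
titles = ["Zebra", "\u010caj", "Citron", "Dort"]

# Sort on raw UTF-8 bytes, as a binary comparison would.
binary_order = sorted(titles, key=lambda title: title.encode("utf-8"))

print(binary_order)  # "Čaj" ends up last, after "Zebra"
```

Every letter stays distinct under this ordering, which is why the glossary shows all variants, but the result no longer matches what a Czech reader expects.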
Comment #7
esmerel commented
This seems like something that should go into the documentation, or we could ask the internationalization teams whether anyone there has a good idea for dealing with this.
Comment #8
esmerel commented
Adding tag
Comment #9
bojanz commented
utf8_general_ci sucks for many languages.
Cyrillic and Arabic are pretty broken, for example. In those cases, utf8_unicode_ci is used.
utf8_unicode_ci supports everything, but is slower.
Comment #10
iamjon commented
Closing due to a lack of activity. Please feel free to reopen.