Greetings,

I'm working on an English-language site, but in a few of the articles we need to quote a few words of Greek. The Greek text I have is in a Unicode font, and I can paste it into any application on my computer, change the font, etc., and it always shows up as the right characters.

When I put it in the "body" field in a node, it is converted to all question marks. This happens regardless of input filter, whether filtered or full HTML. When I go back to the node editing form, the text has permanently turned into question marks.

HOWEVER

When I paste that same Greek text into one of my custom CCK fields, the Greek comes up just fine in the final product, and this also seems to be unrelated to the input format specified.

Does anyone have any idea how the text processing in the body field might be different from that of a CCK field? Any ideas on how I might fix this problem? I'd rather not add a replacement body field with CCK, since that would involve converting a lot of nodes...

Comments

lhtown’s picture

First off, this should work flawlessly and painlessly.

Assuming you are using the same theme to view all pages and assuming you are viewing all pages in the same browser (you might check to see if the browser somehow uses a different encoding on different pages of your site, which would probably be a theme issue), it sounds like there is some kind of filtering going on with your nodes that doesn't happen with the CCK fields.

To test this, I would suggest entering the Greek text as part of the title. If that works, I would suspect that you have a WYSIWYG editor that is filtering your code and goofing things up for you, in which case there is likely a configuration change you can make in it to prevent it from changing your UTF-8 characters to escaped character codes.

If that doesn't work, you should look at any modules that are part of your input filtering. This should work with Full HTML or Filtered HTML, but you may have a module that tries to substitute characters, clean things up, or do other processing.

Beyond that, it would probably be possible for JavaScript to cause what you are describing, but unless you have been monkeying around with JavaScript or a module that uses JavaScript, you should be safe. Even at that, it sounds pretty far out.

crbassett’s picture

Hello, and thanks for responding.

Here's what I've done for further testing:

  1. Tested entering Greek text in both Safari and Firefox/Mac. I got question marks in both.
  2. Entered sample Greek text in the title field and got question marks.
  3. Entered sample Greek text in the title and body field of the default "Page" content type, no CCK, no Contemplate. Got question marks.
  4. Tested entering sample Greek text using both the "Filtered HTML" and "Full HTML" input filters. Got question marks in the body field. I'm not sure that input filters have much to do with it, since those filters are also applied to my CCK fields in which the Greek text works.

I have a Drupal installation on a different server... I'll test this out there to further narrow down what the problem might be.

crbassett’s picture

More results:

  1. Tested Greek text in titles and body fields on a separate Drupal 5.3 website on a different server. It works.
  2. Tested Greek text in titles and body fields on a separate Drupal 5.6 website on the same server as the one I'm working on. It works.
  3. On the site I'm having the problem with, I have the following contrib modules enabled: Footnotes, CCK, Views, Contemplate, Pathauto (and Token). Compared to the first site (5.3) listed above, the only difference is that the first site doesn't use the Footnotes module. Otherwise, CCK, Views, Contemplate, Pathauto (and Token) were all enabled on the 5.3 site which displayed the proper behavior. So, I disabled the Footnotes module and that didn't fix the problem for me.

I guess the next thing to check, as far as I can think of, is to upgrade those two other sites to 5.7 and see if the problem comes into play once 5.7 comes on the scene.

lhtown’s picture

Another thing to consider is the encoding of your database. It should be utf-8. However, that doesn't sound like the issue here.

What sounds more likely is that you have specified a font (probably in your css page) that is available on your system but that does not include Greek characters.

Other than that, you may have just hit a really weird bug. You could try entering text from other languages or different text in Greek to see what happens.

crbassett’s picture

It really can't be the fonts since the font works on CCK fields, but not on body or title fields.

I don't think it is the database encoding, since Greek works in CCK fields and on other websites with the same host (and thus created with the same configuration).

I tried entering some unicode Hebrew, and got the same question mark problem.

Right now I'm going to upgrade one of those pre-5.7 sites and see if the bug arises when 5.7 is on the scene.

lhtown’s picture

It would be possible to have your CSS apply different styles and different fonts to different content types. (Of course I am talking about how it is displayed after it is saved, not how it looks when you first enter it.)

Using multiple languages, and even mixing languages, is something that a lot of people do with Drupal, and it has been implemented well for a long time. I really don't think you have found a bug in Drupal core that will be fixed when you upgrade. It might be possible, but I would say very unlikely.

I think the issue is probably with your theme or font. Your DTD probably says the article is English which might prevent the browser from trying to substitute an appropriate font. I would suggest trying the default theme with a default stylesheet to see what happens.

Another thing you can check for clues to your problem is to actually open up the database with phpmyadmin or whatever you use and view the text that is saved in the database to see if it is saved as Greek characters or question marks.
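As a rough illustration of why that check is informative: when text is converted to a character set that cannot represent it, such as Latin-1 for Greek, each unrepresentable character is typically replaced by a literal "?", and the original is gone for good. The Python sketch below only mimics that substitution; the function name `store_as_latin1` is made up for this example, and it is an analogy for what the database does, not its actual code.

```python
def store_as_latin1(text: str) -> str:
    """Mimic saving text through a Latin-1 column (hypothetical helper).

    Characters that Latin-1 cannot represent are replaced with "?",
    so the original characters cannot be recovered afterwards.
    """
    lossy_bytes = text.encode("latin-1", errors="replace")
    return lossy_bytes.decode("latin-1")


greek = "αβγ"  # alpha, beta, gamma

# UTF-8 can represent every Unicode character, so this round trip is lossless.
assert greek.encode("utf-8").decode("utf-8") == greek

# Latin-1 has no Greek letters, so each one comes back as a question mark.
print(store_as_latin1(greek))

# Plain ASCII text survives, which is why English content looks fine.
print(store_as_latin1("plain English"))
```

So if the database itself contains question marks, the damage happened on the way in, and no theme, font, or browser setting will bring the characters back.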

crbassett’s picture

OK, I went into the database and found that the question marks were there too. I looked, and a lot of my tables were set to latin_swedish_1 or something like that (most likely MySQL's latin1_swedish_ci collation). How that happened, I have no idea. All I know is that this database originated a while back from a 4.7 installation. So, since I know nothing about bulk SQL queries and executions, I just dumped the entire database using mysqldump, opened the file in TextMate, and did a find & replace, taking "latin1" and replacing it with "utf8". Everything seems to be working like a charm now.
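For anyone repeating this, a blanket find & replace also changes the word "latin1" wherever it happens to appear in the content itself. The Python sketch below is a slightly stricter variant, not the exact procedure used above: it only rewrites character set and collation declarations in the dump. The function name `convert_declarations` and the choice of `utf8_general_ci` as the target collation are assumptions made for this example, and the patterns cover only the common forms of declaration that mysqldump writes.

```python
import re


def convert_declarations(dump_sql: str) -> str:
    """Rewrite latin1 charset/collation declarations to utf8.

    Hypothetical helper: other occurrences of "latin1" (for example
    inside node bodies) are left untouched.
    """
    # Matches e.g. "DEFAULT CHARSET=latin1" and "CHARACTER SET latin1".
    # The \b stops this from matching the start of "latin1_swedish_ci".
    dump_sql = re.sub(
        r"(CHARSET=|CHARACTER SET\s+)latin1\b",
        r"\g<1>utf8",
        dump_sql,
        flags=re.IGNORECASE,
    )
    # Matches e.g. "COLLATE latin1_swedish_ci" and "COLLATE=latin1_swedish_ci".
    # utf8_general_ci is an assumed target; pick whatever suits the site.
    dump_sql = re.sub(
        r"(COLLATE[=\s]+)latin1_\w+",
        r"\g<1>utf8_general_ci",
        dump_sql,
        flags=re.IGNORECASE,
    )
    return dump_sql


sample = (
    "CREATE TABLE node (title varchar(128) character set latin1 "
    "collate latin1_swedish_ci) ENGINE=MyISAM DEFAULT CHARSET=latin1;\n"
    "INSERT INTO node VALUES ('a post about latin1');"
)
print(convert_declarations(sample))
```

To use it on a real dump, read the file as text, pass the contents through `convert_declarations`, write the result to a new file, and import that into a fresh database, keeping the original dump as a backup. Text that was already saved as question marks stays that way and has to be re-entered.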

Thanks so much for helping me get to the bottom of this.