Closed (fixed)
Project:
phpBB2Drupal
Version:
5.x-2.0
Component:
Code
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
3 Jun 2006 at 15:06 UTC
Updated:
8 Feb 2008 at 11:32 UTC
Comments
Comment #1
beginner commented
Hello Hakeem,
How did you convert phpBB to UTF-8?
Is your Drupal site encoded in UTF-8 too (look at the page header)?
phpbb2drupal currently doesn't handle encoding problems at all, but it should.
I'll be away from my computer during the next couple of days, so I won't be able to help, but I'll try to fix this problem next week.
Thanks for reporting.
Comment #2
beginner commented
I just updated the module.
Please try the latest version and test.
If everything works as expected, please let me know.
Reopen this issue if you find any problem with the encoding of any part of the forum (title, content, forum name, private messages, etc.). My test DB is in English, so it's difficult for me to test exhaustively.
Thanks.
Comment #3
hakeem commented
Dear beginner,
I tested the module on another server and the encoding wasn't changed! Maybe something is wrong in the encoding-related settings of the environment (the database itself, or the MySQL server).
Anyway, thanks a lot for your support!
Comment #4
beginner commented
Hakeem,
Can you describe precisely what you tried to do?
Maybe you didn't select the proper encodings in the settings.
Can you answer these questions:
Did you download the latest module, with the encoding support?
What MySQL version do you use?
What is the encoding used in your phpBB board? (Look at the headers.) Can I have a link to the board?
What is the encoding used in your Drupal installation?
Do you have the mbstring module installed? (The settings page of the module should tell you that.)
Did you see any error messages on the settings page?
With my setup, I can only do limited tests. I don't have a test phpBB database encoded differently to test with.
If you think you did everything right, and selected the right FROM and TO encoding, and it still doesn't work, I need a copy of your database to test with.
http://drupal.org/user/23181/contact
Can you contact me through the link above if you are willing to give me a copy of the data to test with?
yours,
Beginner.
Comment #5
beginner commented
At last, I used my own module on my own live web site:
http://www.reuniting.info/forum/
At the same time, I did a small test to check the encoding settings, and it worked as expected.
I cannot help without being given more details.
Comment #6
(not verified) commented
Comment #7
beginner commented
Hello Hakeem,
I just received your private message. I am replying here in case whatever fix we find for your case is helpful to other people.
Your situation:
charset=windows-1256 (= Arabic)
MySQL 4.1.19
Now, I have just found out that your encoding is not supported by mbstring.
In the page below, you can see the list of all supported encodings:
http://php.net/mbstring
Since your encoding is not supported by mbstring, there is little I can do at the phpbb2drupal module level.
Now, there may be another solution: you would have to convert the encoding of the phpBB database itself BEFORE you attempt the migration. I have been searching the net for a solution, but have not found one yet. That doesn't mean that it doesn't exist.
Another solution would be to keep the encoding. Try installing Drupal and change the theme so that the page header is not UTF-8 but your Arabic encoding. I don't know whether the Drupal Arabic translation would work then (actually, I am pretty sure it won't, but you can try). If you don't mind being stuck with the 1256 encoding and English navigation, then you can go this way and circumvent the conversion problem completely.
Still, there must be a tool somewhere that should allow you to convert your DB before the migration.
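For readers wondering what such a pre-migration conversion amounts to, here is a minimal Python sketch. It is an illustration only, not part of phpbb2drupal (which is PHP), and it assumes the stored bytes really are windows-1256. The function name `cp1256_to_utf8` and the sample string are made up for the example.

```python
def cp1256_to_utf8(raw: bytes) -> bytes:
    """Re-encode windows-1256 (Arabic) bytes as UTF-8.

    Hypothetical helper for illustration: decode with the cp1256 codec,
    then encode the resulting text as UTF-8.
    """
    return raw.decode("cp1256").encode("utf-8")


if __name__ == "__main__":
    sample = "مرحبا"                          # made-up sample text
    legacy_bytes = sample.encode("cp1256")    # what a windows-1256 board would store
    converted = cp1256_to_utf8(legacy_bytes)
    print(converted.decode("utf-8"))
```

A real conversion would have to apply this to every text column of the phpBB tables, which is why a database-level tool is the more practical route.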
Comment #8
beginner commented
I may have found an easy solution:
http://www.php.net/manual/en/function.iconv.php
Which PHP version do you use?
Can you create a file named test.php with the following content:
and then upload the file to the server and access it with your browser? Do you get any error?
Try both at home and on the remote server.
Comment #9
KMG commented
The same thing happened to me, so what is the solution?
Where do I have to upload this test file?
I am using Drupal 5.2 and phpBB2, and the language is Arabic.
Comment #10
beginner commented
Comment #11
naheemsays commented
Attached is a patch to use iconv instead of mbstring. I have also removed the mbstring check, as iconv is available as standard, so there is no need to check for it.
Finally, I have added windows-1256 to the options for conversion.
@Beginner - any reason Windows1251 and 1252 have a (CP1252) and (CP1251) in that array? I have not added a corresponding (CP1256) to my addition as I have no idea what it is for.
Comment #12
beginner commented
If I remember correctly, Windows-1251 is an alternative name for CP1251 (and Windows-1252 for CP1252).
Comment #13
naheemsays commented
Attached is the updated patch (added CP1256 to the description; rerolled without the split I had planned for the module).
Comment #14
naheemsays commented
Patch has been committed to HEAD and the Drupal-5 branch.
Comment #15
beginner commented
The reason I hadn't made the change to iconv() earlier is that I was not sure it would be installed on every server.
The guy never replied to my question in #8.
I just tested iconv() on my computer and I get:
Fatal error: Call to undefined function: iconv()
Should this issue be re-opened to get user feedback about the existence of iconv on their systems?
Comment #16
beginner commented
Also, iconv() doesn't seem to support multibyte strings, which was the reason the earlier function was used. As such, the module wouldn't work for users of CJK languages.
Comment #17
naheemsays commented
Yes, a few issues have cropped up.
1. According to the PHP manual page, some platforms call the function libiconv (http://uk3.php.net/manual/en/function.iconv.php). It also shows a workaround, but this function should be available on all platforms in one form or another since PHP 4.0.5. Is there anywhere else I can look for corroboration? I can add an option to use mbstring where available, but that would leave the original bug of not converting from encodings such as Arabic (CP1256).
2. From reading the PHP manual, iconv *should* handle multibyte strings. It is even used in some other examples to convert from multibyte to Unicode so that other functions can use the string.
3. I have also noticed another problem with the change: it cuts off at the first character it cannot encode, thus losing the rest of that node's data. I need to add //TRANSLIT after the output charset string to fix this and get the best-match character.
4a. According to the manual, there is a bug or a feature where iconv will work even if the input charset is not defined. This will need further investigation, but if it works, I think removing the input encoding option would be a good thing.
4b. Does Drupal use other charsets apart from UTF-8? Just wondering whether the output charset is needed as an option, or whether it can be fixed to UTF-8.
4c. The charset options may need to be changed to put "CP1256" etc. instead of "Windows-1256 (CP1256)".
EDIT @ Beginner - What system are you using?
Comment #18
naheemsays commented
Just updated the drupal-5--3 branch to check whether the functions iconv or libiconv exist, and also to use the best match when an exact match is not available instead of cutting the string at the first illegal character.
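The difference between stopping at the first bad character and carrying on can be illustrated with a short Python sketch. This is not the module's PHP code, and Python's `errors="replace"` is only an analogue of the behaviour described here: it substitutes a placeholder (U+FFFD) rather than picking a best-match character as //TRANSLIT does. The helper name `to_utf8_lossy` and the sample bytes are hypothetical.

```python
def to_utf8_lossy(raw: bytes, source_charset: str) -> bytes:
    """Convert raw bytes to UTF-8, keeping the text after an undecodable byte.

    Hypothetical helper: undecodable bytes become U+FFFD (the Unicode
    replacement character) instead of aborting the whole conversion.
    """
    return raw.decode(source_charset, errors="replace").encode("utf-8")


if __name__ == "__main__":
    data = b"caf\xe9 \x9d end"  # 0x9D is unassigned in Python's cp1252 codec
    try:
        data.decode("cp1252")   # strict decoding gives up at the bad byte
    except UnicodeDecodeError as exc:
        print("strict conversion failed:", exc.reason)
    print(to_utf8_lossy(data, "cp1252").decode("utf-8"))
```

The point of the sketch is only that a tolerant error mode preserves the rest of the post; whether to substitute, transliterate, or drop the offending character is a separate design choice.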
Comment #19
beginner commented
Maybe you can postpone this issue until you get some feedback from the users.
If there is a problem, you can make a configuration setting, giving the choice between the two. A switch would use either one or the other function according to the setting.
Drupal uses UTF-8 by default everywhere, i.e. in the theme (see headers) and in the DB (see encoding setting). I don't see a good reason for people to change the default.
Libiconv was not installed by default on my development platform, but when I noticed it, it was easy to install the missing package (php-iconv).
I am using Mandriva but I plan to switch to Debian, when I can.
Comment #20
naheemsays commented
Heh, just looking at the API, I found this:
http://api.drupal.org/api/function/drupal_convert_to_utf8/6
I have changed HEAD to use this. However, this function will bail out if it cannot convert a character using the iconv function, instead of finding the next best match (or even ignoring that character) and moving on. I will need to file a bug report to fix this.
Comment #21
naheemsays commented
I have just been looking at phpBB3 to see how it converts from phpBB2 (many encodings) to phpBB3 (UTF-8), as I figure they would be the experts on their own encodings.
The main encoding is iso-8859-1 (for English at least), but it forces the recoding to actually convert from cp1252.
It also has the following commented-out section listing other encodings (includes/utf/utf_tools.php):
The actual convert function after this is similar to what we have in Drupal, but it also has many manual recoders for cases where none of the functions we use (iconv, mbstring and recode, in that order) exist.
phpBB is also under the GPL. Can I borrow the above table to replace the one we have now?
We also have an option to "automate" all this by getting the "default language" from the phpbb_config table. (maybe leave an option to encode or not for those who do not have any encoding functions available.)
Comment #22
beginner commented
That's great. Obviously, phpBB knows better how its own data is encoded. It's all GPL, so you are free to borrow any code you like if it can help a user migrate from phpBB2 to Drupal.
Comment #23
naheemsays commented
Not sure if it is connected, but I keep getting errors with "smart quotes": “ gets changed to â€œ and ” to â€. The - even borked the conversion as an illegal character!
I see there is a function to turn similar things into html characters. Probably need to see why it is not working.
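One possible explanation, which is an assumption and is not confirmed anywhere in this thread, is that text which is already UTF-8 gets read as windows-1252 at some step. The Python sketch below (an illustration, not the module's PHP code) reproduces that kind of garbling; the function name is made up for the example.

```python
def misread_utf8_as_cp1252(text: str) -> str:
    """Show what UTF-8 text looks like when its bytes are read as cp1252.

    Bytes with no cp1252 meaning (such as 0x9D) are dropped, which is an
    assumption about how a browser or editor might display them.
    """
    return text.encode("utf-8").decode("cp1252", errors="ignore")


if __name__ == "__main__":
    print(misread_utf8_as_cp1252("\u201c"))  # left double quotation mark
    print(misread_utf8_as_cp1252("\u201d"))  # right double quotation mark
```

The left quote is the three bytes E2 80 9C in UTF-8, and each of those bytes is a separate visible character in windows-1252, which is where the three-character sequence comes from.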
Comment #24
naheemsays commented
The issue with #23 is the line
$text = html_entity_decode($text, ENT_QUOTES);
in the encoding function. Do we really need this? I think everything will work just as well without it.
Comment #25
beginner commented
What is the encoding of your data?
Is it iso-8859-1 or windows-1252?
The latter adds em-dashes and fancy quotes that are not part of the basic iso-8859-1 encoding.
Anyhow, security-wise, I think it should be OK to remove that line of code, as long as the INSERTs follow the proper Drupal API (they do).
Do check, though. _phpbb2drupal_text_encode() is called in many places.
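The iso-8859-1 versus windows-1252 distinction can be seen by decoding the same bytes both ways. This is a small Python illustration with made-up sample bytes, not code from the module; the helper name `decode_both` is hypothetical.

```python
def decode_both(raw: bytes) -> tuple[str, str]:
    """Decode the same bytes as windows-1252 and as iso-8859-1."""
    return raw.decode("cp1252"), raw.decode("iso-8859-1")


if __name__ == "__main__":
    raw = b"\x93quoted\x94 \x97 dash"  # made-up sample using bytes 0x93, 0x94, 0x97
    as_cp1252, as_latin1 = decode_both(raw)
    print(repr(as_cp1252))  # curly quotes and a long dash
    print(repr(as_latin1))  # the same bytes become invisible control characters
```

In windows-1252 the bytes 0x93, 0x94 and 0x97 are the curly quotes and the long dash; in iso-8859-1 the same positions are control characters, so data labelled iso-8859-1 that contains them is usually windows-1252 in practice.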
Comment #26
beginner commented
About #24.
Here is how you can do some investigative work.
1) go to http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/phpbb2drupa...
2) go to the 'annotate' view. http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/phpbb2drupa...
3) check the line of code you wonder about:
1424 : augustin 1.41 $text = html_entity_decode($text, ENT_QUOTES);
You see that this line was added by myself in version 1.41.
http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/phpbb2drupa...
You go back to the log view, and you see:
which gives you the reference to the issue:
http://drupal.org/node/114451
That's why it's important you always reference the issue with a # sign each time you commit some code: http://drupal.org/project/cvs/45403
You never know when you (or the next maintainer) will wonder about a change.
When I did my migration, I didn't have that line of code, and I remember that I did have problems with quotes in titles, etc.
And according to the issue referenced above, html_entity_decode() is necessary for node titles and comment titles... but apparently not for the body.
You need to test this (node title, comment title, body, with and without the extra code). Add quotation marks in your titles for testing.
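To see what the entity decoding does to a title, here is a Python analogue using the standard library's `html.unescape` (the module itself calls PHP's html_entity_decode, and the two are not guaranteed to behave identically). The sample title is made up, and it assumes titles are stored with quotes escaped as HTML entities.

```python
import html


def decode_title(title: str) -> str:
    """Turn HTML entities in a stored title back into plain characters."""
    return html.unescape(title)


if __name__ == "__main__":
    stored = "&quot;Hello&quot; &amp; goodbye"  # hypothetical stored title
    print(decode_title(stored))
```

A title is displayed as plain text, so leftover entities would show up literally, whereas a body rendered as HTML displays them correctly either way; that would be consistent with the decoding mattering for titles but not for bodies.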
Comment #27
naheemsays commented
Thanks for giving me more details on how to investigate. Very helpful.
I will take this issue to the right place so as not to pollute this topic:
http://drupal.org/node/114451
Comment #28
Evance commented
This module is so surprising...
It is just what I need, but my forum is based on phpBB3.
How can I modify it to work with phpBB3?
Comment #29
beginner commented
@Evance: this is the wrong issue. There is another issue about phpbb3. Don't add noise here.
phpbb3 is currently unsupported.
If you want support for phpbb3, you can either provide a patch (see patching guidelines in handbook), or pay nbz some money for him to support it soon.
Don't reply here but in the proper issue.
Comment #30
naheemsays commented
I have reverted some changes and borrowed code from the drupal_convert_to_utf8 function, as I needed it to work slightly differently: it uses //TRANSLIT for iconv encoding to avoid data loss, which also makes iconv behave similarly to mbstring. I have supplied a patch for the main function in another issue:
http://drupal.org/node/205406
Once/if that is fixed, I can go back to using the function directly.
Comment #31
briinums commented
Hello!
I have phpBB2, and the encoding I use on the site is UTF-8 (I write in Latvian). At the same time, phpMyAdmin says the MySQL encoding is UTF-8, while the collation of the table fields is "latin1_swedish_ci" (according to php.net it is windows-1252).
I tried to import the data:
1) without encoding
2) encoding from isoblablabla-1
3) encoding from UTF-8
4) encoding from windows-1252
None of these worked :/ The non-English characters are screwed up.
My server HAS the mbstring module.
I have no idea how to use the .patch files you posted above, so I haven't tried them (for iconv).
I have no shell access to my server, just FTP.
I hope you can help me.
Comment #32
beginner commented
You need to fix your phpBB DB first.
See this: http://drupal.org/node/187689
This is a separate issue.
Comment #33
Anonymous (not verified) commented
Automatically closed -- issue fixed for two weeks with no activity.