Closed (duplicate)
Project:
User Import
Version:
6.x-4.x-dev
Component:
Code
Priority:
Normal
Category:
Support request
Assigned:
Unassigned
Reporter:
Created:
8 Nov 2007 at 11:03 UTC
Updated:
28 May 2019 at 08:13 UTC
Comments
Comment #1
mansdanielsson commented
Hi there,
I'm using the Profile module, which allows you to create custom fields for user accounts. The same problem applies to å, ä, ö in the values of custom fields.
Best regards,
The Swede
Comment #2
mat. commented
Hi there,
I have the same problem: I need to import users with information into their Profile fields, but characters such as ěščřžýáíé disappear in the custom fields (and all characters after one of these disappear too) :-(
Can anyone help, please?
Kind regards, mat.
Comment #3
syngi commented
The ë seems to disappear too...
If no fix can be made, please at least show a warning when any such character is encountered. In the meantime, please also warn about this on http://drupal.org/project/user_import and in the README.
Comment #4
kakor commented
Just bumping this issue. This module would be very useful to me if this issue could only be fixed. I need to import 25.000 users and I would like it a lot better if I didn't have to go through them manually to fix this problem.
Comment #5
robert castelo commented
Is there anyone with experience of UTF-8 that wants to fix this?
The code that needs to be fixed is:
_user_import_sanitise_username() in user_import.module
Comment #6
Alatalo commented
I too had trouble with imported usernames, so I copied and modified these regexps from the 5.7 core user.module. Now Scandinavian characters seem to work properly in imported usernames. It's not pretty, but so far it seems functional.
user_import.module, line 1174:
Comment #7
sfks commented
Comment #8
sfks commented
Same problem with 5.x-2.x-dev.
The profile fields are truncated at special characters such as º, á, é, í, ó, ú.
Any workaround?
(Sorry for my change of version...)
Comment #9
robert castelo commented
Is this still a problem in User Import 5.x-2.0-beta1?
Please confine this issue to Profile fields (not Username).
Comment #10
alexkessler commented
I couldn't import usernames with special letters like ü, ä, ö with User Import 5.x-2.0-beta4.
To fix this, I put the following line right after the _user_import_sanitise_username function:
$username = utf8_encode($username);
Works fine for me...
Comment #11
Andreas Goebel commented
Hi,
those problems appear in 6.x.1.1, too. Example: I import "Jürgen" and use it for the first part of the username as well as for a text field (First name).
After the import, "Jürgen" appears as "Jrgen" in the username, whereas in the text field it is even worse: it appears as "J".
Should I post an issue for the 6.x version?
Regards,
Andreas
Comment #12
Andreas Goebel commented
Hi again,
the $username = utf8_encode($username); fix works for 6.x, too.
Now the usernames contain the special characters, whereas the text fields are still truncated.
Regards,
Andreas
Comment #13
Andreas Goebel commented
Hi,
I've played around with this a bit more.
Problem: if the field names in the first row contain special characters, they do not appear in the field-match form.
I've made the following fixes:
Line 1033:
  foreach ($data_row as $data_cell) {
    $data_cell = utf8_encode($data_cell);
Here the field names are read; I inserted a utf8_encode() call.
Line 1092:
  function _user_import_sanitise_username($username) {
    //$username = utf8_encode($username);
Here I removed the earlier fix (see above), because later on I convert all strings to UTF-8.
  function _user_import_process($settings) {
    ...
    while ($data = fgetcsv($handle, $line_max, ',')) {
      foreach ($data as $key => $value) {
        $data[$key] = utf8_encode($value);
      }
Here I convert all data read from the file to UTF-8.
When I had not yet deleted the old fix, usernames were wrong again. From this I conclude that converting to UTF-8 twice leads to wrong results. So I guess that if the CSV file is already UTF-8, these fixes will lead to wrong results throughout the file. I have no idea whether PHP is able to check this. If not, one could a) insert an encoding tag (ugly), or b) add a checkbox to the User Import options where the user has to state whether the CSV file is UTF-8 or not.
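The double-conversion effect described above can be reproduced in isolation. This is a minimal sketch, not code from the module: it uses mb_convert_encoding() with ISO-8859-1 as the source (the same conversion utf8_encode() performs, which is deprecated in recent PHP versions), and the variable names are made up for illustration.

```php
<?php
// "Jürgen" as ISO-8859-1 bytes: the ü is the single byte 0xFC.
$latin1 = "J\xFCrgen";

// First conversion: ü becomes the two UTF-8 bytes 0xC3 0xBC.
$once = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Second conversion: each of those two bytes is treated as a separate
// ISO-8859-1 character and encoded again, which garbles the name.
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');

echo bin2hex($once), "\n";   // 4ac3bc7267656e
echo bin2hex($twice), "\n";  // 4ac383c2bc7267656e
```

The second result is still valid UTF-8, so the damage cannot be detected afterwards; the conversion has to be applied exactly once.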
Sorry if my code is ugly, I usually do not program in PHP (this is the first time).
Regards,
Andreas
Comment #14
robert castelo commented
It seems the solution would be to check the encoding with mb_check_encoding() and use mb_convert_encoding() to convert the data if it isn't UTF-8.
http://uk3.php.net/manual/en/function.mb-check-encoding.php
We would want to do this as few times as possible to be efficient, so should we check the encoding at file upload and convert the whole file, or do it at the start of processing?
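A minimal sketch of that check-then-convert idea, assuming the mbstring extension is available. The function name example_ensure_utf8() and the ISO-8859-1 default are hypothetical, chosen only for illustration; they are not part of User Import.

```php
<?php
/**
 * Hypothetical helper: returns $raw as UTF-8.
 *
 * Assumption: anything that is not valid UTF-8 is in $assumed_source.
 * mb_check_encoding() only validates byte sequences; it cannot tell
 * which single-byte encoding the data was really written in.
 */
function example_ensure_utf8($raw, $assumed_source = 'ISO-8859-1') {
  if (mb_check_encoding($raw, 'UTF-8')) {
    // Already valid UTF-8 (this includes plain ASCII): leave untouched,
    // which avoids converting the same data twice.
    return $raw;
  }
  return mb_convert_encoding($raw, 'UTF-8', $assumed_source);
}

echo example_ensure_utf8("J\xFCrgen"), "\n";      // ISO-8859-1 input, converted
echo example_ensure_utf8("J\xC3\xBCrgen"), "\n";  // UTF-8 input, unchanged
```

One caveat: some ISO-8859-1 byte sequences happen to be valid UTF-8 as well, so the check can only say "this is definitely not UTF-8", never "this is definitely UTF-8".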
Comment #15
Andreas Goebel commented
As you read at least the first line twice (once for the form, once for processing), I would suggest converting it on upload.
Comment #16
mlsamuelson commented
I've taken a stab at catching non-utf-8 encoded content when the file is first loaded, and resaving encoded as utf-8. This patch assumes we're coming from ISO-8859-1 (as utf8_encode() does) because I was unable to determine how to accurately detect the encoding. My tests worked as expected, though.
Perhaps someone with a bit more knowledge of how encodings work could push it further.
This code only runs once per import, so that goal is met.
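The patch itself is not reproduced in this thread, so the following is only a rough sketch of a convert-once-on-load step under the same ISO-8859-1 assumption. The function name example_resave_as_utf8() is hypothetical and the code is not taken from the patch.

```php
<?php
/**
 * Hypothetical helper: rewrites the file at $path as UTF-8 if its
 * contents are not already valid UTF-8.
 *
 * Assumption: non-UTF-8 files are ISO-8859-1.
 * Returns TRUE if the file was rewritten, FALSE if it was left alone.
 */
function example_resave_as_utf8($path, $assumed_source = 'ISO-8859-1') {
  $contents = file_get_contents($path);
  if ($contents === FALSE) {
    throw new RuntimeException("Cannot read $path");
  }
  if (mb_check_encoding($contents, 'UTF-8')) {
    // Nothing to do; a second call on a converted file ends up here.
    return FALSE;
  }
  $converted = mb_convert_encoding($contents, 'UTF-8', $assumed_source);
  if (file_put_contents($path, $converted) === FALSE) {
    throw new RuntimeException("Cannot write $path");
  }
  return TRUE;
}
```

Because the file on disk is UTF-8 after the first call, both the field-match form and the import run can read it without any further conversion.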
Comment #17
mlsamuelson commented
With the changes in the patch from #16, I uncovered an issue with PHP (http://bonsai.php.net/bug.php?id=48507) where fgetcsv() won't return a value if the csv field value has a leading special character AND PHP's locale setting isn't set for UTF-8.
Doing a
setlocale(LC_ALL, 'en_US.UTF-8');
before each occurrence of fgetcsv() does the trick.
New patch that incorporates #16 and this, attached.
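A minimal sketch of that workaround in context, assuming the en_US.UTF-8 locale is installed on the server. The in-memory stream and the sample rows are made up for illustration; the module reads from the uploaded file instead.

```php
<?php
// Assumption: the en_US.UTF-8 locale exists on this system. setlocale()
// returns false when it does not, so keep the result for diagnostics.
// Note that setlocale() changes the locale for the whole PHP process.
$locale_ok = setlocale(LC_ALL, 'en_US.UTF-8') !== false;

// Illustrative UTF-8 CSV data held in memory instead of an uploaded file.
$handle = fopen('php://temp', 'r+');
fwrite($handle, "name,first_name\nÜmit,Jürgen\n");
rewind($handle);

$rows = array();
// Delimiter, enclosure and escape are passed explicitly for clarity.
while (($data = fgetcsv($handle, 1000, ',', '"', '\\')) !== false) {
  $rows[] = $data;
}
fclose($handle);

print_r($rows);
```

Checking the return value matters because a missing locale fails silently, and the import would then behave exactly as it did before the workaround.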
Comment #18
rogerpfaff commented
subscribe
Comment #19
rogerpfaff commented
Against which version of user import did you write this patch? I'm not able to apply it to 6-2.3.
Comment #20
mlsamuelson commented
@rogerpfaff It's applied against 6.x-1.2.
Comment #21
thepanz commented
Subscribe! I'll try to provide a 2.x-dev patch soon!
Comment #22
sgabe commented
The attached patch is against the master branch; 6.x-2.x seems to be a mess. I had the same problem with áéűőúöüó etc. and the patch solved it.
Comment #23
galooph commented
I just ran into this same issue on a client's site.
I re-rolled the patch from #22 against 6.x-4.x and that seems to have cured the problem.
Comment #24
gisle commented
The README.txt clearly spells out that the CSV file must be saved as UTF-8.
Closing as duplicate of #813726: Non-ascii letters in users name or address truncates the rest.