Dear Maintainer,

I'm trying to import 2000 users into a Drupal 5.3 installation with the user_import module. I figured out how to import all the special German letters, but in the usernames I cannot have the letters ä, ö and ü. I can import them in the profile fields. So is the username not UTF-8, or did I miss something? I could change the module code, but I wanted to ask first.

Warm regards from Cologne

Dirk

Comments

mansdanielsson’s picture

Hi there,

I'm using the Profile module, which lets you create customizable fields for user accounts. The same problem applies to å, ä, ö in the values of custom fields.

Best regards,
The Swede

mat.’s picture

Title: No ä,ö and ü in the Usernames » No ěščřžýáíé in the fields of Profile module
Assigned: Unassigned » mat.

Hi there,
I have the same problem: I need to import users with information into their profiles, but these characters (ěščřžýáíé) disappear from the custom fields, and all characters after one of them disappear too. :-(
Any help, please?

Kind regards, mat.

syngi’s picture

The ë seems to disappear too...
If no fix can be made, please at least give a warning when any such character is encountered. In the meantime, please also warn about this on http://drupal.org/project/user_import and in the README.

kakor’s picture

Just bumping this issue. This module would be very useful to me if only this issue could be fixed. I need to import 25,000 users, and I would like it a lot better if I didn't have to go through them manually to fix this problem.

robert castelo’s picture

Is there anyone with experience of UTF-8 that wants to fix this?

The code that needs to be fixed is:

_user_import_sanitise_username() in user_import.module

Alatalo’s picture

I too had trouble with imported usernames, so I copied and modified these regexps from the 5.7 core user.module. Now Scandinavian characters seem to work properly in imported usernames. It's not pretty, but so far it seems functional.

user_import.module, line 1174:

// comment out the original regexp
// $username = preg_replace('/[^a-zA-Z0-9@ ]/', ' ', $username);

// modified from 5.7 core user.module
$username = preg_replace('/[^\x80-\xF7 [:alnum:]@_.-]/', ' ', $username);
$username = preg_replace(
  '/[\x{80}-\x{A0}' .      // Non-printable ISO-8859-1 + NBSP
  '\x{AD}' .               // Soft-hyphen
  '\x{2000}-\x{200F}' .    // Various space characters
  '\x{2028}-\x{202F}' .    // Bidirectional text overrides
  '\x{205F}-\x{206F}' .    // Various text hinting characters
  '\x{FEFF}' .             // Byte order mark
  '\x{FF01}-\x{FF60}' .    // Full-width latin
  '\x{FFF9}-\x{FFFD}' .    // Replacement characters
  '\x{0}]/u',
  ' ', $username);
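For anyone who wants to try the two regexps above outside the module, here is a minimal sketch. The function name example_sanitise_username() is hypothetical and not part of user_import; the sketch also assumes the input is already UTF-8, since preg_replace() with the /u modifier returns NULL when the subject is not valid UTF-8.

```php
<?php
/**
 * Hypothetical wrapper around the two regexps quoted above.
 * Returns null when $username is not valid UTF-8, because preg_replace()
 * with the /u modifier fails on invalid UTF-8 subjects.
 */
function example_sanitise_username(string $username): ?string {
    // Byte-wise pass: keep ASCII letters, digits, space, @ _ . - and every
    // byte in 0x80-0xF7, which covers the bytes of UTF-8 sequences.
    $username = preg_replace('/[^\x80-\xF7 [:alnum:]@_.-]/', ' ', $username);
    if ($username === null) {
        return null;
    }

    // Unicode pass: replace invisible or confusable characters with a space.
    return preg_replace(
        '/[\x{80}-\x{A0}' .      // Non-printable ISO-8859-1 + NBSP
        '\x{AD}' .               // Soft-hyphen
        '\x{2000}-\x{200F}' .    // Various space characters
        '\x{2028}-\x{202F}' .    // Bidirectional text overrides
        '\x{205F}-\x{206F}' .    // Various text hinting characters
        '\x{FEFF}' .             // Byte order mark
        '\x{FF01}-\x{FF60}' .    // Full-width latin
        '\x{FFF9}-\x{FFFD}' .    // Replacement characters
        '\x{0}]/u',
        ' ',
        $username
    );
}

// "Jürgen Müller!" in UTF-8: the umlauts survive, the "!" becomes a space.
$clean = example_sanitise_username("J\u{FC}rgen M\u{FC}ller!");
```

Note that a username that is still ISO-8859-1 encoded will not pass through this cleanly, so the encoding of the CSV file has to be sorted out before this step.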
sfks’s picture

Version: 5.x-1.3 » 5.x-2.x-dev
sfks’s picture

Version: 5.x-2.x-dev » 5.x-1.3

Same problem with 5.x-2.x-dev.

The profile fields are truncated at special characters such as º, á, é, í, ó, ú.

Any workaround?

(Sorry for my change of version...)

robert castelo’s picture

Is this still a problem in User Import 5.x-2.0-beta1?

Please confine this issue to Profile fields (not Username).

alexkessler’s picture

I couldn't import usernames with special letters like ü, ä, ö with User Import 5.x-2.0-beta4.
To fix this, I added the following line at the top of the _user_import_sanitise_username() function:

function _user_import_sanitise_username($username) {

  $username = utf8_encode($username);

....

Works fine for me.

Andreas Goebel’s picture

Hi,

Those problems appear in 6.x-1.1, too. Example: I import "Jürgen" and use it for the first part of the username as well as for a text field (First name).

After the import, "Jürgen" appears as "Jrgen" in the username, whereas in the text field it is even worse: it appears as "J".

Should I post an issue for the 6.x version?

Regards,

Andreas

Andreas Goebel’s picture

Hi again,

The $username = utf8_encode($username); fix works for 6.x, too.

Now the usernames contain the special characters, whereas the text fields are still truncated.

Regards,

Andreas

Andreas Goebel’s picture

Hi,

I've played around with this a bit more.

Problems: If the field-names in the first row contain special characters, they do not appear in the field-match form.

I've made the following fixes:

Line 1033:

foreach ($data_row as $data_cell) {
  $data_cell = utf8_encode($data_cell);

This is where the field names are read; I inserted a utf8_encode() call.

Line 1092:

function _user_import_sanitise_username($username) {
  //$username = utf8_encode($username);

Here I removed the fix (see above), because later on I convert all strings to UTF-8.

function _user_import_process($settings) {

...

  while ($data = fgetcsv($handle, $line_max, ',')) {
    foreach ($data as $key => $value) {
      $data[$key] = utf8_encode($value);
    }

Here I convert all the data that is read to UTF-8.

When I had not yet removed the old fix, the usernames were wrong again. From this I conclude that converting to UTF-8 twice leads to wrong results. So I guess that if the CSV file is already UTF-8, these fixes will produce wrong results for the whole file. I have no idea whether PHP is able to check this. If not, one could a) insert an encoding tag (ugly) or b) add a checkbox to the user import options, where the user has to state whether the CSV file is UTF-8 or not.

Sorry if my code is ugly, I don't usually program in PHP (this is the first time).

Regards,

Andreas
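The double-conversion effect described above is easy to reproduce in isolation. This is only a sketch: example_latin1_to_utf8() is a hypothetical helper that does the same ISO-8859-1 to UTF-8 conversion as utf8_encode(), written with mb_convert_encoding() because utf8_encode() is deprecated as of PHP 8.2. It assumes the mbstring extension is available.

```php
<?php
/**
 * Hypothetical helper: treat $text as ISO-8859-1 and return it as UTF-8,
 * which is the same assumption utf8_encode() makes about its input.
 */
function example_latin1_to_utf8(string $text): string {
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "J\xFCrgen";                      // "Jürgen" as ISO-8859-1 bytes
$once   = example_latin1_to_utf8($latin1);  // valid UTF-8 "Jürgen"
$twice  = example_latin1_to_utf8($once);    // double-encoded "JÃ¼rgen"

// The second conversion reads the two UTF-8 bytes of "ü" (0xC3 0xBC) as two
// separate ISO-8859-1 characters, so the result is no longer the same string.
var_dump($once === $twice);
```

So a blanket conversion is only safe when the input is known not to be UTF-8 already, which is why some check or user-supplied setting is needed.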

robert castelo’s picture

Version: 5.x-1.3 » 6.x-1.2

Seems like the solution would be to check encoding with mb_check_encoding() and use mb_convert_encoding() to convert the data if it isn't UTF-8.

http://uk3.php.net/manual/en/function.mb-check-encoding.php

We would want to do this as few times as possible to be efficient, so should we check the encoding at file upload and convert the whole file, or do it at the start of processing?
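A rough sketch of the check-then-convert idea, done once on the whole file. The function names are hypothetical and this is not the module's code. It assumes the mbstring extension is available and, since PHP cannot reliably detect the source encoding, it assumes that anything that is not valid UTF-8 is ISO-8859-1.

```php
<?php
/**
 * Hypothetical helper: return $contents as UTF-8.
 * Assumption: input that is not valid UTF-8 is ISO-8859-1 (or whatever
 * $assumed_source says); the real encoding is not detected.
 */
function example_ensure_utf8(string $contents, string $assumed_source = 'ISO-8859-1'): string {
    if (mb_check_encoding($contents, 'UTF-8')) {
        // Already UTF-8: converting again would double-encode it.
        return $contents;
    }
    return mb_convert_encoding($contents, 'UTF-8', $assumed_source);
}

/**
 * Hypothetical helper: convert an uploaded CSV file in place, once.
 * Returns false if the file could not be read or written.
 */
function example_convert_csv_file(string $path): bool {
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false;
    }
    $converted = example_ensure_utf8($contents);
    if ($converted === $contents) {
        return true; // nothing to do
    }
    return file_put_contents($path, $converted) !== false;
}
```

Running this at upload time means the field-match form and the import itself both read the same, already converted file. One caveat: a short ISO-8859-1 file whose bytes happen to form valid UTF-8 would be left untouched, so a user-facing encoding option may still be worth having.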

Andreas Goebel’s picture

As you read at least the first line twice (once for the form, once for processing), I would suggest converting it upon upload.

mlsamuelson’s picture

Status: Active » Needs review
Attachment (new): 879 bytes

I've taken a stab at catching non-utf-8 encoded content when the file is first loaded, and resaving encoded as utf-8. This patch assumes we're coming from ISO-8859-1 (as utf8_encode() does) because I was unable to determine how to accurately detect the encoding. My tests worked as expected, though.

Perhaps someone with a bit more knowledge of how encodings work could push it further.

This code only runs once per import, so that goal is met.

mlsamuelson’s picture

Attachment (new): 2.03 KB

With the changes in the patch from #16, I uncovered an issue with PHP (http://bonsai.php.net/bug.php?id=48507) where fgetcsv() won't return a value if the CSV field value has a leading special character AND PHP's locale setting isn't set for UTF-8.

Doing a setlocale(LC_ALL, 'en_US.UTF-8'); before each occurrence of fgetcsv() does the trick.

New patch that incorporates #16 and this fix is attached.
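As an illustration of the locale workaround, here is a minimal sketch of reading a UTF-8 CSV file with a UTF-8 locale set first. It is not the patch itself: example_read_utf8_csv() is a hypothetical name, the sketch sets only LC_CTYPE rather than LC_ALL to keep the side effects smaller, and it assumes that at least one of the listed locales is installed on the server (setlocale() returns false and changes nothing otherwise).

```php
<?php
/**
 * Hypothetical helper: read all rows of a UTF-8 encoded CSV file.
 * Assumption: 'en_US.UTF-8' or 'C.UTF-8' is installed on the system.
 */
function example_read_utf8_csv(string $path): array {
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    $previous = setlocale(LC_CTYPE, '0');

    // Try the candidates in order; the first installed one wins.
    setlocale(LC_CTYPE, 'en_US.UTF-8', 'C.UTF-8');

    $rows = [];
    $handle = fopen($path, 'r');
    if ($handle !== false) {
        // Length 0 means "no line length limit".
        while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            $rows[] = $data;
        }
        fclose($handle);
    }

    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }
    return $rows;
}
```

Restoring the previous locale is a design choice for this sketch; setlocale() affects the whole PHP process, so leaving it changed could influence unrelated code.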

rogerpfaff’s picture

subscribe

rogerpfaff’s picture

Against which version of User Import did you write this patch? I'm not able to apply it to 6.x-2.3.

mlsamuelson’s picture

@rogerpfaff It's applied against 6.x-1.2.

thepanz’s picture

Assigned: mat. » Unassigned

Subscribe! I'll try to provide a 2.x-dev patch soon!

sgabe’s picture

Version: 6.x-1.2 » 6.x-2.x-dev
Attachment (new): 1.91 KB

The attached patch is against the master branch; 6.x-2.x seems to be a mess. I had the same problem with áéűőúöüó etc., and the patch solved it.

galooph’s picture

Version: 6.x-2.x-dev » 6.x-4.x-dev
Attachment (new): 1.79 KB

I just ran into this same issue on a client's site.

I re-rolled the patch from #22 against 6.x-4.x and that seems to have cured the problem.

gisle’s picture

Status: Needs review » Closed (duplicate)
Parent issue: » #813726: Non-ascii letters in users name or address truncates the rest

The README.txt clearly spells out that the CSV file must be saved as UTF-8.

Closing as duplicate of #813726: Non-ascii letters in users name or address truncates the rest.