uif_clean_and_key_row() doesn't sanitize input beyond trimming values. Characters outside printable ASCII, e.g. the Ë in GiguËre, cause the Batch API process to fail with errors.

Running preg_replace('/[^\x0A\x0D\x20-\x7E]/', '', $value) on input fixes the problem but seems draconian, as it also removes legitimate non-ASCII characters.
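
To illustrate why the filter is draconian, here is a minimal sketch of what it does to a correctly encoded UTF-8 name. The sample value and the helper name strip_to_printable_ascii() are made up for illustration; only the regular expression comes from the post above. Because the pattern has no /u modifier it matches byte by byte, so both bytes of a two-byte character such as è are dropped.

```php
<?php
// Hypothetical helper wrapping the regex quoted above.
function strip_to_printable_ascii($value) {
  // Keep LF, CR, and printable ASCII (space through tilde); drop every other byte.
  return preg_replace('/[^\x0A\x0D\x20-\x7E]/', '', $value);
}

// "Giguère" as UTF-8 bytes: è is encoded as 0xC3 0xA8.
$name = "Gigu\xC3\xA8re";
echo strip_to_printable_ascii($name), "\n"; // prints "Gigure"
```

Note that tabs (0x09) are removed as well, since they are outside the allowed ranges.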

Is UTF-8 conversion required?

Comments

xcheng’s picture

dwightaspinwall’s picture

thank you -- I'll have a look

dwightaspinwall’s picture

7.x-1.0-beta8 now applies the same preg_replace as above. I'm not sure about using utf8_encode(); I don't see it used in any other code.
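
For context on the hesitation: utf8_encode() treats its input as ISO-8859-1, so it is only safe when the source encoding is known. The sketch below uses iconv() with the same source encoding to show what happens to a value that is already UTF-8. The sample value is made up, and the sketch assumes the iconv extension is available.

```php
<?php
// "é" in ISO-8859-1 is the single byte 0xE9; in UTF-8 it is 0xC3 0xA9.
$latin1 = "caf\xE9";
$utf8   = "caf\xC3\xA9";

// Converting genuine ISO-8859-1 input gives valid UTF-8.
$converted = iconv('ISO-8859-1', 'UTF-8', $latin1);
var_dump($converted === $utf8); // bool(true)

// Converting input that is already UTF-8 re-encodes each byte ("double encoding").
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);
var_dump(bin2hex($double)); // string(14) "636166c383c2a9"
```

So a blind conversion on every value would corrupt files that are already UTF-8.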

DieWaldfee’s picture

Version: 6.x-1.x-dev » 7.x-1.0-beta2
Category: bug » support

Is it possible to import Korean characters? I don't understand the utf8_encode thing :/

dwightaspinwall’s picture

I need to look into this as I don't know the proper way to support it. Any information welcome.

rolkos’s picture

Version: 7.x-1.0-beta2 » 7.x-1.0-beta8
Category: support » bug

After updating from beta7 to beta8 I can't import Swedish or Polish characters like "ąężźńćłóś å ö ä". After import I get the message "Warning on row 2: Illegal characters were removed from first_name column. May require edit." It should not work that way. I also can't import a URL into a URL field.

dwightaspinwall’s picture

Status: Active » Fixed

Fixed in 7.x-1.0-beta9 and 6.x-1.6.

The code now checks each value for valid UTF-8. Valid values are accepted as is. Invalid values have everything but ASCII stripped and trigger a warning, but the import still succeeds.
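
The behavior described above could be sketched roughly as follows. This is not the module's actual code: the function name uif_sketch_clean_value() and its return shape are hypothetical, and the validity check via preg_match() with the /u modifier is one common approach, assumed here for illustration.

```php
<?php
// Hypothetical sketch; not the module's actual code.
function uif_sketch_clean_value($value) {
  $value = trim($value);
  // With the /u modifier, preg_match() returns FALSE on invalid UTF-8,
  // so an empty pattern works as a validity check.
  if (preg_match('//u', $value) === 1) {
    return array('value' => $value, 'warning' => FALSE);
  }
  // Invalid UTF-8: fall back to LF, CR, and printable ASCII only.
  $ascii = preg_replace('/[^\x0A\x0D\x20-\x7E]/', '', $value);
  return array('value' => $ascii, 'warning' => TRUE);
}

// "Zażółć" as UTF-8 bytes: kept unchanged, no warning.
$ok = uif_sketch_clean_value("Za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87");
// A lone ISO-8859-1 byte (0xE8) is invalid UTF-8: stripped, with a warning.
$bad = uif_sketch_clean_value("Gigu\xE8re");
```

A caller would surface the warning flag to the user and continue the import with the cleaned value.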

rolkos’s picture

Status: Fixed » Active

Thanks, it's better now, but if the first character of an imported string is non-ASCII, the leading non-ASCII characters are removed without warning. If the string contains no ASCII characters, nothing is imported; if it contains one, only the part starting at the first ASCII character is imported.

For example, a string like:
Zażółć
is imported correctly. A string like:
ąężźńćłóśöä
is not imported at all, and no warning is shown.
If we try to import:
ąężźńAćłóśöä
only the part starting with the first ASCII character is imported, in this case:
Aćłóśöä

Thanks

dwightaspinwall’s picture

Assigned: Unassigned » dwightaspinwall
Status: Active » Closed (works as designed)
Attachments: 2 new files (49.92 KB and 31.24 KB)

@rolkos: I had no problem importing users with the names you provided. No errors. See attached.

UIF expects the input file to be UTF-8 encoded. It does not do any conversion. For my test I used a Google Docs spreadsheet. I have seen problems with Microsoft Excel and recommend against using it.

Hope this clears things up.
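
Since UIF does no conversion, it can help to confirm that a file really is UTF-8 before importing it. Below is a minimal sketch of such a pre-flight check. The function name find_invalid_utf8_lines() is hypothetical and not part of UIF, and the sample contents are made up; in practice the string would come from file_get_contents() on the CSV file.

```php
<?php
// Hypothetical pre-flight check; not part of UIF.
// Returns the 1-based numbers of lines that are not valid UTF-8.
function find_invalid_utf8_lines($contents) {
  $invalid = array();
  // Split on CRLF, CR, or LF line endings.
  $lines = preg_split('/\r\n|\r|\n/', $contents);
  foreach ($lines as $index => $line) {
    // preg_match() with /u returns FALSE when the subject is invalid UTF-8.
    if (preg_match('//u', $line) !== 1) {
      $invalid[] = $index + 1;
    }
  }
  return $invalid;
}

// Line 2 is "Zażółć" in UTF-8; line 3 contains a lone ISO-8859-1 byte.
$contents = "name\nZa\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87\nGigu\xE8re\n";
print_r(find_invalid_utf8_lines($contents)); // reports line 3
```

If any lines are reported, the file was probably saved in another encoding and should be re-exported as UTF-8 before import.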