I needed this today to import data from an MSSQL server that uses a different character encoding. I'm using Feeds SQL combined with the DBLIB driver for SQL Server, and this was the only way to get the charset right. This plugin might not be needed very often, but maybe it will help someone. The charset list could be expanded or modified.

Comments

Dubs’s picture

Thanks for submitting this. Can this please be committed as it would be very useful?

justindodge’s picture

Yes please! This just saved my bacon. Feeds as a whole is missing good charset support in some key places, and I think this would be a really excellent addition to the toolkit.

justindodge’s picture

Status: Active » Reviewed & tested by the community

twistor’s picture

Status: Reviewed & tested by the community » Needs work
Issue tags: +Needs tests

This should really just use drupal_convert_to_utf8(). Drupal assumes UTF-8, so we don't need any charset_to at all.
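For anyone following along, here is a minimal sketch of what converting a single field value to UTF-8 could look like. The helper name example_to_utf8() is hypothetical and is not taken from any patch in this issue. Inside Drupal 7 it defers to drupal_convert_to_utf8($data, $encoding); the mb_convert_encoding() branch is only an assumption added so the snippet can run outside Drupal.

```php
<?php
/**
 * Hypothetical helper (not from the patch): convert a value to UTF-8.
 *
 * @param string $value
 *   The raw field value in the source encoding.
 * @param string $source_encoding
 *   The encoding the value is currently in, e.g. 'ISO-8859-1'.
 *
 * @return string|false
 *   The UTF-8 string, or FALSE if the conversion failed.
 */
function example_to_utf8($value, $source_encoding) {
  // Inside Drupal 7, use the core helper. Only the source encoding is
  // needed because the target is always UTF-8.
  if (function_exists('drupal_convert_to_utf8')) {
    return drupal_convert_to_utf8($value, $source_encoding);
  }
  // Standalone fallback (assumption): requires the mbstring extension.
  return mb_convert_encoding($value, 'UTF-8', $source_encoding);
}

// "café" encoded as ISO-8859-1: the byte 0xE9 is "é".
$latin1 = "caf\xE9";
$utf8 = example_to_utf8($latin1, 'ISO-8859-1');
```

With only a source charset to configure, the plugin settings form would need a single select element rather than a from/to pair.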

justindodge’s picture

Status: Needs work » Needs review
Attached file sizes: 1.17 KB, 2.71 KB, 1.99 KB

Here's the original patch minus charset_to, now using Drupal's built-in drupal_convert_to_utf8().

The interdiff after renaming was mostly useless, so here's an extra interdiff before changing the filename of the plugin (obviously iconv.inc wasn't really appropriate after this change).

justindodge’s picture

It should probably be noted that if you were using the original patch, switching to the patch in #5 would cause your existing charset tampers from that patch to disappear.

timlie’s picture

Works like a charm, nice!

mccrodp’s picture

Should this provide a "Detect encoding" option in the feeds_tamper_utf8_form, i.e. using http://php.net/manual/en/function.mb-detect-encoding.php in strict mode?

I'm guessing this would need fallbacks and warning or error messages for undetected or unknown encodings, though. Is it worthwhile adding this, or is there a reason it was omitted?

EDIT: Just seeing this issue now where this was also highlighted: #2263119-8: Create plugins for handling UTF-8 problems
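A rough sketch of the detection idea, purely as an illustration: the function name example_detect_and_convert(), its parameters, and the fallback behaviour are all assumptions, not part of any patch here. It calls mb_detect_encoding() with its third argument (strict) set to TRUE, and returns FALSE when nothing matches and no fallback is given, so the caller can decide whether to warn or skip the item.

```php
<?php
/**
 * Hypothetical helper: detect the encoding of a value, then convert to UTF-8.
 *
 * @param string $value
 *   The raw field value.
 * @param array $candidates
 *   Encodings to test, e.g. array('UTF-8', 'ISO-8859-1').
 * @param string|null $fallback
 *   Encoding to assume when detection fails, or NULL to give up.
 *
 * @return string|false
 *   The UTF-8 string, or FALSE if the encoding could not be determined.
 */
function example_detect_and_convert($value, array $candidates, $fallback = NULL) {
  // Strict mode: only encodings in which the bytes are fully valid qualify.
  $detected = mb_detect_encoding($value, $candidates, TRUE);
  if ($detected === FALSE) {
    if ($fallback === NULL) {
      // Nothing matched and no fallback: let the caller report it.
      return FALSE;
    }
    $detected = $fallback;
  }
  return mb_convert_encoding($value, 'UTF-8', $detected);
}

// 0xE9 alone is not valid UTF-8, so strict detection rejects UTF-8 here.
$converted = example_detect_and_convert("caf\xE9", array('UTF-8', 'ISO-8859-1'));
$undetected = example_detect_and_convert("caf\xE9", array('UTF-8'));
```

One caveat: ISO-8859-1 accepts any byte sequence, so detection cannot reliably tell legacy single-byte encodings apart. An explicit charset setting would probably still be needed alongside any "Detect encoding" option.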

hargobind’s picture

I realize the usefulness of this feature is much bigger than just this specific use case, but I wanted to share this for anyone who finds this thread while looking for a way to convert the encoding of a CSV file. The Feeds team recently committed a patch (#1428272: Added support of encoding conversions to the CSV Parser) that lets you select the source encoding and converts it to UTF-8 for the CSV parser.

cmseasy’s picture

Status: Needs review » Reviewed & tested by the community

Useful feature. I have been using the patch in #5 on a production site for more than a year with success.
I do not use the Feeds CSV parser, so I need this function in feeds_tamper.
Please commit, changed to RTBC.