This is something I needed today to import data from an MSSQL server that uses a different character encoding. I'm using Feeds SQL combined with the DBLIB driver for SQL Server, and this was the only way to get the charset right. This plugin might not be needed very often, but maybe it will help someone. The charset list could be expanded or modified.
Comment | File | Size | Author |
---|---|---|---|
#5 | interdiff-before-rename.txt | 1.99 KB | justindodge |
#5 | interdiff.txt | 2.71 KB | justindodge |
#5 | feeds-tamper-charset_1817516-5.patch | 1.17 KB | justindodge |
Comments
Comment #1
Dubs commented: Thanks for submitting this. Can this please be committed? It would be very useful.
Comment #2
justindodge commented: Yes please! This just saved my bacon. Feeds as a whole is missing good charset support in some key places, and I think this would be a really excellent addition to the toolkit.
Comment #3
justindodge commented
Comment #4
twistor commented: This should really just use drupal_convert_to_utf8(). Drupal assumes UTF-8, so we don't need any charset_to at all.
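For readers following along, here is a minimal sketch of what "source charset only, always convert to UTF-8" could look like. It is not the code from any patch in this issue. As far as I know, Drupal 7's drupal_convert_to_utf8($data, $encoding) tries iconv, then mbstring, then recode, and returns FALSE when none is available. The stand-in below imitates the first two steps so the snippet runs outside Drupal. The names example_convert_to_utf8, example_tamper_charset and the charset_from setting are hypothetical.

```php
<?php
/**
 * Hypothetical stand-in for drupal_convert_to_utf8(), for illustration only.
 *
 * @param string $data     Raw bytes in the source encoding.
 * @param string $encoding Source encoding name, e.g. 'ISO-8859-1'.
 * @return string|false    UTF-8 string, or FALSE if conversion failed.
 */
function example_convert_to_utf8($data, $encoding) {
  if (function_exists('iconv')) {
    // @ hides the notice iconv() raises on bad input; it then returns FALSE.
    return @iconv($encoding, 'UTF-8', $data);
  }
  if (function_exists('mb_convert_encoding')) {
    try {
      return @mb_convert_encoding($data, 'UTF-8', $encoding);
    }
    catch (\Throwable $e) {
      // PHP 8 throws a ValueError for an unknown encoding name.
      return FALSE;
    }
  }
  return FALSE;
}

/**
 * Hypothetical tamper callback: only the source charset is configurable,
 * because the target is always UTF-8.
 */
function example_tamper_charset(&$field, array $settings) {
  $converted = example_convert_to_utf8($field, $settings['charset_from']);
  // Leave the field untouched if the conversion failed.
  if ($converted !== FALSE) {
    $field = $converted;
  }
}

$field = "caf\xE9"; // "café" encoded as ISO-8859-1.
example_tamper_charset($field, array('charset_from' => 'ISO-8859-1'));
echo bin2hex($field), "\n";
```

Inside a real Drupal 7 plugin, the stand-in would simply be replaced by a call to drupal_convert_to_utf8() with the same two arguments.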
Comment #5
justindodge commented: Here's the original patch minus charset_to, which utilizes Drupal's built-in drupal_convert_to_utf8().
The interdiff after renaming was mostly useless, so here's an extra interdiff from before changing the filename of the plugin (obviously iconv.inc wasn't really appropriate after this change).
Comment #6
justindodge commented: It should probably be noted that if you were using the original patch, switching to the patch in #5 would cause your existing charset tampers from that patch to disappear.
Comment #7
timlie commented: Works like a charm, nice!
Comment #8
mccrodp commented: Should this provide a "Detect encoding" option in feeds_tamper_utf8_form? I.e., using http://php.net/manual/en/function.mb-detect-encoding.php in strict mode.
I'm guessing this would need fallbacks, or warning or error messages, on undetected or unknown encodings. Is it worthwhile adding this, or is there a reason it was omitted?
EDIT: Just seeing this issue now where this was also highlighted: #2263119-8: Create plugins for handling UTF-8 problems
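To make the "Detect encoding" idea concrete, here is a rough sketch of strict-mode detection with a fallback. It is not part of any patch here and assumes the mbstring extension is loaded. The function name example_detect_and_convert, the candidate list and the fallback parameter are all hypothetical. Note that single-byte encodings such as ISO-8859-1 accept almost any byte sequence, so detection among them is a guess rather than a guarantee, which is one reason a fallback or warning would be needed.

```php
<?php
/**
 * Hypothetical helper: detect the source encoding, then convert to UTF-8.
 *
 * @param string      $data       Raw input bytes.
 * @param array       $candidates Encodings to try, in order of preference.
 * @param string|null $fallback   Encoding to assume when detection fails.
 * @return string|false UTF-8 string, or FALSE if undetected and no fallback.
 */
function example_detect_and_convert($data, array $candidates, $fallback = NULL) {
  // Already valid UTF-8: nothing to do.
  if (mb_check_encoding($data, 'UTF-8')) {
    return $data;
  }
  // Third argument TRUE enables strict mode, which rejects any candidate
  // for which the input contains invalid byte sequences.
  $detected = mb_detect_encoding($data, $candidates, TRUE);
  if ($detected === FALSE) {
    if ($fallback === NULL) {
      // The caller decides whether to warn, log, or skip the item.
      return FALSE;
    }
    $detected = $fallback;
  }
  return mb_convert_encoding($data, 'UTF-8', $detected);
}

$latin1 = "caf\xE9"; // "café" encoded as ISO-8859-1.
echo bin2hex(example_detect_and_convert($latin1, array('ISO-8859-1'))), "\n";
```

Returning FALSE rather than guessing leaves room for the form to expose a "fallback encoding" setting or to surface a warning on undetected input.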
Comment #9
hargobind commented: I realize the usefulness of this feature is much bigger than just this specific use case, but I wanted to share this for anyone who finds this thread while looking for a way to convert the encoding of a CSV file. The Feeds team recently committed a patch (#1428272: Added support of encoding conversions to the CSV Parser) that adds the ability to select the source encoding and converts it to UTF-8 for the CSV parser.
Comment #10
cmseasy commented: Useful feature. I have been using patch #5 on a production site for more than a year with success.
I do not use the Feeds CSV parser, so I need this function in feeds_tamper.
Please commit, changed to RTBC.