Remove BOM from UTF-8 files in Drupal 6?

Project: Feeds
Version: 6.x-1.0-beta11
Component: Code
Category: bug report
Priority: normal
Assigned: Unassigned
Status: active

Issue Summary

I think I am having the same problem in Drupal 6 that was described and fixed for Drupal 7 here: http://drupal.org/node/953538. Is anyone else having this problem, or is it not even the same problem? It appears that the BOM is not being stripped from the file before it is imported, and because of this I cannot import the first column. If I strip the BOM manually, the first column imports fine. I am using a CSV file.

Comments

#1

Only one way to find out: backport and test #953538: Remove BOM from UTF-8 files

#2

I tried doing that before posting, but unfortunately it is beyond me.

#3

Version: 6.x-1.0-beta10 » 6.x-1.0-beta11

I do not think the BOM check/removal belongs in a fetcher; it should go in the parser. A fetcher should leave what it fetches as is, and the parser should handle variations in field names, file structure, etc.

So in D6:

  1. FeedsParser.inc declares the abstract function parse()
  2. FeedsCSVParser.inc implements parse() and splits the work into parseHeader() and parseItems()
  3. ParserCSVIterator->parse() in ParserCSV.inc

Add at line 199 (in ParserCSV.inc):

<?php
// Drop a leading UTF-8 BOM (bytes EF BB BF) from the line.
if (substr($line, 0, 3) == pack('CCC', 0xef, 0xbb, 0xbf)) {
  $line = substr($line, 3);
}
?>

This should work for CSV files.
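
To see why the first column fails without this check, here is a small standalone sketch (not the Feeds code itself). It assumes PHP 5.3+ for str_getcsv(), and the column names are made up for illustration. With the BOM in place the header lookup for 'title' fails; after stripping it succeeds.

```php
<?php
// UTF-8 byte order mark: the three bytes EF BB BF.
$bom = "\xEF\xBB\xBF";

// Hypothetical first line of a CSV file that was saved with a BOM.
$line = $bom . "title,body,author";

// Without stripping, the BOM stays glued to the first header name,
// so a lookup for 'title' does not match.
$header = str_getcsv($line);
$found_before = in_array('title', $header, TRUE);

// Same check as in the snippet above, applied before parsing.
if (substr($line, 0, 3) == $bom) {
  $line = substr($line, 3);
}
$header = str_getcsv($line);
$found_after = in_array('title', $header, TRUE);

var_dump($found_before, $found_after);
```

The BOM can only appear at the very start of the file, so only the first line is ever affected, which matches the symptom of just the first column going missing.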

The sitemap, OPML, SimplePie and syndication parsers use FeedsImportBatch->getRaw(), so for those:
FeedsBatch.inc, line 178 (untested):

<?php
// Drop a leading UTF-8 BOM (bytes EF BB BF) from the raw content.
if (substr($raw, 0, 3) == pack('CCC', 0xef, 0xbb, 0xbf)) {
  $raw = substr($raw, 3);
}
?>
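
Since both places do the same check, it could be pulled into one helper. This is only a sketch: the function name _feeds_strip_utf8_bom() is made up and is not an existing Feeds function. The (string) cast is there because on older PHP versions substr() returns FALSE when nothing is left after the BOM.

```php
<?php
/**
 * Removes a leading UTF-8 BOM from a string, if present.
 *
 * Hypothetical helper for illustration, not part of Feeds.
 */
function _feeds_strip_utf8_bom($data) {
  // strncmp() is binary safe and compares only the first 3 bytes.
  if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
    // Cast so a BOM-only input yields '' instead of FALSE on old PHP.
    return (string) substr($data, 3);
  }
  return $data;
}
```

Both call sites would then reduce to a single line each, e.g. $raw = _feeds_strip_utf8_bom($raw); in getRaw() (again untested).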

If anyone finds this useful, I can follow up with a patch.