Doesn't recognize Newline

LizCastro - April 28, 2009 - 22:28
Project:Node import
Version:6.x-1.0-rc4
Component:Miscellaneous
Category:support request
Priority:normal
Assigned:Unassigned
Status:active
Description

Hi,
I'm importing a tab-delimited file that I exported from FileMaker for Mac. I upload the file but when I get to step 3, and choose Tab delimited, Record 1 is in Columns 1-12 but Record 2 is in Columns 13-24. I've tried other values for Newline, but don't know what else to try. I've also tried to export in different formats (for Mac, in UTF, for Windows, etc.).

Any thoughts?

thanks,
Liz

#1

Balefire - April 30, 2009 - 10:57

I was having the same issue. File details:
csv exported from OpenOffice.org 3 Calc on Ubuntu Jaunty.

I had ticked the "First row has column names" and it had correctly identified the first row and showed the column header names. All the other rows were on one line - i.e. the first proper row was split according to comma separator, with the last column containing the text for all the other rows, including commas. double-quotes, etc.

I deleted the file from the server, updated to 6.x-1.x-dev, re-exported from OO.o and imported the file again. The same issue occured.

I saved the file as a .xls in OO.o and opened the file via Excel 2007 on Windows XP. Exported as .csv, imported the new file and now the file is split correctly.

I can provide a copy of the file in OO.o exported .csv and Excel exported .csv if this would be of use to you.

#2

tstermitz - May 5, 2009 - 15:40

Yeah, same problem on a Mac. I think it has to do with the Excel export of csv:

I succeeded when I chose export to csv for Windows. Then I got the proper line endings.

Maybe it needs a comma followed by newline:
x,y,z,\n
a,b,c,\n

Instead of
x,y,z\n
a,b,c\n

#3

Balefire - May 12, 2009 - 14:04

Without checking other files, I think I have tracked the issue down to the import filter may require Windows style end of line characters (\r\n) instead of the standard \n used by everyone else. I can consistently remove and reproduce the issue with a single file by using Scite's 'Convert Line End Characters' and switching between CR + LF (\r\n = works) and LF (\n = not work).

HTH

#4

bvirtual - May 16, 2009 - 05:12

Thanks, going to DOS EOL from Unix fixed my issue.

If you use the escape of \" and check just that one radio button, it does not check the far left radio button to use that customization.

The EOL \r and \n are legit values to find inside field values, such as HTML textarea inputted data in CSV data,
but not tabbed delimited with EOL characters for the end of the record.

Also good to know, is the HTTP POST transmission method from the browser to the web server requires a non binary file. The web browser will uuencode or similar (base64, etc). The browser process involves converting all the file's bytes to "hex" including the embedded EOL characters, line wrap to a fixed record length, append the required HTTP specified EOL (DOS I do hazily recall) and uploads the result.

The web server receives the uploaded uudecoded file, and uudecodes. Not all browsers (or servers) have implemented this properly (mostly older browsers/servers). Since Apache will be "friendly", it attempts a conversion, never creating any error message to the CGI script handling the uploaded file. The web server might create a corrupted file to pass to the CGI script, which would (optionally) store the corrupted file on hard drive. Now, most modern browsers and servers correctly do the HTTP Upload File "hex" conversion.

Point is, there are so many ways to go wrong with EOL (End Of Line) that are "invisible" to the Drupal user.

I applaud the module's maintainer. I want this module for a long, long time.

Primer on EOL is here: http://www.peterbenjamin.com/seminars/crossplatform/texteol.html
It's old and one day I will update it for HTTP Post Upload File.

#5

jrbeeman - June 15, 2009 - 16:49

The attached patch adds in an option for rows by "Newline (Unix)" and it seems to cover my Mac Excel-generated CSVs alright.

AttachmentSize
node_import-447842-1.txt 1.36 KB

#6

friolator - August 4, 2009 - 16:00

TextWrangler for the mac (free, lite version of BBEdit) will let you change the linebreak types on the CSV file to DOS. flip a switch and save it, then import with node_import. Seems to work fine.

Mac Linebreaks should really be supported in node_import, though.

#7

dzepol - October 14, 2009 - 03:30

I also encountered this problem. My fix was saving as (in excel) "CSV (Windows)" instead of "CSV (Comma delimited)" on my Mac.

#8

deck_penguin - October 14, 2009 - 19:59

Also got me yesterday morning after updates. fastest solution for me was todos/fromdos commands (unix). simple tool to change EOL chars.

thanks community !

#9

quicksketch - October 16, 2009 - 17:18

I was experiencing the same thing (new lines not recognized) when exporting from Excel 2004 on Mac OS X.

My fix was saving as (in excel) "CSV (Windows)" instead of "CSV (Comma delimited)" on my Mac.

This definitely solved the problem. I don't think the patch in #5 should be directly added, instead it'd make sense just to run a str_replace("\r", "\n") on the text before trying to do the import (or similar munging). Even though this is a tool mostly for devs, they shouldn't have to match up the line endings when it's easy enough for Drupal to convert these line endings for you.

 
 

Drupal is a registered trademark of Dries Buytaert.