[CSV] Re: First Cut at CSV PEP
Skip Montanaro
skip at pobox.com
Wed Jan 29 03:01:01 CET 2003
>> Interesting point. I think that newlines inside records are going to
>> be the same as those separating records. Anything else would be very
>> bizarre.
Andrew> You should know better than to make a statement like that where
Andrew> Microsoft is concerned. Excel uses a single LF within fields,
Andrew> but CRLF at the end of lines. If you import a field containing
Andrew> CRLF, the CR appears within the field as a box (the "unprintable
Andrew> character" symbol).
Here's what I can figure out from the samples I saved in Excel today. I'm
away from the Windows machine now, so I can only infer the titles in the
Save menu from the file names, and I may be a bit off in the associations.
Still, here goes:
File Type      delimiter  hard return  line terminator
CSV            comma      LF           CRLF
DOS Text       TAB        LF           CRLF
DOS CSV        comma      LF           CRLF
Mac Text       TAB        LF           CR
Mac CSV        comma      LF           CR
Space          yow, this seems all screwed up!
TSV            TAB        LF           CRLF
Unicode CSV    comma      LF           CRLF
Unicode Text   TAB        LF           CRLF
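A minimal Python sketch of what a per-file-type option could look like, written against today's `csv` module rather than the API being drafted in this thread. `EXCEL_VARIANTS`, `dump` and `load` are hypothetical names, and only three rows of the table are encoded. It shows the behaviour the table implies: with CRLF as the terminator, a field containing a bare LF gets quoted on output, and on input the LF stays inside the field while CR, LF or CRLF outside quotes ends the record.

```python
import csv
import io

# Hypothetical encoding of three rows of the table above; the keys are the
# guessed save-menu titles, not names Excel or any library exposes.
EXCEL_VARIANTS = {
    "CSV": {"delimiter": ",", "lineterminator": "\r\n"},
    "Mac CSV": {"delimiter": ",", "lineterminator": "\r"},
    "TSV": {"delimiter": "\t", "lineterminator": "\r\n"},
}

def dump(rows, file_type):
    # Serialise rows with the delimiter and terminator listed for file_type.
    buf = io.StringIO(newline="")
    csv.writer(buf, **EXCEL_VARIANTS[file_type]).writerows(rows)
    return buf.getvalue()

def load(text, file_type):
    # The reader ignores lineterminator: CR, LF and CRLF all end a record
    # unless they sit inside a quoted field.
    delimiter = EXCEL_VARIANTS[file_type]["delimiter"]
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

text = dump([["a", "line1\nline2"]], "CSV")
print(repr(text))
print(load(text, "CSV"))
```

Only the writer needs to know which terminator to emit; the reader can stay tolerant of all three.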
The Space-separated file looked pretty much like garbage; I'll have to
check it out more closely tomorrow. The Unicode CSV file was identical to
the DOS CSV and CSV files (same checksum), so I was fairly surprised to
see that the Unicode Text file looked like it had been saved as UTF-16:
each character is followed by an ASCII NUL, and there is a little-endian
UTF-16 BOM at the start of the file.
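If a reader wanted to cope with the "Unicode Text" variant, one option is to look for the little-endian BOM before handing anything to the parser. A minimal sketch; `read_excel_bytes` is a hypothetical helper, and falling back to Latin-1 when no BOM is present is an assumption, not something the samples establish.

```python
import codecs
import csv
import io

def read_excel_bytes(data, delimiter="\t"):
    # Hypothetical helper: decode raw bytes from an Excel export, then parse.
    if data.startswith(codecs.BOM_UTF16_LE):
        # "Unicode Text": UTF-16-LE, so ASCII text appears as byte, NUL, ...
        text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    else:
        # Assumption: treat anything without that BOM as Latin-1.
        text = data.decode("latin-1")
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))

raw = codecs.BOM_UTF16_LE + "name\tvalue\r\nx\t1\r\n".encode("utf-16-le")
print(raw[:8])
print(read_excel_bytes(raw))
```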
The table suggests that Excel distinguishes Windows and Mac line endings,
so we should make the line terminator a user-specified option.
Unfortunately, that means we have to tell people to open files in binary
mode, since they will be passing open file objects. That doesn't seem very
clean to me. Any ideas?
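For comparison, the csv module in current Python 3 avoids binary mode: its documentation asks for files opened with `newline=""`, which turns off the text layer's newline translation, so the writer's `lineterminator` reaches the disk unchanged and the reader sees the raw CR/LF bytes. A minimal sketch; `write_rows` and `read_rows` are hypothetical helpers, and ASCII encoding is assumed.

```python
import csv
import os
import tempfile

def write_rows(path, rows, lineterminator):
    # newline="" disables newline translation on write, so the chosen
    # terminator (CRLF, CR, ...) is written exactly as given.
    with open(path, "w", newline="", encoding="ascii") as f:
        csv.writer(f, lineterminator=lineterminator).writerows(rows)

def read_rows(path):
    # newline="" still splits lines on CR, LF or CRLF but leaves them
    # untranslated, so an LF inside a quoted field survives intact.
    with open(path, newline="", encoding="ascii") as f:
        return list(csv.reader(f))

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "mac.csv")
        write_rows(path, [["a", "b"], ["c", "d"]], "\r")
        with open(path, "rb") as f:
            print(f.read())  # raw bytes, CR-terminated records
        print(read_rows(path))
```

The caller still passes an open file object; the only requirement is the `newline=""` argument rather than a separate binary mode.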
Skip