[CSV] Re: First Cut at CSV PEP

Skip Montanaro skip at pobox.com
Wed Jan 29 03:01:01 CET 2003


    >> Interesting point.  I think that newlines inside records are going to
    >> be the same as those separating records.  Anything else would be very
    >> bizarre.

    Andrew> You should know better than to make a statement like that where
    Andrew> Microsoft is concerned. Excel uses a single LF within fields,
    Andrew> but CRLF at the end of lines. If you import a field containing
    Andrew> CRLF, the CR appears within the field as a box (the "unprintable
    Andrew> character" symbol).

Here's what I can figure out from the samples I saved in Excel today.  I'm
away from the Windows machine now and can only infer the titles in the
save menu from the file names, so I may be a bit off in the associations.
Still, here goes:

    File Type           delimiter       hard return     line terminator
    CSV                 comma           LF              CRLF
    DOS Text            TAB             LF              CRLF
    DOS CSV             comma           LF              CRLF
    Mac Text            TAB             LF              CR
    Mac CSV             comma           LF              CR
    Space               yow, this seems all screwed up!
    TSV                 TAB             LF              CRLF
    Unicode CSV         comma           LF              CRLF
    Unicode Text        TAB             LF              CRLF

The Space-separated file looked pretty much like garbage.  I'll have to
check it out more closely tomorrow.  The Unicode CSV file was the same as
the DOS CSV and CSV files (same checksum).  I was thus fairly surprised to
see that the Unicode Text file looked like it had been saved as UTF-16 -
each character is followed by an ASCII NUL and there is a little-endian
UTF-16 BOM at the start of the file.
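
A quick way to confirm that kind of observation is to look at the first two
bytes of the file.  The following is only a sketch: sniff_utf16 is a
hypothetical helper name, and the sample bytes are made up to mimic what a
TAB-delimited UTF-16 file would start with, not taken from the real Excel
output.

```python
import codecs

def sniff_utf16(data):
    """Return a codec name if data starts with a UTF-16 BOM, else None.

    Hypothetical helper for illustration only.  Note that a UTF-32-LE BOM
    also begins with FF FE, so this is not a general encoding detector.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # b'\xff\xfe'
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # b'\xfe\xff'
        return "utf-16-be"
    return None

# Made-up sample: a BOM followed by "a<TAB>b<CR><LF>" encoded as UTF-16-LE.
sample = codecs.BOM_UTF16_LE + "a\tb\r\n".encode("utf-16-le")

codec = sniff_utf16(sample)
# Skip the two BOM bytes, then decode the rest with the detected codec.
text = sample[2:].decode(codec)
```

In the little-endian form every ASCII character is stored as the character
byte followed by a zero byte, which matches the "followed by an ASCII NUL"
pattern described above.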

The table suggests that Excel cares about Windows and Mac line endings, so
we should make the line terminator a user-specified option.  Unfortunately,
that means we have to tell people to open files in binary mode, since they
will be passing open file objects and text mode would translate the line
endings on some platforms.  Doesn't seem very clean to me.  Any ideas?
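
To make the idea concrete, here is a rough sketch of a user-specified line
terminator, written against the csv module as it exists in today's standard
library (csv.writer with its delimiter and lineterminator parameters).  It
is not a claim about what the API under discussion should look like.
write_rows is a hypothetical helper, and QUOTE_ALL is an assumption made
here so that embedded newlines are always protected.  In current Python the
counterpart of "open in binary mode" is passing newline="" when opening the
file, which turns off newline translation; an in-memory buffer stands in
for the file below.

```python
import csv
import io

def write_rows(rows, delimiter=",", lineterminator="\r\n"):
    """Write rows with the given delimiter and record terminator.

    Hypothetical helper.  newline="" keeps the buffer from translating
    line endings, so the bytes we ask for are the bytes we get.
    """
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, delimiter=delimiter,
                        lineterminator=lineterminator,
                        quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()

# One field contains a hard return (a bare LF), as in the Excel samples.
rows = [["a", "line one\nline two"], ["b", "plain"]]

dos_csv = write_rows(rows)                                        # comma, CRLF
mac_text = write_rows(rows, delimiter="\t", lineterminator="\r")  # TAB, CR

# Reading back: the embedded LF survives inside the quoted field.
round_trip = list(csv.reader(io.StringIO(dos_csv, newline="")))
```

The embedded LF stays inside the quoted field while the record terminator
follows whatever the caller asked for, which is the same split between
"hard return" and "line terminator" that the table shows.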

Skip


