csv bugs

Skip Montanaro skip at pobox.com
Tue Mar 2 10:23:38 EST 2004


(A better place for this discussion would probably be csv at mail.mojam.com.
I'm adding it to the cc list.)

    Magnus> It seems that when a line termination is escaped (using the
    Magnus> current escape character), csv.reader treats it as a line
    Magnus> continuation, which is well and good -- but it doesn't discard
    Magnus> the escape character; instead, it escapes it implicitly. This
    Magnus> seems like a bug to me. E.g.

    Magnus>   foo:bar:baz\
    Magnus>   frozz:bozz

    Magnus> with separator ':' and escape character '\\' is parsed into

    Magnus>   ['foo', 'bar', 'baz\\\nfrozz', 'bozz']

    Magnus> In my opinion, it *ought* to be parsed into

    Magnus>   ['foo', 'bar', 'baz\nfrozz', 'bozz']

    Magnus> As far as I know, this is the UNIX convention, as used in (e.g.)
    Magnus> /etc/passwd.

That may be.  However, development of the csv module's parser was driven by
how Microsoft Excel behaves.  The assumption was (rightly, I think) that
Excel reads or writes more CSV files than anything else.  I don't believe
Excel does anything with backslashes.
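
For anyone who wants to check what the reader does with the input quoted
above, here is a minimal sketch.  It assumes a current Python 3 interpreter;
the helper name parse() is made up for illustration and is not part of the
csv module.  The result is printed rather than stated here, since the
handling of an escaped line terminator may differ between versions.

```python
import csv
import io

# The two-line input from the report: the first line ends in a backslash
# immediately followed by the newline.
data = "foo:bar:baz\\\nfrozz:bozz\n"

def parse(text):
    """Parse text with ':' as delimiter, '\\' as escape character, no quoting."""
    # newline="" keeps the line endings untouched so the csv reader sees them.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(
        stream,
        delimiter=":",
        escapechar="\\",
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)

rows = parse(data)
# Inspect the third field of the first row to see whether the backslash
# was kept or discarded by the interpreter you are running.
print(rows)
```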

    Magnus> Am I off target here? If the current behaviour is desirable
    Magnus> (although I can't see why it should be) then at least I think
    Magnus> there should be a way of implementing "normal" line
    Magnus> continuations (as in my example), which is the standard UNIX
    Magnus> behavior, and the behavior of Python source, for that
    Magnus> matter. Otherwise, csv can't be used to parse (e.g.)
    Magnus> /etc/passwd...

You're welcome to submit a patch.  I don't have time for it.

    Magnus> And another thing: Perhaps a 'passwd' dialect could be added
    Magnus> alongside 'excel'? Something like:

    Magnus> class passwd(Dialect):
    Magnus>     delimiter = ':'
    Magnus>     doublequote = False
    Magnus>     escapechar = '\\'
    Magnus>     lineterminator = '\n'
    Magnus>     quotechar = '?'
    Magnus>     quoting = QUOTE_NONE
    Magnus>     skipinitialspace = False
    Magnus> register_dialect("passwd", passwd)

I'll take a look at that.
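
As a rough sketch of how the proposed dialect could be tried out today
without changing the csv module itself: the class below copies the attributes
from the quoted proposal and registers it locally.  The sample line and the
variable names are made up for illustration, and a modern Python 3 is
assumed.

```python
import csv
import io

class passwd(csv.Dialect):
    """Colon-separated, backslash-escaped, unquoted records (as proposed)."""
    delimiter = ":"
    doublequote = False
    escapechar = "\\"
    lineterminator = "\n"
    quotechar = "?"          # placeholder from the proposal; unused with QUOTE_NONE
    quoting = csv.QUOTE_NONE
    skipinitialspace = False

# Register under a name so readers and writers can refer to it by string.
csv.register_dialect("passwd", passwd)

# Hypothetical sample record in /etc/passwd layout.
sample = "root:x:0:0:root:/root:/bin/bash\n"

records = list(csv.reader(io.StringIO(sample, newline=""), dialect="passwd"))
print(records[0])
```

This only exercises the plain case with no escaped characters; it says
nothing about how escaped line terminators are handled.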

    Magnus> For some reason you *have* to supply a quotechar, even if you
    Magnus> set QUOTE_NONE... I guess that's a bug too, in my book.

Maybe.  Maybe just a feature.

    Magnus> If there are no objections, I might submit some of this as a bug
    Magnus> report or two (or even a patch).

Please do.

Skip



