[Python-ideas] csv dialect enhancement

Chris Angelico rosuav at gmail.com
Sun Jan 13 21:55:21 CET 2013


On Sat, Jan 12, 2013 at 4:16 AM, rurpy at yahoo.com <rurpy at yahoo.com> wrote:
> There is a common dialect of CSV, often used in database
> applications [*1], that distinguishes between an empty
> (quoted) string,
>
>   e.g., the second field in  "abc","",3
>
> and an empty field,
>
>   e.g., the second field in "abc",,3
>
> This distinction is needed to specify or tell the
> difference between 0-length strings and NULLs, when sending
> csv data to or receiving it from a database application.

Ugh, this is exactly the sort of thing that my boss didn't believe
happened. He thinks that CSV is the same the world over, except for a
few really old or arcane programs that can be completely ignored. Took
a lot of arguing before we agreed to disagree on that one...

As an explicitly-requestable dialect, looks good.
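
To make the distinction concrete, here's a rough sketch of what a reader
in such a dialect would have to do. It is a hand-rolled illustration only,
not the csv module's parser; `parse_record` is a hypothetical helper name,
and mapping an unquoted empty field to None is my assumption about how the
proposed "nulls" option would surface NULLs.

```python
def parse_record(line, delimiter=",", quotechar='"'):
    """Split one CSV record (hypothetical helper, not part of the csv module).

    A quoted empty field ("") becomes '', an unquoted empty field becomes None.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quotechar:
            # Quoted field: read to the closing quote; a doubled quote
            # inside the field stands for one literal quote character.
            i += 1
            buf = []
            while i < n:
                if line[i] == quotechar:
                    if i + 1 < n and line[i + 1] == quotechar:
                        buf.append(quotechar)
                        i += 2
                        continue
                    i += 1  # step past the closing quote
                    break
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
        else:
            # Unquoted field: runs to the next delimiter or end of line.
            start = i
            while i < n and line[i] != delimiter:
                i += 1
            raw = line[start:i]
            fields.append(raw if raw else None)  # empty field -> NULL
        if i >= n:
            return fields
        if line[i] != delimiter:
            raise ValueError("unexpected text after closing quote")
        i += 1  # step past the delimiter


print(parse_record('"abc","",3'))  # second field is an empty string
print(parse_record('"abc",,3'))    # second field is None
```

Note that every value comes back as a string or None; no type conversion
is attempted, same as the csv module's default behaviour.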

> Sniffer:
>  Will set "nulls" to True when both adjacent delimiters and
>  quoted empty strings are seen in the input text.
>  (Perhaps this behaviour needs to be optional for backward
>  compatibility reasons?)

Yes, make it optional. I think the interpretation of ,,,, as empty
strings is the more common one, since CSV is often used in contexts
that don't have a concept of NULL (spreadsheets mainly); that ought,
then, to be the default, with a single option to turn on NULL recognition.
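
Something along these lines is what I have in mind for the opt-in check.
It's a crude sketch, not csv.Sniffer's actual code: `sample_uses_nulls`
and its `enabled` flag are made-up names, and the regex test can be fooled
by delimiters or doubled quotes inside quoted fields.

```python
import re


def sample_uses_nulls(sample, delimiter=",", quotechar='"', enabled=False):
    """Guess whether a sample distinguishes NULL from '' (hypothetical helper).

    Returns True only when the caller opts in AND the sample contains both
    an unquoted empty field and a quoted empty string.
    """
    if not enabled:
        # Default: keep the old behaviour, empty fields are empty strings.
        return False
    d, q = re.escape(delimiter), re.escape(quotechar)
    # An empty field: nothing between line start/delimiter and delimiter/line end.
    empty_field = re.compile(rf"(?:^|{d})(?={d}|$)")
    # A quoted empty string occupying a whole field.
    quoted_empty = re.compile(rf"(?:^|{d}){q}{q}(?={d}|$)")
    saw_empty = saw_quoted = False
    for line in sample.splitlines():
        if not line:
            continue  # skip blank lines rather than count them as empty fields
        saw_empty = saw_empty or bool(empty_field.search(line))
        saw_quoted = saw_quoted or bool(quoted_empty.search(line))
    return saw_empty and saw_quoted


sample = '"abc","",3\n"def",,4\n'
print(sample_uses_nulls(sample))                # opt-in not requested
print(sample_uses_nulls(sample, enabled=True))  # both forms are present
```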

So, +1 on the whole idea.

ChrisA


