First Cut at CSV PEP

Cliff Wells LogiplexSoftware at earthlink.net
Tue Jan 28 23:14:04 CET 2003


On Mon, 2003-01-27 at 22:43, Kevin Altis wrote:
> > I'm ready to toddle off to bed, so I'm stopping here for tonight.
> > Attached is what I've come up with so far in the way of a PEP.  Feel
> > free to flesh out, rewrite or add new sections.  After a brief amount
> > of cycling, I'll check it into CVS.
> 
> Probably need to specify that input and output deal with string
> representations, but there are some differences:
> 
> [[5,'Bob',None,1.0]]
> 
> DSV.exportCSV produces
> 
> '5,Bob,None,1.0'

Hm, that would be a bug in DSV =).  The None should not have been
exported (it doesn't have any meaning outside of Python).  However,
quoting only when necessary was lifted straight from Excel.  DSV also
allows a "quoteAll" option on export to change this behavior.
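
To make the two quoting policies concrete, here is a minimal sketch.  It
does not use DSV; it assumes the csv module from Python's standard
library as a stand-in, and export_row and its quote_all flag are
hypothetical names chosen only to echo the "quoteAll" idea.

```python
import csv
import io


def export_row(row, quote_all=False):
    """Serialize one row to a CSV line (hypothetical helper, not DSV's API)."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes a field only when it contains the delimiter,
    # the quote character, or a line break; QUOTE_ALL quotes every field.
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    csv.writer(buf, quoting=quoting).writerow(row)
    return buf.getvalue()


# None is written as an empty field rather than the text "None".
minimal = export_row([5, 'Bob', None, 1.0])

# Same row with every field quoted.
quoted = export_row([5, 'Bob', None, 1.0], quote_all=True)

# Fields that contain a comma or a quote are quoted even in minimal mode,
# and embedded quotes are doubled.
needs_quotes = export_row(['Wells, Cliff', 'say "hi"'])
```

With this stand-in, the first call gives 5,Bob,,1.0 followed by \r\n,
so the float keeps its ".0" rather than being trimmed to 1 as Excel
would do.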

> Data that doesn't need quoting isn't quoted. Assuming those were spreadsheet
> values with the third item just an empty cell, then using Excel export rules
> would result in a default CSV of
> 
> 5,Bob,,1\r\n

This is the correct behavior.

> None is just an empty field. In Excel, the number 1.0 is just 1 in the
> exported file, but that may not matter; we can export 1.0 for the field.
> This reminds me that the boundary case of the last record just having EOF
> with no line ending should be tested.

Is this not handled correctly by all the existing implementations?
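
One way to check that boundary case is sketched below.  It assumes the
standard-library csv module rather than any of the implementations
discussed here, and parse is a hypothetical helper name.

```python
import csv
import io


def parse(text):
    """Parse CSV text into a list of rows (hypothetical helper)."""
    # newline='' stops StringIO from translating line endings before
    # the csv reader sees them.
    return list(csv.reader(io.StringIO(text, newline='')))


# The same record, once with a trailing \r\n and once ending at EOF.
with_eol = parse('5,Bob,,1\r\n')
without_eol = parse('5,Bob,,1')
```

Both calls should yield the same single row, so a test can simply
compare the two results.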

> Importing this line with importDSV for example yields a list of lists.
> 
> [['5', 'Bob', '', '1']]
> 
> It's debatable whether the third field should be None or an empty string.
> Further interpretation of each field becomes application-specific. The API
> makes it easy to do further processing as each row is read.

It's also debatable whether the numbers should have been returned as
strings or numbers.  I lean towards the former, as csv is a text format
and can't convey this sort of information by itself, which is why I
chose to return only strings, including the empty string for an empty
field rather than None.  I agree with Kevin that this is best left to
application logic rather than the module.
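
As an illustration of leaving interpretation to the application, the
sketch below converts fields after they have been read as strings.  It
assumes the standard-library csv module as the reader, and coerce is a
hypothetical application-side helper whose rules (empty string to None,
then int, then float) are just one possible policy.

```python
import csv
import io


def coerce(field):
    """Application-specific conversion of one string field (hypothetical)."""
    if field == '':
        return None          # this application treats empty cells as None
    for convert in (int, float):
        try:
            return convert(field)
        except ValueError:
            pass             # not this numeric type; try the next one
    return field             # anything else stays a string


# The reader hands back only strings; the application decides the types.
reader = csv.reader(io.StringIO('5,Bob,,1.0\r\n', newline=''))
rows = [[coerce(field) for field in row] for row in reader]
```

A different application could just as well keep every field as a
string, which is why the conversion lives outside the parsing step.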

> I'm still not sure about some of the database CSV handling issues. Often
> it seems they want a string field to be quoted regardless of whether it
> contains a comma or newlines, but numbers and empty fields should not be
> quoted. It is certainly nice to be able to import a file that contains
> 
> 5,"Bob",,1.0\r\n
> 
> and not need to do any further translation. Excel appears to interpret
> quoted numbers and unquoted numbers as numeric fields when importing.

It treats them as if the user had typed them into a cell, which is not
necessarily the behavior we want.  To get a number as a string in Excel,
I imagine you'd have to have the following:

"""5""","Bob",,1.0\r\n
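
For comparison, here is a small sketch of how the two lines above parse
under the usual doubled-quote convention.  It assumes the
standard-library csv module and says nothing about what Excel itself
does on import; parse_line is a hypothetical helper name.

```python
import csv
import io


def parse_line(text):
    """Parse a single CSV record into a list of strings (hypothetical)."""
    return next(csv.reader(io.StringIO(text, newline='')))


# Plain quoting: the quotes around Bob are syntax and are dropped,
# so the parsed value carries no trace of having been quoted.
db_style = parse_line('5,"Bob",,1.0\r\n')

# Doubled quotes inside a quoted field become literal quote characters,
# so the first field comes back with quotes as part of its data.
literal_quotes = parse_line('"""5""","Bob",,1.0\r\n')
```

Under this convention the first field of the second line is the
three-character string made of a quote, a 5, and a quote.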

> 
> Just trying to be anal-retentive here to make sure all the issues are
> covered ;-)

And I thought it came naturally =)

> ka
-- 
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308  (800) 735-0555 x308



