First Cut at CSV PEP
Cliff Wells
LogiplexSoftware at earthlink.net
Tue Jan 28 23:14:04 CET 2003
On Mon, 2003-01-27 at 22:43, Kevin Altis wrote:
> > I'm ready to toddle off to bed, so I'm stopping here for tonight.
> > Attached
> > is what I've come up with so far in the way of a PEP. Feel free to flesh
> > out, rewrite or add new sections. After a brief amount of cycling, I'll
> > check it into CVS.
>
> Probably need to specify that input and output deal with string
> representations, but there are some differences:
>
> [[5,'Bob',None,1.0]]
>
> DSV.exportCSV produces
>
> '5,Bob,None,1.0'
Hm, that would be a bug in DSV =). The None should not have been
exported (it doesn't have any meaning outside of Python). However,
quoting only when necessary was lifted straight from Excel. DSV also
allows a "quoteAll" option on export to change this behavior.
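
To illustrate the two export behaviors, here is a minimal sketch. It
assumes Python's standard-library csv module rather than DSV itself, and
export_row and its quote_all flag are hypothetical names used only for
illustration (they are not DSV's API).

```python
import csv
import io


def export_row(row, quote_all=False):
    """Return one row as CSV text.

    Quotes only when necessary by default; quotes every field when
    quote_all is True. The csv writer emits None as an empty field.
    """
    buf = io.StringIO()
    quoting = csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL
    writer = csv.writer(buf, quoting=quoting, lineterminator="\r\n")
    writer.writerow(row)
    return buf.getvalue()


minimal = export_row([5, "Bob", None, 1.0])
quoted = export_row([5, "Bob", None, 1.0], quote_all=True)

print(repr(minimal))  # '5,Bob,,1.0\r\n'
print(repr(quoted))   # '"5","Bob","","1.0"\r\n'
```

Note that None comes out as an empty field in both cases, not as the
text "None".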
> Data that doesn't need quoting isn't quoted. Assuming those were spreadsheet
> values with the third item just an empty cell, then using Excel export rules
> would result in a default CSV of
>
> 5,Bob,,1\r\n
This is the correct behavior.
> None is just an empty field. In Excel, the number 1.0 is just 1 in the
> exported file, but that may not matter; we can export 1.0 for the field.
> This reminds me that the boundary case of the last record just having EOF
> with no line ending should be tested.
Is this not handled correctly by all the existing implementations?
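
A quick way to test that boundary case, sketched here against Python's
standard-library csv module (an assumption; it says nothing about how
the existing implementations behave). The helper name parse is
hypothetical.

```python
import csv
import io


def parse(text):
    """Parse CSV text into a list of rows (each row a list of strings)."""
    # newline="" keeps the line endings untouched so the csv reader sees them.
    return list(csv.reader(io.StringIO(text, newline="")))


with_eol = parse("5,Bob,,1\r\n")   # last record ends with CRLF
without_eol = parse("5,Bob,,1")    # last record ends at EOF

print(with_eol)     # [['5', 'Bob', '', '1']]
print(without_eol)  # [['5', 'Bob', '', '1']]
```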
> Importing this line with importDSV for example yields a list of lists.
>
> [['5', 'Bob', '', '1']]
>
> It's debatable whether the third field should be None or an empty string.
> Further interpretation of each field becomes application-specific. The API
> makes it easy to do further processing as each row is read.
It's also debatable whether the numbers should have been returned as
strings or numbers. I lean towards the former, as csv is a text format
and can't convey this sort of information by itself, which is why I
chose to return only strings, including the empty string for an empty
field rather than None. I agree with Kevin that this is best left to
application logic rather than the module.
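
As a sketch of what that application logic might look like, assuming
the application wants empty fields as None and numeric-looking text as
numbers (convert_field is a hypothetical helper, not part of any
proposed API):

```python
def convert_field(text):
    """Application-level conversion of one field returned by the parser.

    '' becomes None, text that parses as int or float becomes a number,
    and anything else is returned unchanged as a string.
    """
    if text == "":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


row = ['5', 'Bob', '', '1.0']          # what the parser hands back
converted = [convert_field(f) for f in row]
print(converted)  # [5, 'Bob', None, 1.0]
```

A different application could just as well keep every field as a
string, which is why the module itself shouldn't guess.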
> I'm still not sure about some of the database CSV handling issues. Often it
> seems they want a string field to be quoted regardless of whether it
> contains a comma or newlines, but numbers and empty fields should not be
> quoted. It is certainly nice to be able to import a file that contains
>
> 5,"Bob",,1.0\r\n
>
> and not need to do any further translation. Excel appears to interpret
> quoted numbers and unquoted numbers as numeric fields when importing.
It treats them as if the user had typed them into a cell, which is not
necessarily the behavior we want. To get a number as a string in Excel,
I imagine you'd have to have the following:
"""5""","Bob",,1.0\r\n
>
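
For comparison, here is how the two lines above parse when every field
is returned as a string. This sketch assumes Python's standard-library
csv module with its default settings (it does not model Excel), and
parse_line is a hypothetical helper.

```python
import csv


def parse_line(line):
    """Parse a single CSV line into a list of string fields."""
    return next(csv.reader([line]))


plain = parse_line('5,"Bob",,1.0')
escaped = parse_line('"""5""","Bob",,1.0')

print(plain)    # ['5', 'Bob', '', '1.0']
print(escaped)  # ['"5"', 'Bob', '', '1.0']
```

In the first case the quotes around Bob are gone after parsing, so a
quoted string and an unquoted one look the same to the caller. In the
second, the doubled quotes survive as literal quote characters inside
the field.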
> Just trying to be anal-retentive here to make sure all the issues are
> covered ;-)
And I thought it came naturally =)
> ka
--
Cliff Wells, Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308 (800) 735-0555 x308