[Tutor] Wanted: module to parse out a CSV line

Dave Cole djc@object-craft.com.au
Wed Dec 11 13:20:03 2002


> At 00:42 2002-12-11 -0800, Terry Carroll wrote:
> >I'm writing one of my first Python apps
> 
> Welcome Terry! I hope you will enjoy it!
> 
> (Dave, there is something here that looks like a bug in CSV to me.
> Care to comment?)

We consider what Excel exports to be the definitive statement of what
our parser should be handling.  Do you know if Excel ever exports
files with the ', ' separator?

Another question: what does Excel do when you import that data?

> I'm afraid you should have tried a bit harder with these modules.
> They can all solve your problem (?), but maybe they could be a little
> better documented, and one of them could be in the standard library
> I think.

Which one?  :-)

>  >>> import ASV
>  >>> asv = ASV.ASV()
>  >>> asv.input('A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"', ASV.CSV())
>  >>> print asv
> [['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']]

It seems to be doing what you expect.

>  >>> import csv
>  >>> csv.parser().parse('A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"')
> ['A', ' 232', ' "Title"', ' "Smith', ' Adam"', ' "1', ' 2', ' 3', ' 4"']
> 
> Not quite...but...
> 
>  >>> csv.parser().parse('A,232,"Title","Smith, Adam","1, 2, 3, 4"')
> ['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']
> 
> It seems the space after the comma confuses CSV regarding the use
> of double quotes. I've seen a lot of files with whitespace after
> the comma, so this is not what I would like. And the parser won't
> accept field_sep = ', ', it has to be a single character.

Have you checked what Excel does with that data?
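
For comparison, here is a minimal sketch using the csv module found in
the present-day Python standard library. This is an assumption for
illustration: it is not the csv.parser() API used in the session quoted
above. Its skipinitialspace option ignores whitespace directly after the
delimiter, so the double quotes are recognised again:

```python
import csv

line = 'A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"'

# Default settings: the space after each comma becomes part of the field,
# so the '"' that follows is not treated as an opening quote.
strict = next(csv.reader([line]))

# skipinitialspace=True drops whitespace right after the delimiter,
# which lets the quoted fields be parsed as quoted fields.
lenient = next(csv.reader([line], skipinitialspace=True))

print(strict)
print(lenient)
```

Whether this lenient behaviour should be the default is exactly the
question of what Excel does with such data.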

> The reason that the "organizeIntoLines" step (which you can bypass by
> putting your string in a list I guess) exists is because programs like
> Excel will produce CSV files with line breaks inside "-delimited strings.
> So a logical line might span several physical lines.

Our CSV parser handles records that are split over multiple lines.
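
To illustrate what a record spanning several physical lines looks like,
here is a small sketch. It assumes the csv module of the present-day
standard library rather than the parser API discussed in this thread,
and the sample data is made up:

```python
import csv
import io

# One header row plus one logical record whose second field contains
# a line break inside the double quotes.
data = 'id,comment\r\n1,"first line\nsecond line"\r\n'

# newline='' stops the stream from translating line endings, so the
# reader sees the embedded line break exactly as it was written.
rows = list(csv.reader(io.StringIO(data, newline='')))

print(rows)
```

The three physical lines come back as two records, with the line break
kept inside the quoted field.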

If it turns out that Excel exports the ', ' separator then we would
absolutely consider our module to have a bug.

If Excel never exports files with ', ', but imports files with the
', ' separator differently to ours then we would seriously consider
changing our module to behave the same way.

From our point of view we want to be absolutely sure that anything
Excel exports will be correctly parsed by our module.  It is slightly
less important to duplicate the way that data is imported by Excel.
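
One way to check the "anything exported must parse" property is a
round-trip test. The sketch below is only an approximation: it assumes
the csv module of the present-day standard library, and its 'excel'
dialect stands in for a real Excel export, which it may not match in
every detail. The sample records are invented:

```python
import csv
import io

# Fields with embedded commas, quotes, a line break and an empty value.
records = [
    ['A', '232', 'Title', 'Smith, Adam', 'say "hi"'],
    ['B', '7', 'Two\nlines', '', '1, 2, 3, 4'],
]

# Write the records using the 'excel' dialect (stand-in for an export).
buf = io.StringIO(newline='')
csv.writer(buf, dialect='excel').writerows(records)
exported = buf.getvalue()

# Parse the exported text again and compare with the original records.
parsed = list(csv.reader(io.StringIO(exported, newline=''), dialect='excel'))

print(parsed == records)
```

A test against files saved by Excel itself would be the stronger check.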

> I think it would be a good thing to have parsers/importers/exporters for
> both CSV (and fixed format) in the standard library. We just need some
> kind of consensus on how they should behave I guess...

I agree.  Agreeing on what those parsers should do is always the
problem...

- Dave

-- 
http://www.object-craft.com.au