[Tutor] Wanted: module to parse out a CSV line

andy surany mongo57a@comcast.net
Wed Dec 11 11:17:02 2002


Hello Magnus,

Is there an advantage to using something like asv or csv over opening
the file, reading each line, and using string.split (line_contents,
',')?

Thanks.

-Andy
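
To make the question concrete, here is a minimal sketch of where a plain split on ',' goes wrong. It assumes a modern Python 3 and its standard-library csv module, which is not the Object Craft csv module or ASV discussed in this thread. The sample line is the one from the quoted message below, written without spaces after the commas.

```python
import csv

line = 'A,232,"Title","Smith, Adam","1, 2, 3, 4"'

# Naive approach: split on every comma, including the ones inside quotes.
naive = line.split(',')

# csv.reader accepts any iterable of strings, so a single line
# can be parsed by wrapping it in a list.
parsed = next(csv.reader([line]))

print(len(naive), naive)    # quoted fields are broken apart, quotes kept
print(len(parsed), parsed)  # quoted fields stay whole, quotes removed
```

The split version cuts "Smith, Adam" into two pieces and leaves the quote characters in the data; a CSV-aware parser treats the quoted text as one field.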


-----Original Message-----
From: Magnus Lycka <magnus@thinkware.se>
To: Terry Carroll <carroll@tjc.com>; tutor@python.org <tutor@python.org>
Cc: djc@object-craft.com.au <djc@object-craft.com.au>
Date: Wednesday, December 11, 2002 5:33 AM
Subject: Re: [Tutor] Wanted: module to parse out a CSV line


>At 00:42 2002-12-11 -0800, Terry Carroll wrote:
>>I'm writing one of my first Python apps
>
>Welcome Terry! I hope you will enjoy it!
>
>(Dave, there is something here that looks like a bug in CSV to me.
>Care to comment?)
>
>>(I've used perl up until
>>now) and need to parse out lines of comma-separated values (CSV).  I'm on
>>a Windows/XP system.
>
>I was just about to suggest that you use one of the three
>modules below. It's nice to see that someone has made the
>effort to search the net before asking here! :)
>
>I'm afraid you should have tried a bit harder with these modules.
>They can all solve your problem (?), but maybe they could be a little
>better documented, and one of them could be in the standard library
>I think.
>
>>I've found some Python CSV support, but nothing that will work for me:
>
>>  1. ASV, from <http://tratt.net/laurie/python/asv/>
>>     Nice, but it reads in an entire file that is assumed to be
>>     CSV-formatted.
>
>There is an input_from_file method, but you don't have to use
>that. Use input instead.
>
>>  That's not my case, I have a single variable I need
>>     to parse out (yeah, it comes from a file, but not all lines in the
>>     file are CSV).
>
> >>> import ASV
> >>> asv = ASV.ASV()
> >>> asv.input('A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"', ASV.CSV())
> >>> print asv
>[['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']]
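
The same "parse one string, not a whole file" pattern can be sketched with the csv module that ships with modern Python 3 (an assumption; it is not one of the three modules compared in this message). The helper name parse_csv_line and the "REC:" prefix used to pick out the CSV lines are hypothetical, chosen only for illustration.

```python
import csv

def parse_csv_line(text):
    """Parse one CSV-formatted string into a list of fields."""
    # csv.reader wants an iterable of lines, so wrap the string in a list.
    # skipinitialspace=True ignores the space that follows each comma.
    return next(csv.reader([text], skipinitialspace=True))

# Hypothetical mixed input: only lines starting with "REC:" hold CSV data.
lines = [
    "# header comment, not CSV",
    'REC:A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"',
    "free-form text",
]

prefix = "REC:"
records = [parse_csv_line(l[len(prefix):]) for l in lines if l.startswith(prefix)]
print(records)
```

Only the lines you select are handed to the parser, so the rest of the file can be in any format.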
>
>>  2. A CSV module from
>>     <http://www.object-craft.com.au/projects/csv/documentation.html>
>>     Perfect!  Exactly what I need.  Except the install fails looking for a
>>     program named cl.exe; I think it's a compiler, which I don't have.
>
>This module is implemented in C to make it really fast even
>for very large files. But look at the download page:
>http://www.object-craft.com.au/projects/csv/download.html
>
>If you are using Win32, you can use one of the following binaries:
>Win32 Python 2.1 binary: csv.pyd 20K Nov 20 2002
>Win32 Python 2.2 binary: csv.pyd 20K Nov 20 2002
>
> >>> import csv
> >>> csv.parser().parse('A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"')
>['A', ' 232', ' "Title"', ' "Smith', ' Adam"', ' "1', ' 2', ' 3', ' 4"']
>
>Not quite...but...
>
> >>> csv.parser().parse('A,232,"Title","Smith, Adam","1, 2, 3, 4"')
>['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']
>
>It seems the space after the comma confuses CSV regarding the use
>of double quotes. I've seen a lot of files with whitespace after
>the comma, so this is not what I would like. And the parser won't
>accept field_sep = ', ', it has to be a single character.
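
For comparison, the csv module in the standard library of modern Python 3 (an assumption here, since it is a different module from the Object Craft one being tested) also requires a single-character delimiter, but it offers a skipinitialspace option for exactly this situation. A minimal sketch of the difference:

```python
import csv

line = 'A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"'

# Default dialect: the space after each comma becomes part of the field,
# so the quote is no longer the first character and is taken literally.
default = next(csv.reader([line]))

# skipinitialspace=True drops whitespace right after the delimiter,
# which lets the parser see the opening quote.
skipping = next(csv.reader([line], skipinitialspace=True))

print(default)
print(skipping)
```

With the option enabled, the quoted fields come back whole and the leading space before 232 is dropped as well.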
>
>>  3. Python-DSV, at <http://python-dsv.sourceforge.net/>
>>     This looks like some whole separate program, rather than something
>>     that I can just call in to parse out a single line.  It also looks
>>     like it goes after a whole file at once.  Hard to tell -- there's no
>>     docs, unless (I presume) I install it.
>
>The documentation is in the form of a documentation string in the source.
>It shows you what to do.
>
>Basic use:
>     from DSV import DSV
>     data = file.read() # file.read() returns a string, so this is what you need
>     qualifier = DSV.guessTextQualifier(data) # optional
>     data = DSV.organizeIntoLines(data, textQualifier = qualifier)
>     delimiter = DSV.guessDelimiter(data) # optional
>     data = DSV.importDSV(data, delimiter = delimiter, textQualifier = qualifier)
>     hasHeader = DSV.guessHeaders(data) # optional
>
>You can skip the guessing games and run the two functions that matter.
>
> >>> from DSV import DSV
> >>> data = 'A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"'
> >>> data = DSV.organizeIntoLines(data, textQualifier = '"')
> >>> data = DSV.importDSV(data, delimiter = ',', textQualifier = '"')
> >>> print data
>[['A', ' 232', 'Title', 'Smith, Adam', '1, 2, 3, 4']]
>
>As you see, like csv, but unlike asv, it won't strip the leading space
>from before 232. I'm pretty sure this is intentional. Whether it's a
>bug or a feature in your eyes is a different issue...
>
>The "organizeIntoLines" step (which you can bypass by putting your
>string in a list, I guess) exists because programs like Excel will
>produce CSV files with line breaks inside "-delimited strings.
>So a logical line might span several physical lines.
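
The logical-versus-physical line distinction can be illustrated with a short sketch. It uses the csv module from modern Python 3 rather than DSV (an assumption, since DSV's own behavior is only described above), and the sample data is made up for the example.

```python
import csv
import io

# Made-up sample: the second record has a line break inside a quoted field.
text = 'id,comment\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Counting physical lines splits the quoted field in two.
physical_lines = text.splitlines()

# A CSV-aware reader keeps reading until the closing quote, so the
# embedded line break stays inside one field of one logical record.
rows = list(csv.reader(io.StringIO(text, newline='')))

print(len(physical_lines), len(rows))
print(rows[1])
```

This is why splitting a file on line breaks before parsing can corrupt records exported from spreadsheet programs.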
>
>I think it would be a good thing to have parsers/importers/exporters for
>both CSV and fixed-format files in the standard library. We just need some
>kind of consensus on how they should behave, I guess...
>
>
>--
>Magnus Lycka, Thinkware AB
>Alvans vag 99, SE-907 50 UMEA, SWEDEN
>phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
>http://www.thinkware.se/  mailto:magnus@thinkware.se