[Tutor] Wanted: module to parse out a CSV line
andy surany
mongo57a@comcast.net
Wed Dec 11 11:17:02 2002
Hello Magnus,
Is there an advantage to using something like asv or csv over opening
the file, reading each line, and using string.split (line_contents,
',')?
Thanks.
-Andy
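The difference appears as soon as a quoted field contains a comma. Below is a minimal sketch comparing plain splitting with the standard-library csv module. That module was added in Python 2.3, after this thread was written, and the code below is Python 3. `line.split(',')` is the method form of `string.split(line, ',')`.

```python
import csv

line = 'A,232,"Smith, Adam"'

# Naive splitting treats every comma as a separator, even inside quotes,
# and leaves the quote characters in the fields.
naive = line.split(',')

# csv.reader accepts any iterable of strings, so a one-element list is
# enough to parse a single line; the quoted comma is kept inside its field.
parsed = next(csv.reader([line]))

print(naive)   # ['A', '232', '"Smith', ' Adam"']
print(parsed)  # ['A', '232', 'Smith, Adam']
```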
-----Original Message-----
From: Magnus Lycka <magnus@thinkware.se>
To: Terry Carroll <carroll@tjc.com>; tutor@python.org <tutor@python.org>
Cc: djc@object-craft.com.au <djc@object-craft.com.au>
Date: Wednesday, December 11, 2002 5:33 AM
Subject: Re: [Tutor] Wanted: module to parse out a CSV line
>At 00:42 2002-12-11 -0800, Terry Carroll wrote:
>>I'm writing one of my first Python apps
>
>Welcome Terry! I hope you will enjoy it!
>
>(Dave, there is something here that looks like a bug in CSV to me.
>Care to comment?)
>
>>(I've used perl up until
>>now) and need to parse out lines of comma-separated values (CSV). I'm on
>>a Windows/XP system.
>
>I was just about to suggest that you used one of the three
>modules below. It's nice to see that someone has made the
>effort to search the net before asking here! :)
>
>I'm afraid you should have tried a bit harder with these modules.
>They all seem able to solve your problem, but they could be better
>documented, and I think one of them belongs in the standard library.
>
>>I've found some Python CSV support, but nothing that will work for me:
>
>> 1. ASV, from <http://tratt.net/laurie/python/asv/>
>> Nice, but it reads in an entire file that is assumed to be
>> CSV-formatted.
>
>There is an input_from_file method, but you don't have to use
>that. Use input instead.
>
>> That's not my case, I have a single variable I need
>> to parse out (yeah, it comes from a file, but not all lines in the
>> file are CSV).
>
> >>> import ASV
> >>> asv = ASV.ASV()
> >>> asv.input('A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"', ASV.CSV())
> >>> print asv
>[['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']]
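Terry's case is a single string taken from a file where only some lines are CSV. The standard-library csv module (Python 2.3 and later) can handle this too, because csv.reader accepts any iterable of strings, not only a file. Below is a minimal Python 3 sketch. The "REC:" prefix used to recognise CSV lines and the helper name parse_csv_line are hypothetical stand-ins for whatever rule the real file follows.

```python
import csv
import io


def parse_csv_line(line):
    """Parse one CSV-formatted string into a list of fields.

    skipinitialspace=True ignores the blank after each comma, so a field
    written as , "Smith, Adam" is still recognised as quoted.
    """
    return next(csv.reader([line], skipinitialspace=True))


# Hypothetical file contents: only lines starting with "REC:" hold CSV data.
sample = io.StringIO(
    'header text, not CSV\n'
    'REC:A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"\n'
    'more free text\n'
)

records = []
for raw in sample:
    if raw.startswith('REC:'):  # assumed marker for CSV lines
        records.append(parse_csv_line(raw[len('REC:'):].rstrip('\n')))

print(records)  # [['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']]
```

Parsing line by line like this assumes no record spans several physical lines.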
>
>> 2. A CSV module from
>> <http://www.object-craft.com.au/projects/csv/documentation.html>
>> Perfect! Exactly what I need. Except the install fails looking for a
>> program named cl.exe; I think it's a compiler, which I don't have.
>
>This module is implemented in C to make it really fast even
>for very large files. But look at the download page:
>http://www.object-craft.com.au/projects/csv/download.html
>
>If you are using Win32, you can use one of the following binaries:
>Win32 Python 2.1 binary: csv.pyd 20K Nov 20 2002
>Win32 Python 2.2 binary: csv.pyd 20K Nov 20 2002
>
> >>> import csv
> >>> csv.parser().parse('A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"')
>['A', ' 232', ' "Title"', ' "Smith', ' Adam"', ' "1', ' 2', ' 3', ' 4"']
>
>Not quite...but...
>
> >>> csv.parser().parse('A,232,"Title","Smith, Adam","1, 2, 3, 4"')
>['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']
>
>It seems the space after the comma confuses CSV regarding the use
>of double quotes. I've seen a lot of files with whitespace after
>the comma, so this is not what I would like. And the parser won't
>accept field_sep = ', ', it has to be a single character.
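The standard-library csv module (Python 2.3 and later) behaves the same way by default. Its reader has a skipinitialspace option that covers the space-after-comma layout, although the delimiter itself must still be a single character. Below is a Python 3 sketch on the same input.

```python
import csv

line = 'A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"'

# Default dialect: the space after each comma becomes part of the next
# field, so the opening quote is not the field's first character and the
# quotes are treated as ordinary text.
default = next(csv.reader([line]))

# skipinitialspace=True ignores whitespace immediately after a delimiter,
# so each quoted field is recognised and its quotes are removed.
lenient = next(csv.reader([line], skipinitialspace=True))

print(default)
print(lenient)  # ['A', '232', 'Title', 'Smith, Adam', '1, 2, 3, 4']
```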
>
>> 3. Python-DSV, at <http://python-dsv.sourceforge.net/>
>> This looks like some whole separate program, rather than something
>> that I can just call in to parse out a single line. It also looks
>> like it goes after a whole file at once. Hard to tell -- there's no
>> docs, unless (I presume) I install it.
>
>The documentation is in the form of a documentation string in the source.
>It shows you what to do.
>
>Basic use:
> from DSV import DSV
> data = file.read() # file.read() returns a string, so this is what you need
> qualifier = DSV.guessTextQualifier(data) # optional
> data = DSV.organizeIntoLines(data, textQualifier = qualifier)
> delimiter = DSV.guessDelimiter(data) # optional
> data = DSV.importDSV(data, delimiter = delimiter, textQualifier = qualifier)
> hasHeader = DSV.guessHeaders(data) # optional
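The optional guessing steps in DSV have a rough counterpart in the standard library's csv.Sniffer (Python 2.3 and later), which guesses the delimiter and quote character and checks heuristically for a header row. Below is a minimal Python 3 sketch using a made-up three-line sample. Sniffer is a heuristic, so its guesses can be wrong on other data.

```python
import csv
import io

# Made-up sample: a header row plus two records with quoted commas.
sample = (
    'name,age,city\n'
    '"Smith, Adam",42,Kirkcaldy\n'
    '"Jones, Bob",35,Leeds\n'
)

sniffer = csv.Sniffer()
dialect = sniffer.sniff(sample)          # guesses delimiter and quote character
has_header = sniffer.has_header(sample)  # heuristic check for a header row

# Read the data with the guessed dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))

print(dialect.delimiter, dialect.quotechar, has_header)
print(rows)
```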
>
>You can skip the guessing games and just run the two functions that matter.
>
> >>> from DSV import DSV
> >>> data = 'A, 232, "Title", "Smith, Adam", "1, 2, 3, 4"'
> >>> data = DSV.organizeIntoLines(data, textQualifier = '"')
> >>> data = DSV.importDSV(data, delimiter = ',', textQualifier = '"')
> >>> print data
>[['A', ' 232', 'Title', 'Smith, Adam', '1, 2, 3, 4']]
>
>As you see, like csv, but unlike asv, it won't strip the leading space
>from before 232. I'm pretty sure this is intentional. Whether it's a
>bug or a feature in your eyes is a different issue...
>
>The "organizeIntoLines" step (which I guess you can bypass by putting
>your string in a list) exists because programs like Excel will produce
>CSV files with line breaks inside double-quoted strings. So a logical
>line might span several physical lines.
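The standard-library csv reader (Python 2.3 and later) handles this case if it is given the whole file, or any iterable of lines, rather than one line at a time. Below is a minimal Python 3 sketch with made-up data. For real files, the csv documentation recommends opening them with newline='' so that the reader sees the raw line endings.

```python
import csv
import io

# Made-up data: the first record's quoted middle field contains a line
# break, as a spreadsheet export might write it.
text = 'A,"first line\nsecond line",C\nD,E,F\n'

# Splitting each physical line breaks the first record in two.
by_line = [ln.split(',') for ln in text.splitlines()]

# csv.reader keeps reading physical lines while a quoted field is open,
# so each logical record comes back whole.
records = list(csv.reader(io.StringIO(text)))

print(len(by_line), len(records))  # 3 2
print(records)  # [['A', 'first line\nsecond line', 'C'], ['D', 'E', 'F']]
```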
>
>I think it would be a good thing to have parsers/importers/exporters for
>both CSV and fixed-format files in the standard library. We just need some
>kind of consensus on how they should behave, I guess...
>
>
>--
>Magnus Lycka, Thinkware AB
>Alvans vag 99, SE-907 50 UMEA, SWEDEN
>phone: int+46 70 582 80 65, fax: int+46 70 612 80 65
>http://www.thinkware.se/ mailto:magnus@thinkware.se
>
>
>_______________________________________________
>Tutor maillist - Tutor@python.org
>http://mail.python.org/mailman/listinfo/tutor