Troubles with CSV file
Paul McGuire
ptmcg at austin.rr._bogus_.com
Fri May 14 11:22:42 EDT 2004
"Vladimir Ignatov" <vignatov at colorpilot.com> wrote in message
news:mailman.8.1084529146.4157.python-list at python.org...
> Hello!
>
> I have a big CSV file, which I must read and process.
> Unfortunately I can't figure out how to use the standard *csv* module in my
> situation. The problem is that some records look like:
>
> ""read this, man"", 1
>
> which should be decoded back into:
>
> "read this, man"
> 1
>
> ... which looks pretty "natural" to me. Instead I got:
>
> read this
> man""
> 1
>
> output. In other words, the csv reader does not understand the use of "" here.
> A quick experiment shows me that the *csv* module (with the default 'excel' dialect)
> expects something like
>
> """read this, man""", 1
>
> in my situation - the quotes actually must be tripled. I don't understand this
> and can't figure out how to proceed with my CSV file. Maybe some
> *alternative* CSV parsers can help? Any suggestions are welcome.
>
> Vladimir Ignatov
>
>
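The behaviour described in the quoted message can be reproduced with the standard csv module alone. This is a minimal sketch, assuming Python 3; `parse` is a hypothetical helper name used only for illustration, not something from the thread. In the 'excel' dialect one pair of quotes encloses a field and a literal quote inside it is written as two, which is why the tripled form decodes as expected.

```python
import csv
import io


def parse(line, **kwargs):
    """Parse a single CSV line with the stdlib reader (hypothetical helper)."""
    return next(csv.reader([line], **kwargs))


doubled = '""read this, man"", 1'
tripled = '"""read this, man""", 1'

# Doubled quotes: the first pair opens and immediately closes a quoted
# section, so the comma after "this" still splits the field.
print(parse(doubled))    # ['read this', ' man""', ' 1']

# Tripled quotes: the outer pair encloses the field and each inner ""
# stands for one literal quote character.
print(parse(tripled, skipinitialspace=True))    # ['"read this, man"', '1']

# Writing the desired values shows the form the 'excel' dialect produces.
buf = io.StringIO()
csv.writer(buf).writerow(['"read this, man"', 1])
print(buf.getvalue().strip())    # """read this, man""",1
```

The `skipinitialspace=True` option only drops the blank after the comma; it does not change how the quotes are interpreted.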
Vladimir -
Here is the CSV example that is provided with pyparsing (with some slight
edits). I wrote this for exactly the situation you describe - just
splitting on commas doesn't always do the right thing.
You can download pyparsing at http://pyparsing.sourceforge.net .
-- Paul
==========================
# commasep.py
#
# comma-separated list example, to illustrate the advantages of using
# the pyparsing commaSeparatedList as opposed to string.split(","):
# - leading and trailing whitespace is implicitly trimmed from list elements
# - list elements can be quoted strings, which can safely contain commas
#   without breaking into separate elements
from pyparsing import commaSeparatedList

testData = [
    "a,b,c,100.2,,3",
    "d, e, j k , m ",
    "'Hello, World', f, g , , 5.1,x",
    "John Doe, 123 Main St., Cleveland, Ohio",
    "Jane Doe, 456 St. James St., Los Angeles , California ",
    "",
    ]

# Compare a plain split on commas with the pyparsing result for each line.
for line in testData:
    print("input:", repr(line))
    print("split:", line.split(","))
    print("parse:", commaSeparatedList.parseString(line))
    print()
==========================
Output:
input: 'a,b,c,100.2,,3'
split: ['a', 'b', 'c', '100.2', '', '3']
parse: ['a', 'b', 'c', '100.2', '', '3']
input: 'd, e, j k , m '
split: ['d', ' e', ' j k ', ' m ']
parse: ['d', 'e', 'j k', 'm']
input: "'Hello, World', f, g , , 5.1,x"
split: ["'Hello", " World'", ' f', ' g ', ' ', ' 5.1', 'x']
parse: ["'Hello, World'", 'f', 'g', '', '5.1', 'x']
input: 'John Doe, 123 Main St., Cleveland, Ohio'
split: ['John Doe', ' 123 Main St.', ' Cleveland', ' Ohio']
parse: ['John Doe', '123 Main St.', 'Cleveland', 'Ohio']
input: 'Jane Doe, 456 St. James St., Los Angeles , California '
split: ['Jane Doe', ' 456 St. James St.', ' Los Angeles ', ' California ']
parse: ['Jane Doe', '456 St. James St.', 'Los Angeles', 'California']
input: ''
split: ['']
parse: ['']
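
For comparison, the standard csv module can get close to the "parse" results above once it is told which quote character the data uses. This is a minimal sketch, assuming Python 3; `split_fields` is a hypothetical helper name. Unlike commaSeparatedList, the csv reader drops the surrounding quotes from a quoted field instead of keeping them, and it returns an empty list for an empty line.

```python
import csv


def split_fields(line, quotechar="'"):
    """Split one line on commas, honouring quoted fields and trimming whitespace.

    Hypothetical helper for illustration only.
    """
    rows = csv.reader([line], quotechar=quotechar, skipinitialspace=True)
    row = next(rows, [])
    # skipinitialspace only removes blanks after a delimiter, so trailing
    # blanks are stripped here explicitly.
    return [field.strip() for field in row]


print(split_fields("d, e, j k , m "))
print(split_fields("'Hello, World', f, g , , 5.1,x"))
```

The quoted field keeps its embedded comma, and the remaining fields come back trimmed.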