Troubles with CSV file

Paul McGuire ptmcg at austin.rr._bogus_.com
Fri May 14 11:22:42 EDT 2004


"Vladimir Ignatov" <vignatov at colorpilot.com> wrote in message
news:mailman.8.1084529146.4157.python-list at python.org...
> Hello!
>
> I have a big CSV file that I must read and process.
> Unfortunately, I can't figure out how to use the standard *csv* module in my
> situation. The problem is that some records look like:
>
> ""read this, man"", 1
>
> which should be decoded back into:
>
>     "read this, man"
>     1
>
> ... which looks pretty "natural" to me. Instead I got:
>
>     read this
>       man""
>    1
>
> output. In other words, the csv reader does not understand the use of "" here.
> A quick experiment showed me that the *csv* module (with the default 'excel'
> dialect) expects something like
>
>      """read this, man""", 1
>
> in my situation - quotes actually must be tripled. I don't understand this
> and can't figure out how to proceed with my CSV file. Maybe some
> *alternative* CSV parsers can help?  Any suggestions are welcome.
>
>     Vladimir Ignatov
>
>
Vladimir -

Here is the CSV example that is provided with pyparsing (with some slight
edits).  I wrote this for exactly the situation you describe - just
splitting on commas doesn't always do the right thing.

You can download pyparsing at http://pyparsing.sourceforge.net .

-- Paul
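
On the tripled quotes: in the default 'excel' dialect, a field that contains a comma is wrapped in one pair of double quotes, and each literal double quote inside it is written twice. So a field whose text is "read this, man" (quotes included) needs three quote characters at each end. The sketch below uses only the standard csv module to reproduce what you saw. The helper triple_outer_quotes is hypothetical (it is not part of csv) and assumes every field written as ""text"" was meant to be """text"""; check that against your real file before relying on it.

```python
import csv
import re

def parse_line(line):
    """Parse one line with the default 'excel' dialect, skipping blanks after commas."""
    return next(csv.reader([line], skipinitialspace=True))

def triple_outer_quotes(line):
    """Hypothetical pre-processing step: rewrite ""text"" as \"\"\"text\"\"\".

    Assumes the doubled quotes always enclose a whole field and that the
    field text itself contains no further double quotes.
    """
    return re.sub(r'""([^"]*)""', r'"""\1"""', line)

doubled = '""read this, man"", 1'
tripled = '"""read this, man""", 1'

# The doubled form is split at the comma inside the text.
print(parse_line(doubled))

# The tripled form keeps the text together, with its quotes.
print(parse_line(tripled))

# Pre-processing the doubled form gives the same result as the tripled form.
print(parse_line(triple_outer_quotes(doubled)))
```

If the assumption about the file holds, running each line through the helper before handing it to csv.reader should give the two fields you expected.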

==========================
# commasep.py
#
# comma-separated list example, to illustrate the advantages of using
# the pyparsing commaSeparatedList as opposed to string.split(","):
# - leading and trailing whitespace is implicitly trimmed from list elements
# - list elements can be quoted strings, which can safely contain commas
#   without breaking into separate elements

from pyparsing import commaSeparatedList

testData = [
    "a,b,c,100.2,,3",
    "d, e, j k , m  ",
    "'Hello, World', f, g , , 5.1,x",
    "John Doe, 123 Main St., Cleveland, Ohio",
    "Jane Doe, 456 St. James St., Los Angeles , California   ",
    "",
    ]

# Compare a naive split on commas with the pyparsing result for each line.
for line in testData:
    print("input:", repr(line))
    print("split:", line.split(","))
    print("parse:", commaSeparatedList.parseString(line))
    print()

==========================
Output:
input: 'a,b,c,100.2,,3'
split: ['a', 'b', 'c', '100.2', '', '3']
parse: ['a', 'b', 'c', '100.2', '', '3']

input: 'd, e, j k , m  '
split: ['d', ' e', ' j k ', ' m  ']
parse: ['d', 'e', 'j k', 'm']

input: "'Hello, World', f, g , , 5.1,x"
split: ["'Hello", " World'", ' f', ' g ', ' ', ' 5.1', 'x']
parse: ["'Hello, World'", 'f', 'g', '', '5.1', 'x']

input: 'John Doe, 123 Main St., Cleveland, Ohio'
split: ['John Doe', ' 123 Main St.', ' Cleveland', ' Ohio']
parse: ['John Doe', '123 Main St.', 'Cleveland', 'Ohio']

input: 'Jane Doe, 456 St. James St., Los Angeles , California   '
split: ['Jane Doe', ' 456 St. James St.', ' Los Angeles ', ' California   ']
parse: ['Jane Doe', '456 St. James St.', 'Los Angeles', 'California']

input: ''
split: ['']
parse: ['']
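
For comparison, here is a rough sketch of how the standard csv module could be pointed at some of the same test lines. It rests on two assumptions that are mine, not part of the script above: the reader is told that the quote character is a single quote (quotechar="'"), and trailing blanks are removed with an explicit strip(), since skipinitialspace only drops blanks that follow a comma. The function name parse_with_csv is made up for the example. Unlike the pyparsing output above, csv removes the enclosing quotes from a quoted field.

```python
import csv

test_data = [
    "a,b,c,100.2,,3",
    "d, e, j k , m  ",
    "'Hello, World', f, g , , 5.1,x",
]

def parse_with_csv(line):
    """Split one line with csv.reader, treating ' as the quote character."""
    reader = csv.reader([line], quotechar="'", skipinitialspace=True)
    # skipinitialspace handles blanks after a comma; strip() handles the rest.
    return [field.strip() for field in next(reader)]

for line in test_data:
    print("input:", repr(line))
    print("csv:  ", parse_with_csv(line))
    print()
```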





