parsing CSV files with quotes

Niklas Frykholm r2d2 at acc.umu.se
Fri Mar 31 03:04:08 EST 2000


On Thu, 30 Mar 2000 11:50:27 -0500, Warren Postma <embed at geocities.com> wrote:
>Suppose I have a CSV file where line 1 is the column names, and lines 2..n
>are comma separated variables, where all String fields are quoted like this:
>
>ID, NAME, AGE
>1, "Postma, Warren", 30
>2, "Twain, Shania",  31
>3, "Nelson, Willy",  57
>4, "Austin, \"Stone Cold\" Steve", 34

[...]

>Or is this beasty solvable by judicious use of Regular Expressions?

Sure... 

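import re

def dec(s):
    r'Decode \" and \\.'
    return re.sub(r"\\(.)", r"\1", s)

def csv_parse(s):
    "Parse csv-string."
    # Each match is a pair (unquoted, quoted); exactly one of them is non-empty.
    # Unquoted fields are assumed to be integers; quoted fields are decoded.
    return map(lambda x: (x[0] and [int(x[0])]  or [dec(x[1])])[0],
            re.findall(r'\s*([^"]+?)\s*,|\s*"(.*?[^\\])"\s*,',s+","))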

>>> s = r'4, "Austin, \"Stone Cold\" Steve", 34'
>>> print csv_parse(s)
[4, 'Austin, "Stone Cold" Steve', 34]
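
For anyone reading this on Python 3, here is a rough port of the same idea, offered as a sketch rather than a finished parser. It keeps the regular expression from the post and the assumption that every unquoted field is an integer, so the header line (ID, NAME, AGE) would have to be handled separately. The names FIELD_RE, parse_row and parse_row_stdlib are made up for this illustration. The second function swaps in a different technique, the standard csv module, and leaves unquoted fields as strings.

```python
import csv
import re

# One field per match: either an unquoted run (group 1) or a quoted
# string (group 2) in which \" and \\ act as escapes.
FIELD_RE = re.compile(r'\s*([^"]+?)\s*,|\s*"(.*?[^\\])"\s*,')

def dec(s):
    r'''Decode \" and \\ by dropping the escaping backslash.'''
    return re.sub(r"\\(.)", r"\1", s)

def parse_row(line):
    """Regex approach; assumes every unquoted field is an integer."""
    return [int(bare) if bare else dec(quoted)
            for bare, quoted in FIELD_RE.findall(line + ",")]

def parse_row_stdlib(line):
    """Swapped-in technique: let the csv module handle quotes and escapes."""
    reader = csv.reader([line], skipinitialspace=True,
                        escapechar="\\", doublequote=False)
    return next(reader)

rows = [
    r'1, "Postma, Warren", 30',
    r'4, "Austin, \"Stone Cold\" Steve", 34',
]
for row in rows:
    print(parse_row(row))
```

Like the original, the regex version cannot match an empty quoted field ("") or a quoted field that ends in an escaped backslash, since the pattern requires a non-backslash character just before the closing quote.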

// Niklas


