parsing CSV files with quotes
Warren Postma
embed at geocities.com
Thu Mar 30 11:50:27 EST 2000
Suppose I have a CSV file where line 1 holds the column names and lines 2..n
are comma-separated values, where all string fields are quoted like this:
ID, NAME, AGE
1, "Postma, Warren", 30
2, "Twain, Shania", 31
3, "Nelson, Willy", 57
4, "Austin, \"Stone Cold\" Steve", 34
So, the obvious thing I tried is:
>>> import string
>>> print string.splitfields("4, \"Austin, \\\"Stone Cold\\\" Steve, 34",",")
['4', ' "Austin', ' \\"Stone Cold\\" Steve', ' 34']
Hmm. Interesting. So I tried this:
>>> print string.splitfields(r'4, "Austin, \"Stone Cold\" Steve", 34')
['4,', '"Austin,', '\\"Stone', 'Cold\\"', 'Steve",', '34']
I'm getting close, I can feel it!
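For comparison, current Python versions ship a `csv` module (added in Python 2.3, after this post). Its reader can probably be configured for this exact format: `skipinitialspace` drops the blank after each comma, and `doublequote=False` together with a backslash `escapechar` makes a backslash escape the next character inside quotes. A minimal sketch, assuming the sample rows above are held in a string (`read_rows` and `DATA` are illustrative names, not from the post):

```python
import csv
import io

# Sample rows from the post; row 4 uses backslash-escaped quotes.
DATA = r'''ID, NAME, AGE
1, "Postma, Warren", 30
2, "Twain, Shania", 31
3, "Nelson, Willy", 57
4, "Austin, \"Stone Cold\" Steve", 34
'''

def read_rows(text):
    """Parse CSV text whose quoted fields escape quotes/backslashes with a backslash."""
    reader = csv.reader(
        io.StringIO(text),
        skipinitialspace=True,  # ignore the blank that follows each comma
        doublequote=False,      # quotes are NOT escaped by doubling them ("")
        escapechar='\\',        # ...but by a preceding backslash instead
    )
    header = next(reader)       # line 1 holds the column names
    return [dict(zip(header, row)) for row in reader]

for record in read_rows(DATA):
    print(record)
```

All values come back as strings; converting `ID` and `AGE` to integers is left to the caller.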
The Rules:
1. All integer and other non-string fields are output as plain ASCII, without quotes.
2. String fields have quotes. Commas are allowed inside the quotes.
3. Quotes inside a quoted string are escaped by a backslash.
4. Backslashes are themselves escaped by a backslash.
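Rules 1-4 are simple enough that a hand-written, character-by-character scanner can handle them without any parser generator. The following is a sketch under the assumption that no record spans more than one line; `split_record` is a hypothetical name:

```python
def split_record(line):
    """Split one CSV record following rules 1-4.

    Quoted fields lose their surrounding quotes and have backslash escapes
    resolved; unquoted fields (integers etc.) are returned stripped, as strings.
    """
    line = line.rstrip('\r\n')
    fields = []
    i, n = 0, len(line)
    while True:
        # Skip blanks before the field.
        while i < n and line[i] in ' \t':
            i += 1
        if i < n and line[i] == '"':
            # Quoted string field (rules 2-4).
            i += 1
            chars = []
            while True:
                if i >= n:
                    raise ValueError('unterminated quoted field')
                c = line[i]
                if c == '\\':
                    if i + 1 >= n:
                        raise ValueError('backslash at end of line')
                    # A backslash escapes the next char (rules 3 and 4).
                    chars.append(line[i + 1])
                    i += 2
                elif c == '"':
                    i += 1  # closing quote
                    break
                else:
                    chars.append(c)  # commas are kept literally (rule 2)
                    i += 1
            fields.append(''.join(chars))
            # Only blanks may sit between the closing quote and the comma.
            while i < n and line[i] in ' \t':
                i += 1
        else:
            # Unquoted field (rule 1): everything up to the next comma.
            start = i
            while i < n and line[i] != ',':
                i += 1
            fields.append(line[start:i].strip())
        if i >= n:
            return fields
        if line[i] != ',':
            raise ValueError('expected a comma at position %d' % i)
        i += 1  # step over the comma and parse the next field
```

For the file as a whole, one would call `split_record` on the header line and zip its result with the fields of each following line.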
Is this complex enough that I basically need Python's "parser" module?
The problem is, I'm scared of it. Anyone got any parser tutorials, HOWTOs or links?
Or is this beastie solvable by judicious use of regular expressions?
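On the regular-expression question: each field is either a quoted string with backslash escapes or bare text up to the next comma, so one regex per field appears to be enough, applied repeatedly from the end of the previous match. This is a sketch using Python's `re` module; `FIELD_RE`, `UNESCAPE_RE` and `split_record_re` are hypothetical names:

```python
import re

# One field: optional blanks, then either a quoted string (group 1) or
# bare text without commas/quotes (group 2), optional blanks, then a
# comma or the end of the line (group 3).
FIELD_RE = re.compile(r'''
    [ \t]*
    (?:
        "((?:[^"\\]|\\.)*)"     # quoted: plain chars, or backslash + any char
      | ([^,"]*?)               # unquoted: shortest run up to the separator
    )
    [ \t]*
    (,|$)                       # field separator or end of line
''', re.VERBOSE)

# Inside a quoted field, a backslash followed by X stands for X.
UNESCAPE_RE = re.compile(r'\\(.)')

def split_record_re(line):
    line = line.rstrip('\r\n')
    fields, pos = [], 0
    while True:
        m = FIELD_RE.match(line, pos)
        if m is None:
            raise ValueError('malformed field at position %d' % pos)
        quoted, bare, sep = m.groups()
        if quoted is not None:
            fields.append(UNESCAPE_RE.sub(r'\1', quoted))
        else:
            fields.append(bare)
        if sep == '':           # matched end of line: no more fields
            return fields
        pos = m.end()           # continue right after the comma
```

The alternation `[^"\\]|\\.` never lets a backslash-escaped quote close the string, which is exactly what defeated the plain split attempts above.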
While I'm taking up bandwidth, I'll ask another silly question:
Is there a "compressed dbShelve" out there anywhere? In this case I just
want to store arrays and dictionaries of built-in Python types, in a
compressed manner, in a BSD database. Anyone heard of something like this?
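As a sketch of one way to get a "compressed shelf": wrap any bytes-to-bytes store in a mapping that pickles each value (as the standard `shelve` module does) and then compresses it with `zlib`. `CompressedShelf` is a hypothetical class, not an existing module, and the code assumes Python 3's `pickle`, `zlib` and `collections.abc`:

```python
import pickle
import zlib
from collections.abc import MutableMapping

class CompressedShelf(MutableMapping):
    """Dict-like wrapper that pickles and zlib-compresses every value.

    `store` is assumed to map bytes keys to bytes values, e.g. the object
    returned by dbm.open(...), or a plain dict for testing.
    """

    def __init__(self, store, level=9):
        self._store = store
        self._level = level          # zlib compression level, 0-9

    def _key(self, key):
        return key.encode('utf-8')   # string keys are stored as UTF-8 bytes

    def __getitem__(self, key):
        blob = self._store[self._key(key)]
        return pickle.loads(zlib.decompress(blob))

    def __setitem__(self, key, value):
        data = pickle.dumps(value, protocol=pickle.HIGHEST_PROTOCOL)
        self._store[self._key(key)] = zlib.compress(data, self._level)

    def __delitem__(self, key):
        del self._store[self._key(key)]

    def __iter__(self):
        for k in self._store.keys():
            yield k.decode('utf-8')

    def __len__(self):
        return len(self._store)
```

In real use, `store` would be the object returned by `dbm.open(filename, 'c')`, closed when done; compression pays off mainly for larger, repetitive values.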
Warren