Escaping commas within parens in CSV parsing?
Skip Montanaro
skip at pobox.com
Thu Jun 30 22:59:10 EDT 2005
Ramon> I am trying to use the csv module to parse a column of values
Ramon> containing comma-delimited values with unusual escaping:
Ramon> AAA, BBB, CCC (some text, right here), DDD
Ramon> I want this to come back as:
Ramon> ["AAA", "BBB", "CCC (some text, right here)", "DDD"]
Alas, there's no "escaping" at all in the line above. I see no obvious way
to distinguish one comma from another in this example. If you mean the fact
that the comma you want to retain is inside parens, that's not escaping. An
escape character is consumed by the parser and doesn't appear in the output,
but the parens in your example are kept in the result you want.
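For contrast, here is a minimal Python 3 sketch of what real escaping and quoting look like to the csv module. It assumes, hypothetically, that whoever produces the data could backslash-escape the embedded comma (first line) or quote the field (second line). The backslash and the quotes are consumed by the reader and do not appear in the resulting fields:

```python
import csv
import io

# Hypothetical input: the same record written two ways, once with the
# embedded comma backslash-escaped and once with the field quoted.
data = ('AAA, BBB, CCC (some text\\, right here), DDD\n'
        'AAA, BBB, "CCC (some text, right here)", DDD\n')

# escapechar makes the reader treat "\," as a literal comma;
# skipinitialspace drops the blank after each delimiter.
reader = csv.reader(io.StringIO(data), escapechar="\\", skipinitialspace=True)
rows = list(reader)
for row in rows:
    print(row)
```

Both lines come back as the four fields Ramon wants, with no backslash or quote characters left in them.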
Ramon> I can probably hack this with regular expressions but I thought
Ramon> I'd check to see if anyone had any quick suggestions for how to
Ramon> do this elegantly first.
I see nothing obvious unless you truly mean that each field begins with an
all-caps word. In that case you could wrap the file object so that every
field gets quoted before the csv module sees the line:
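import csv
import re

class FunnyWrapper:
    """untested"""
    def __init__(self, f):
        self.f = f
    def __iter__(self):
        return self
    def next(self):
        # Drop the line terminator first; otherwise the closing quote
        # lands after it and the last field keeps the newline.
        line = self.f.next().rstrip("\r\n")
        # Turn each comma followed by an all-caps word into '","'
        # (discarding the blanks after the comma), then quote the line.
        return '"' + re.sub(r', *([A-Z]+)', r'","\1', line) + '"'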
and use it like so:
reader = csv.reader(FunnyWrapper(open("somefile.csv", "rb")))
for row in reader:
    print row
(I'm not sure what the ramifications are of iterating over a file opened in
binary mode.)
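The wrapper above is Python 2 code. A rough Python 3 sketch of the same heuristic follows, written as a generator instead of a class; the names `FIELD_BREAK` and `quote_fields` are hypothetical. It assumes, as above, that a comma starts a new field only when the next non-blank text is an all-caps word, and that no field already contains a double quote character:

```python
import csv
import io
import re

# A comma followed by optional blanks and an all-caps word marks a field
# boundary; commas followed by anything else (e.g. ", right") are kept.
FIELD_BREAK = re.compile(r', *([A-Z]+)')

def quote_fields(lines):
    """Yield each line with every field wrapped in double quotes."""
    for line in lines:
        line = line.rstrip("\r\n")
        yield '"' + FIELD_BREAK.sub(r'","\1', line) + '"'

data = "AAA, BBB, CCC (some text, right here), DDD\n"
for row in csv.reader(quote_fields(io.StringIO(data))):
    print(row)
```

To read a real file, pass `open("somefile.csv", newline="")` instead of the StringIO; in Python 3 the csv documentation asks for text mode with `newline=""` rather than binary mode. The heuristic still misfires if a parenthesized phrase contains a comma followed by a capitalized word, such as "(some text, Right here)".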
Skip
More information about the Python-list mailing list