Alternatives for the CSV module
Skip Montanaro
skip at pobox.com
Sun Sep 12 12:30:47 EDT 2004
>> I am going to make a program that reads files with different
>> csv-dialects. Sometimes the field-separator or line-separator can be
>> more than one character. The standard CSV module in Python 2.3 is not
>> a good solution, because it expects single characters.
Well, I might disagree with you there. By all reasonable accounts,
delimited files containing multi-character delimiters are not CSV files, at
least not as operationally defined by Excel (which I mention only because
it's probably the largest producer and consumer of such files).
>> Example of a file
>> "ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤
>> Here the field delimiter is "<>" and the "line" terminator "¤¤".
>> Fields can be enclosed in quotes, and a double quote is treated as
>> normal text.
>> This is not the only format the parser can expect. The format is
>> given to the program by the user, so the program should have no
>> problems parsing the text. An ideal solution would be a similar
>> parser to the standard CSV-parser, except that it accepts strings as
>> delimiters.
>> I could always manipulate the input file and replace the delimiters
>> by single characters, but I would like a more generic solution.
That's pretty generic. How about this (untested):
    class DelimitedFile:
        def __init__(self, fname, mode='rb', ind=',', outd=','):
            self.f = open(fname, mode)
            self.ind = ind
            self.outd = outd

        def __iter__(self):
            return self

        def next(self):
            line = self.f.next()
            return line.replace(self.ind, self.outd)
Use it like so:
    import csv

    class d(csv.excel):
        delimiter = '\001'
        lineterminator = '¤¤'

    reader = csv.reader(DelimitedFile(fname, ind='<>', outd='\001'),
                        dialect=d)
    for row in reader:
        print row
The goal is of course to pick a delimiter which won't appear in the file,
hence the Ctrl-A ('\001').
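
For anyone trying this on a current Python 3, here is a minimal sketch of the
same substitution idea that also copes with the multi-character record
terminator. Iterating over a file object splits on newlines, so a file
terminated by '¤¤' would arrive as one long line, and as far as I know the csv
reader ignores the dialect's lineterminator when reading. The function name
read_delimited and its placeholder parameter are made up for illustration and
are not part of the csv module. It assumes the whole input fits in memory and
that neither separator occurs inside a quoted field.

```python
import csv


def read_delimited(text, field_sep, record_sep, placeholder='\x01'):
    """Parse text whose field and record separators may be multi-character.

    Hypothetical helper: splits records by hand, swaps the field separator
    for a single-character placeholder, then lets csv.reader handle quoting.
    """
    if placeholder in text:
        # The substitution only works if the placeholder is unused in the data.
        raise ValueError("placeholder character already occurs in the data")

    # Split into records on the (possibly multi-character) terminator.
    records = text.split(record_sep)
    # A trailing terminator leaves one empty piece at the end; drop it.
    if records and records[-1] == '':
        records.pop()

    # Replace the field separator so csv.reader sees a one-character delimiter.
    lines = [record.replace(field_sep, placeholder) for record in records]

    # csv.reader accepts any iterable of strings, one record per item.
    return list(csv.reader(lines, delimiter=placeholder))


if __name__ == '__main__':
    sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
    for row in read_delimited(sample, '<>', '¤¤'):
        print(row)
```

With the default Excel-style quoting, the doubled quote inside "DEF""" comes
back as a single literal quote character, which matches the behaviour asked
for in the original question.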
Skip