Alternatives for the CSV module

Sun Sep 12 12:30:47 EDT 2004

    >> I am going to make a program that reads files with different
    >> csv-dialects. Sometimes the field-separator or line-separator can be
    >> more than one character. The standard CSV module in Python 2.3 is not
    >> a good solution, because it expects single characters.

Well, I might disagree with you there.  By all reasonable accounts,
delimited files containing multi-character delimiters are not CSV files, at
least not as operationally defined by Excel (which I mention only because
it's probably the largest producer and consumer of such files).

    >> Example of a file

    >> "ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤

    >> Here the field delimiter is "<>" and the "line" terminator "¤¤".
    >> Fields can be enclosed in quotes, and a double quote is treated as
    >> normal text.

    >> This is not the only format the parser can expect. The format is
    >> given to the program by the user, so the program should have no
    >> problems parsing the text. An ideal solution would be a similar
    >> parser to the standard CSV-parser, except that it accepts strings as
    >> delimiters.

    >> I could always manipulate the input file and replace the delimiters
    >> by single characters, but I would like a more generic solution.

That's pretty generic.  How about this (untested):

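class DelimitedFile:
    """Iterate over a file, swapping one delimiter string for another."""

    def __init__(self, fname, mode='rb', ind=',', outd=','):
        self.f = open(fname, mode)
        self.ind = ind      # delimiter as it appears in the input
        self.outd = outd    # single-character delimiter handed to csv

    def __iter__(self):
        return self

    def next(self):
        # Iterating the file yields newline-terminated lines; a custom
        # record terminator is not split here.
        line = self.f.next()
        return line.replace(self.ind, self.outd)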

Use it like so:

    import csv

    class d(csv.excel):
        delimiter = '\001'
        lineterminator = '¤¤'

    reader = csv.reader(DelimitedFile(fname, ind='<>', outd='\001'),
                        dialect=d)

    for row in reader:
        print row

The goal is of course to pick a delimiter which won't appear in the file,
hence the Ctrl-A ('\001').
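
The same substitution idea can also cover the record terminator.  Here is a
rough sketch for a current Python 3 (not the 2.3 discussed above).  The
function name read_delimited and its parameters are made up for illustration.
It assumes the whole input fits in memory, that the stand-in character really
is absent from the data, and that neither separator string occurs inside a
quoted field:

```python
import csv
import io


def read_delimited(text, field_sep, record_sep, stand_in="\x01"):
    """Parse text whose field and record separators may be multi-character.

    Hypothetical helper: both separators are rewritten to single
    characters, then the standard csv reader does the actual parsing.
    """
    # The stand-in must not occur in the data, or fields would be split
    # in the wrong places.
    if stand_in in text:
        raise ValueError("stand-in delimiter occurs in the input")

    # Record terminator -> newline, field separator -> stand-in character.
    normalized = text.replace(record_sep, "\n").replace(field_sep, stand_in)

    reader = csv.reader(io.StringIO(normalized, newline=""),
                        delimiter=stand_in)
    # Skip the empty rows that blank lines would otherwise produce.
    return [row for row in reader if row]


sample = '"ABC"<>"DEF"""<>"GHI"¤¤123<>456<>"XYZ"¤¤'
rows = read_delimited(sample, "<>", "¤¤")
for row in rows:
    print(row)
```

Checking for the stand-in up front turns a silently wrong parse into an
error, which seems the safer choice when the format comes from the user.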

Skip