csv module and unicode, when or workaround?

Skip Montanaro skip at pobox.com
Sat Mar 12 00:25:11 EST 2005


    Chris> the current csv module cannot handle unicode the docs say, is
    Chris> there any workaround or is unicode support planned for the near
    Chris> future?

    Skip> True, it can't.

Hmmm...  I think the following should be a reasonable workaround in most
situations:

    #!/usr/bin/env python

    import csv

    class UnicodeReader:
        """Wrap csv.reader and decode each field of every row to unicode
        using the given encoding."""
        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            self.reader = csv.reader(f, dialect=dialect, **kwds)
            self.encoding = encoding

        def next(self):
            row = self.reader.next()
            return [unicode(s, self.encoding) for s in row]

        def __iter__(self):
            return self

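    class UnicodeWriter:
        """Wrap csv.writer and encode each unicode field of every row
        using the given encoding before writing."""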
        def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
            self.writer = csv.writer(f, dialect=dialect, **kwds)
            self.encoding = encoding

        def writerow(self, row):
            # Use the encoding passed to __init__ rather than a fixed one.
            self.writer.writerow([s.encode(self.encoding) for s in row])

        def writerows(self, rows):
            for row in rows:
                self.writerow(row)

    if __name__ == "__main__":
        try:
            oldurow = [u'\u65E5\u672C\u8A9E',
                       u'Hi Mom -\u263a-!',
                       u'A\u2262\u0391.']
            # Close the file explicitly so the row is flushed before reading.
            f = open("uni.csv", "wb")
            writer = UnicodeWriter(f)
            writer.writerow(oldurow)
            f.close()

            f = open("uni.csv", "rb")
            reader = UnicodeReader(f)
            newurow = reader.next()
            f.close()
            print "trivial test", newurow == oldurow and "passed" or "failed"
        finally:
            import os
            os.unlink("uni.csv")

If people don't find any egregious flaws with the concept I'll at least add
it as an example to the csv module docs.  Maybe the two classes would even
work as additions to the csv.py module, assuming the API is palatable.
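
One thing the wrappers depend on is that the writing side and the reading
side agree on the encoding.  Here is a minimal sketch of just that per-row
step, without the csv module.  The helper names encode_row and decode_row
are made up for illustration, and Latin-1 is only an example of a
non-default encoding:

```python
def encode_row(row, encoding="utf-8"):
    """Encode each text field to bytes, as a writer wrapper would."""
    return [s.encode(encoding) for s in row]

def decode_row(row, encoding="utf-8"):
    """Decode each byte-string field back to text, as a reader wrapper would."""
    return [s.decode(encoding) for s in row]

row = [u'caf\u00e9', u'na\u00efve']

# Same encoding on both sides: the row survives the round trip.
same = decode_row(encode_row(row, "latin-1"), "latin-1")

# Mismatched encodings: UTF-8 bytes read back as Latin-1 come out garbled.
mixed = decode_row(encode_row(row, "utf-8"), "latin-1")
```

Here same compares equal to row and mixed does not, which is why the
encoding argument has to be honored consistently by both classes.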

Skip


