[Python-Dev] Example workaround classes for using Unicode with csv module...

Skip Montanaro skip at pobox.com
Fri Mar 18 18:06:53 CET 2005

I added UnicodeReader and UnicodeWriter example classes to the csv module
docs just now.  They mention problems with ASCII NUL characters (which I
vaguely remember - NUL-terminated strings are used internally, right?).  Do
NULs still present a problem?  I saw nothing in the log messages that
mentioned "ascii" or "nul", so I presume they do.

Here's what I added.  Let me know if you think it needs any corrections,
especially if there's a better way to word "as long as you avoid encodings
like utf-16 that use NULs".  Can that just be "as long as you avoid
multi-byte encodings other than utf-8"?  I'd like to have something like
this in the docs to demonstrate a reasonable workaround for the current
no-Unicode code without casting it in stone by adding it to csv.py.

The \module{csv} module doesn't directly support reading and writing
Unicode, but it is 8-bit clean save for some problems with \ASCII{} NUL
characters, so you can write classes that handle the encoding and decoding
for you as long as you avoid encodings like utf-16 that use NULs.

import csv

class UnicodeReader:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.reader = csv.reader(f, dialect=dialect, **kwds)
        self.encoding = encoding

    def next(self):
        row = self.reader.next()
        return [unicode(s, self.encoding) for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.writer = csv.writer(f, dialect=dialect, **kwds)
        self.encoding = encoding

    def writerow(self, row):
        self.writer.writerow([s.encode(self.encoding) for s in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

They should work just like the \class{csv.reader} and \class{csv.writer}
classes but add an \var{encoding} parameter.
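
The per-row conversion the two classes perform can be tried in isolation,
without the csv module.  This is a minimal sketch; encode_row and
decode_row are hypothetical helpers that mirror the list comprehensions
in writerow and next, and the sample row is made up.  It also shows why
the encoding used for writing has to match the one used for reading.

```python
# Sketch of the per-row conversion, independent of the csv module.
# encode_row and decode_row are hypothetical helper names.
def encode_row(row, encoding="utf-8"):
    """Unicode cells -> byte strings, as the writer does before writing."""
    return [s.encode(encoding) for s in row]


def decode_row(row, encoding="utf-8"):
    """Byte strings -> Unicode cells, as the reader does after reading."""
    return [b.decode(encoding) for b in row]


row = [u"caf\xe9", u"na\xefve"]

# Same encoding on both sides: the row survives the round trip.
assert decode_row(encode_row(row, "latin-1"), "latin-1") == row

# Mismatched encodings: the text comes back garbled instead.
garbled = decode_row(encode_row(row, "utf-8"), "latin-1")
assert garbled != row
```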


