csv module and unicode, when or workaround?
Skip Montanaro
skip at pobox.com
Sat Mar 12 00:25:11 EST 2005
Chris> the current csv module cannot handle unicode the docs say, is
Chris> there any workaround or is unicode support planned for the near
Chris> future?
Skip> True, it can't.
Hmmm... I think the following should be a reasonable workaround in most
situations:

#!/usr/bin/env python

import csv

class UnicodeReader:
    # Wraps csv.reader and decodes every byte-string field to unicode.
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.reader = csv.reader(f, dialect=dialect, **kwds)
        self.encoding = encoding

    def next(self):
        row = self.reader.next()
        return [unicode(s, self.encoding) for s in row]

    def __iter__(self):
        return self

class UnicodeWriter:
    # Wraps csv.writer and encodes every unicode field before writing.
    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.writer = csv.writer(f, dialect=dialect, **kwds)
        self.encoding = encoding

    def writerow(self, row):
        # Use the configured encoding, not a hard-coded "utf-8".
        self.writer.writerow([s.encode(self.encoding) for s in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

if __name__ == "__main__":
    try:
        oldurow = [u'\u65E5\u672C\u8A9E',
                   u'Hi Mom -\u263a-!',
                   u'A\u2262\u0391.']
        writer = UnicodeWriter(open("uni.csv", "wb"))
        writer.writerow(oldurow)
        # Dropping the last reference closes (and flushes) the file.
        del writer
        reader = UnicodeReader(open("uni.csv", "rb"))
        newurow = reader.next()
        print "trivial test", newurow == oldurow and "passed" or "failed"
    finally:
        import os
        os.unlink("uni.csv")

If people don't find any egregious flaws with the concept, I'll at least add
it as an example to the csv module docs. Maybe the two classes would even work
as additions to the csv.py module, assuming the API is palatable.
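
To give a feel for how that API might be used, here is a rough sketch of the same two classes ported to Python 3. This is an assumption layered on top of the post, not anything in the csv module: the code above targets Python 2, whereas this version assumes Python 3 and binary-mode file objects, and it reuses the names UnicodeReader and UnicodeWriter purely for illustration. The round trip goes through an in-memory buffer instead of uni.csv.

```python
import csv
import io


class UnicodeReader:
    """Yield rows of str decoded from a binary file object (sketch)."""

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        # newline="" leaves line endings untouched so csv can parse them.
        # Note: the wrapper takes ownership of f and closes it when it is
        # itself closed or garbage-collected.
        self.text = io.TextIOWrapper(f, encoding=encoding, newline="")
        self.reader = csv.reader(self.text, dialect=dialect, **kwds)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.reader)


class UnicodeWriter:
    """Write rows of str to a binary file object as encoded CSV (sketch)."""

    def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
        self.f = f
        self.encoding = encoding
        # Rows are formatted into a text buffer first, then encoded.
        self.queue = io.StringIO()
        self.writer = csv.writer(self.queue, dialect=dialect, **kwds)

    def writerow(self, row):
        self.writer.writerow(row)
        self.f.write(self.queue.getvalue().encode(self.encoding))
        # Empty the buffer so the next row starts from scratch.
        self.queue.seek(0)
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


# Round trip through an in-memory buffer, using the same test row.
oldurow = ["\u65e5\u672c\u8a9e", "Hi Mom -\u263a-!", "A\u2262\u0391."]
buf = io.BytesIO()
UnicodeWriter(buf).writerow(oldurow)
newurow = next(UnicodeReader(io.BytesIO(buf.getvalue())))
```

The constructor signatures are kept identical to the Python 2 versions, so the calling code only changes from reader.next() to next(reader).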
Skip