[Python-Dev] sys.setdefaultencoding() vs. csv module + unicode
"Martin v. Löwis"
martin at v.loewis.de
Thu Jun 14 08:47:43 CEST 2007
> The csv module says it's not unicode safe but the 2.5 docs [3] have a
> workaround for this. While the workaround says nothing about
> sys.setdefaultencoding() it simply does not work with the default
> encoding, "ascii." Is this _the_ problem with the csv module? Should
> I give up and use XML? Below is code that works vs. code that
> doesn't. Am I interpreting the workaround from the docs wrong?
These questions are off-topic for python-dev; please ask them on
comp.lang.python instead. python-dev is for the development *of*
Python, not for the development *with* Python.
> kumar$ python2.5
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> >>> import sys, csv, codecs
> >>> f = codecs.open('unicsv.csv','wb','utf-8')
> >>> w = csv.writer(f)
> >>> w.writerow([u'lang', u'espa\xa4ol'])
What you should do here is:

def encoderow(r):
    return [s.encode("utf-8") for s in r]

# Plain binary file: the rows are already UTF-8 byte strings.
f = open('unicsv.csv', 'wb')
w = csv.writer(f)
w.writerow(encoderow([u'lang', u'espa\xa4ol']))
IOW, you need to encode *before* passing the strings
to the CSV module, not afterwards.
If it is too tedious for you to put in the encoderow
calls all the time, you can write a wrapper for CSV
writers which transparently encodes all Unicode
strings.
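
Such a wrapper might look roughly like the sketch below. The class
name UnicodeEncodingWriter and its "encoding" parameter are made up
for illustration and are not part of the csv module. It assumes
Python 2, as in this thread, where the csv writer expects byte
strings, and it wraps any object that has a writerow() method.

```python
class UnicodeEncodingWriter(object):
    """Hypothetical wrapper: encodes Unicode cells before they reach a csv writer."""

    def __init__(self, writer, encoding="utf-8"):
        # writer is any object with writerow(), e.g. csv.writer(f)
        self.writer = writer
        self.encoding = encoding

    def _encode(self, value):
        # type(u"") is the unicode type on Python 2; numbers, byte
        # strings and other values are passed through unchanged.
        if isinstance(value, type(u"")):
            return value.encode(self.encoding)
        return value

    def writerow(self, row):
        # Encode first, then delegate to the wrapped writer.
        self.writer.writerow([self._encode(v) for v in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)
```

With that in place, you would build the writer once as
UnicodeEncodingWriter(csv.writer(open('unicsv.csv', 'wb'))) and then
call writerow([u'lang', u'espa\xa4ol']) directly, without the
encoderow calls.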
Regards,
Martin