sys.setdefaultencoding() vs. csv module + unicode
I'm seeing conflicting opinions on whether to put sys.setdefaultencoding('utf-8') in sitecustomize.py or not ([1] vs. [2]) and frankly I'm confused.

The csv module says it's not unicode safe but the 2.5 docs [3] have a workaround for this. While the workaround says nothing about sys.setdefaultencoding() it simply does not work with the default encoding, "ascii." Is this _the_ problem with the csv module? Should I give up and use XML?

Below is code that works vs. code that doesn't. Am I interpreting the workaround from the docs wrong? If so, can someone please give me a hint ;) I should also point out that I've tried this with the StringIO queued approach (from the workaround) but that doesn't solve anything.

1) with the default encoding :

kumar$ python2.5
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import sys, csv, codecs
>>> f = codecs.open('unicsv.csv','wb','utf-8')
>>> w = csv.writer(f)
>>> w.writerow([u'lang', u'espa\xa4ol'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa4' in position 4: ordinal not in range(128)

2) with custom encoding :

kumar$ python2.5 -S
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import sys, csv, codecs
>>> sys.setdefaultencoding('utf-8')
>>> f = codecs.open('unicsv.csv','wb','utf-8')
>>> w = csv.writer(f)
>>> w.writerow([u'lang', u'espa\xa4ol'])
>>> f.close()
thanks,
Kumar

[1] http://mail.python.org/pipermail/python-dev/2007-June/073593.html
[2] http://diveintopython.org/xml_processing/unicode.html
[3] http://docs.python.org/lib/csv-examples.html#csv-examples
> The csv module says it's not unicode safe but the 2.5 docs [3] have a workaround for this. While the workaround says nothing about sys.setdefaultencoding() it simply does not work with the default encoding, "ascii." Is this _the_ problem with the csv module? Should I give up and use XML? Below is code that works vs. code that doesn't. Am I interpreting the workaround from the docs wrong?
These questions are off-topic for python-dev; please ask them on comp.lang.python instead. python-dev is for the development *of* Python, not for the development *with* Python.
> kumar$ python2.5
> Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
> [GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
> >>> import sys, csv, codecs
> >>> f = codecs.open('unicsv.csv','wb','utf-8')
> >>> w = csv.writer(f)
> >>> w.writerow([u'lang', u'espa\xa4ol'])
What you should do here is

    import csv

    def encoderow(r):
        # Encode each unicode cell to a UTF-8 byte string.
        return [s.encode("utf-8") for s in r]

    # Plain binary file: the rows are already encoded.
    f = open('unicsv.csv', 'wb')
    w = csv.writer(f)
    w.writerow(encoderow([u'lang', u'espa\xa4ol']))
    f.close()

IOW, you need to encode *before* passing the strings to the CSV module, not afterwards. If it is too tedious for you to put in the encoderow calls all the time, you can write a wrapper for CSV writers which transparently encodes all Unicode strings.

Regards,
Martin
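
The wrapper suggested above could look roughly like the following. This is only a sketch under stated assumptions, not code from the thread or from the csv documentation: the names EncodingWriter and open_for_csv are hypothetical, and the version check is an addition so the same file also runs on Python 3, where csv accepts text directly and the wrapper simply passes values through.

```python
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


class EncodingWriter(object):
    """Hypothetical wrapper: encodes unicode cells before csv sees them."""

    def __init__(self, f, encoding="utf-8", **kwds):
        self.writer = csv.writer(f, **kwds)
        self.encoding = encoding

    def _encode(self, value):
        # Python 2: csv wants byte strings, so encode unicode objects first.
        # Python 3: csv handles text itself, so values pass through unchanged.
        if PY2 and isinstance(value, unicode):  # noqa: F821 (Python 2 only)
            return value.encode(self.encoding)
        return value

    def writerow(self, row):
        self.writer.writerow([self._encode(v) for v in row])

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)


def open_for_csv(path, encoding="utf-8"):
    """Hypothetical helper: open the file the way csv expects on this version."""
    if PY2:
        # Rows arrive already encoded, so a plain binary file is enough.
        return open(path, "wb")
    # Python 3: text mode; newline="" leaves csv's own line endings alone.
    return io.open(path, "w", encoding=encoding, newline="")


# Usage: write the row from the thread to a scratch file and read it back.
path = os.path.join(tempfile.mkdtemp(), "unicsv.csv")
f = open_for_csv(path)
w = EncodingWriter(f)
w.writerow([u"lang", u"espa\xa4ol"])
f.close()

with open(path, "rb") as check:
    data = check.read()
```

With this in place the calling code keeps passing unicode rows and never calls encoderow itself, and sys.setdefaultencoding() is not involved at all.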
participants (2)
- "Martin v. Löwis"
- Kumar McMillan