sys.setdefaultencoding() vs. csv module + unicode - Python-Dev

13 Jun 2007

I'm seeing conflicting opinions on whether or not to put
sys.setdefaultencoding('utf-8') in sitecustomize.py ([1] vs.
[2]) and frankly I'm confused.

The csv module says it's not unicode safe, but the 2.5 docs [3] have a
workaround for this.  While the workaround says nothing about
sys.setdefaultencoding(), it simply does not work with the default
encoding, "ascii".  Is this _the_ problem with the csv module?  Should
I give up and use XML?  Below is code that works vs. code that
doesn't.  Am I interpreting the workaround from the docs wrong?  If
so, can someone please give me a hint ;)  I should also point out that
I've tried this with the StringIO queued approach (from the
workaround) but that doesn't solve anything.

1) with the default encoding :

kumar$ python2.5
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import sys, csv, codecs
>>> f = codecs.open('unicsv.csv','wb','utf-8')
>>> w = csv.writer(f)
>>> w.writerow([u'lang', u'espa\xa4ol'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa4' in
position 4: ordinal not in range(128)

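
The traceback in case 1 is what one would expect if each unicode cell is converted to a byte string with the default codec before it reaches the file. Below is a minimal sketch of that conversion step in isolation, without the csv module. It assumes the implicit conversion behaves like an explicit .encode('ascii'); that equivalence is an assumption made for illustration, not something stated in this thread. The `except ... as` form needs Python 2.6 or newer.

```python
# Sketch: reproduce the failing conversion without the csv module.
# Assumption: the implicit conversion behaves like .encode('ascii').
cell = u'espa\xa4ol'

try:
    cell.encode('ascii')
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded
    failed_at = exc.start

# Encoding explicitly as UTF-8 succeeds; u'\xa4' becomes two bytes.
utf8_bytes = cell.encode('utf-8')

print(failed_at)
print(len(utf8_bytes))
```

The failing index matches the "position 4" reported in the traceback, which suggests the error comes from the ASCII default rather than from the 'utf-8' argument given to codecs.open().
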
2) with custom encoding :

kumar$ python2.5 -S
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import sys, csv, codecs
>>> sys.setdefaultencoding('utf-8')
>>> f = codecs.open('unicsv.csv','wb','utf-8')
>>> w = csv.writer(f)
>>> w.writerow([u'lang', u'espa\xa4ol'])
>>> f.close()
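
Case 2 works only because the interpreter is started with -S, which skips the site module and so leaves sys.setdefaultencoding() available. One possible way to avoid depending on the default encoding at all, sketched here on the assumption that UTF-8 output is wanted, is to encode every cell explicitly and hand csv.writer byte strings, writing to a plain binary file instead of a codecs.open() wrapper. `encode_row` is a hypothetical helper, not part of the csv module or of the docs example [3]. The write itself is guarded so that it runs only on Python 2, where csv.writer operates on byte strings.

```python
import csv
import sys


def encode_row(row, encoding='utf-8'):
    """Return a copy of row with every unicode cell encoded to bytes.

    Hypothetical helper for illustration only.
    """
    return [cell.encode(encoding) for cell in row]


row = [u'lang', u'espa\xa4ol']
encoded = encode_row(row)

# On Python 2 the csv writer receives ready-made byte strings, so the
# default encoding is never consulted.
if sys.version_info[0] == 2:
    f = open('unicsv.csv', 'wb')  # plain binary file, not codecs.open()
    try:
        csv.writer(f).writerow(encoded)
    finally:
        f.close()
```

The design choice here is to do the encoding once, at the boundary, instead of changing a process-wide setting that other modules may rely on.
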
thanks, Kumar

[1] http://mail.python.org/pipermail/python-dev/2007-June/073593.html
[2] http://diveintopython.org/xml_processing/unicode.html
[3] http://docs.python.org/lib/csv-examples.html#csv-examples

Kumar McMillan