[Python-Dev] sys.setdefaultencoding() vs. csv module + unicode

Kumar McMillan kumar.mcmillan at gmail.com
Wed Jun 13 22:30:57 CEST 2007


I'm seeing conflicting opinions on whether to put
sys.setdefaultencoding('utf-8') in sitecustomize.py or not ([1] vs.
[2]) and frankly I'm confused.

The csv module says it's not unicode safe but the 2.5 docs [3] have a
workaround for this.  While the workaround says nothing about
sys.setdefaultencoding() it simply does not work with the default
encoding, "ascii."  Is this _the_ problem with the csv module?  Should
I give up and use XML?  Below is code that works vs. code that
doesn't.  Am I interpretting the workaround from the docs wrong?  If
so, can someone please give me a hint ;)  I should also point out that
I've tried this with the StringIO queued approach (from the
workaround) but that doesn't solve anything.

1) with the default encoding :

kumar$ python2.5
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import sys, csv, codecs
>>> f = codecs.open('unicsv.csv','wb','utf-8')
>>> w = csv.writer(f)
>>> w.writerow([u'lang', u'espa\xa4ol'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa4' in
position 4: ordinal not in range(128)
>>>

2) with custom encoding :

kumar$ python2.5 -S
Python 2.5 (r25:51918, Sep 19 2006, 08:49:13)
[GCC 4.0.1 (Apple Computer, Inc. build 5341)] on darwin
>>> import sys, csv, codecs
>>> sys.setdefaultencoding('utf-8')
>>> f = codecs.open('unicsv.csv','wb','utf-8')
>>> w = csv.writer(f)
>>> w.writerow([u'lang', u'espa\xa4ol'])
>>> f.close()

thanks, Kumar

[1] http://mail.python.org/pipermail/python-dev/2007-June/073593.html
[2] http://diveintopython.org/xml_processing/unicode.html
[3] http://docs.python.org/lib/csv-examples.html#csv-examples


More information about the Python-Dev mailing list