UnicodeDecodeError quick question

Tim Golden mail at timgolden.me.uk
Thu Dec 4 17:14:42 CET 2008


patrick.waldo at gmail.com wrote:
> Hi Everyone,
> 
> I am using Python 2.4 and I am converting an excel spreadsheet to a
> pipe delimited text file and some of the cells contain utf-8
> characters.  I solved this problem in a very unintuitive way and I
> wanted to ask why.  If I do,
> 
> csvfile.write(cell.encode("utf-8"))
> 
> I get a UnicodeDecodeError.  However if I do,
> 
> c = unicode(cell.encode("utf-8"),"utf-8")
> csvfile.write(c)
> 
> Why should I have to encode the cell to utf-8 and then make it unicode
> in order to write to a text file?  Is there a more intuitive way to
> get around these bothersome unicode errors?


The short answer is that you're writing to a file
you've opened with the codecs module. Any write to
this file expects unicode data and will automatically
encode it to the encoding you specified. You're trying
to send it utf8-encoded data (i.e. a string of bytes,
*not* unicode), and it presumably tries to decode it
to a unicode object before encoding it as utf8 like
you asked it to. Without looking at the implementation,
it probably just does unicode(x) on what you've passed
in, which will use the default ascii codec and fail in
the way you saw.

(Honestly, that was the short answer).
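For anyone reading this on a current Python, here is a minimal sketch of that
failure. It is written in Python 3 syntax. There, a codecs writer handed bytes
raises TypeError instead of quietly decoding them, so the sketch does the
ascii decode explicitly. Treating that decode as what Python 2 does under the
hood is the same guess made above. The sample cell value is made up.

```python
# Sketch (Python 3 syntax): reproduce the implicit ascii decode that Python 2
# applies when a byte string reaches a writer that only wants unicode.
cell = u"caf\u00e9"                # text, as read from the spreadsheet
utf8_bytes = cell.encode("utf-8")  # b'caf\xc3\xa9': bytes, not text

error = None
try:
    # Roughly what Python 2's unicode(x) does with the default ascii codec
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared after the except block
    print("decode failed:", exc.reason)
```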

To solve it, assuming cell is already unicode, just pass
it unadulterated to csvfile.write.
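Here is a minimal sketch of that fix. It assumes csvfile was opened with
codecs.open, and the output path and sample cell value are made up for
illustration. It is written for Python 3, but the codecs.open call itself is
the same one you'd use in 2.4.

```python
import codecs
import os
import tempfile

cell = u"caf\u00e9"  # already unicode, e.g. a cell value from the spreadsheet

# Hypothetical output path, standing in for wherever csvfile really lives.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

csvfile = codecs.open(path, "w", "utf-8")
csvfile.write(cell)   # unicode in: the codecs writer does the utf-8 encoding
csvfile.write(u"|")   # the pipe delimiter, also passed as unicode
csvfile.close()

# Read the raw bytes back to confirm what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
print(repr(raw))
```

On Python 3 the built-in open(path, "w", encoding="utf-8") does the same job
without the codecs module.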

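The other version works because you're in control
of the (unnecessary) unicode conversion, and you're
telling Python which encoding to use for decoding
and encoding. Decoding the utf8 bytes you've just
produced with the utf8 codec gives you back exactly
the unicode object you started with, which the
codecs file then encodes without complaint.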
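This tiny check shows that the encode-then-decode step in your second version
leaves the value unchanged. It uses Python 3 syntax, where bytes.decode stands
in for Python 2's unicode(x, "utf-8"), and the sample value is invented.

```python
cell = u"caf\u00e9"  # sample unicode cell value (made up for illustration)

# unicode(cell.encode("utf-8"), "utf-8") in Python 2 is the same round trip
# as encoding to utf-8 bytes and decoding them again with the same codec.
round_tripped = cell.encode("utf-8").decode("utf-8")

print(round_tripped == cell)  # nothing changed, so the step can be dropped
```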

TJG



More information about the Python-list mailing list