[Tutor] unicode encoding hell

Kent Johnson kent37 at tds.net
Thu Sep 6 12:45:08 CEST 2007


David Bear wrote:

> feedp.entry.title.decode('utf-8', 'xmlcharrefreplace')
> 
> I assume it would take any unicode character and 'do the right thing',
> including replacing higher ordinal chars with xml entity refs. But I still
> get
> 
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in
> position 31: ordinal not in range(128)
> 
> Clearly, I completely do not understand how unicode is working here. Can
> anyone enlighten me?

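It sounds like you already have Unicode. Notice that you are trying to
decode, but the error is for encoding: calling decode() on a unicode
object makes Python 2 first encode it with the default (ASCII) codec,
and that hidden encode step is what fails.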
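To make that hidden step visible, here is a rough emulation written in Python 3 syntax (in Python 3, str has no decode() method at all, so the original call would fail with AttributeError instead). The helper name py2_style_decode is hypothetical; it assumes the Python 2 default encoding was 'ascii', which was the usual setting.

```python
def py2_style_decode(text, encoding='utf-8', errors='strict'):
    """Hypothetical helper: roughly what Python 2's unicode.decode() did."""
    # Implicit step: encode with the default codec, normally 'ascii'.
    # The errors argument is NOT passed here, so xmlcharrefreplace never helps.
    raw = text.encode('ascii')
    # Only then does the requested decode happen.
    return raw.decode(encoding, errors)


try:
    py2_style_decode('It\u2019s')
except UnicodeEncodeError as exc:
    # The failure comes from the encode step, just like the traceback below.
    print(exc)
```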

In [17]: u'\u2019'.decode('utf-8')
------------------------------------------------------------
Traceback (most recent call last):
   File "<ipython console>", line 1, in <module>
   File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/utf_8.py", line 16, in decode
     return codecs.utf_8_decode(input, errors, True)
<type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\u2019' in position 0: ordinal not in range(128)

decode() goes towards unicode, encode() goes away from unicode.
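Applied to the original question, a minimal sketch (again Python 3 syntax; the title string is an assumed stand-in for feedp.entry.title, which is already unicode). 'xmlcharrefreplace' is an error handler for encoding only, so it belongs on encode(), here targeting ASCII.

```python
# Stand-in for feedp.entry.title, which is already a unicode string.
title = 'Don\u2019t panic'

# encode(): unicode -> bytes. Characters ASCII can't represent
# become XML numeric character references instead of raising.
ascii_bytes = title.encode('ascii', 'xmlcharrefreplace')
print(ascii_bytes)  # b'Don&#8217;t panic'

# decode(): bytes -> unicode. A UTF-8 round trip gets the original back.
utf8_bytes = title.encode('utf-8')
assert utf8_bytes.decode('utf-8') == title
```

In Python 2 the equivalent would be feedp.entry.title.encode('ascii', 'xmlcharrefreplace'), which returns a byte string.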

Kent

