UnicodeEncodeError in compile
Terry Reedy
tjreedy at udel.edu
Tue Jan 10 19:56:39 EST 2012
On 1/10/2012 8:43 AM, jmfauth wrote:
> D:\>c:\python32\python.exe
> Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> '\u5de5'.encode('utf-8')
> b'\xe5\xb7\xa5'
> >>> '\u5de5'.encode('mbcs')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> UnicodeEncodeError: 'mbcs' codec can't encode characters in position 0--1: invalid character
> D:\>c:\python27\python.exe
> Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> u'\u5de5'.encode('utf-8')
> '\xe5\xb7\xa5'
> >>> u'\u5de5'.encode('mbcs')
> '?'
The mbcs codec encodes according to the current Windows code page. Only
the Chinese code page(s) can encode the Chinese character, so the
UnicodeEncodeError raised by 3.2 is correct. 2.7 has a bug: it behaves
as if errors='replace' were in effect when it is supposedly using
errors='strict'. The Py3 fix was done in
http://bugs.python.org/issue850997
2.7 was intentionally left alone because of backward-compatibility
considerations. (None of this addresses the OP's question.)
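
To make the strict/replace difference concrete, here is a minimal
Python 3 sketch. Since mbcs exists only on Windows and depends on the
active code page, it uses cp1252 as a stand-in for a non-Chinese code
page; that substitution and the helper name try_encode are my own
assumptions for illustration, not something from the transcript above.

```python
# Sketch only: cp1252 stands in for a non-Chinese Windows code page,
# because 'mbcs' is Windows-only and varies with the system settings.
ch = '\u5de5'

def try_encode(text, encoding, errors='strict'):
    """Return the encoded bytes, or None if the codec rejects the text."""
    try:
        return text.encode(encoding, errors)
    except UnicodeEncodeError:
        return None

utf8 = try_encode(ch, 'utf-8')                  # UTF-8 can encode any character
strict = try_encode(ch, 'cp1252')               # strict: raises, so helper gives None
replaced = try_encode(ch, 'cp1252', 'replace')  # replace: substitutes '?'

print(utf8, strict, replaced)
```

With errors='strict' (the default) the unencodable character raises
UnicodeEncodeError, which is what 3.2 does for mbcs; asking for
errors='replace' explicitly gives the '?' that 2.7 returned without
being asked.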
--
Terry Jan Reedy