[Python-3000] encode function errors="replace", but print() failed, is this a bug?

Thu Nov 20 03:27:57 CET 2008

Hi,

Recently I encountered a problem with the str.encode() function.  I called it
like this: s.encode("mbcs", "replace"), expecting it to eliminate all invalid
characters.  However, the program failed with the following message:
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence

Am I using it incorrectly, or is this a bug?

Platform: Windows Vista SP1, system default code page: 936 (zh-cn).  The
program (test.py.txt) is attached.

>python3 test.py
A
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    print(str.encode("mbcs", "replace").decode("mbcs", "replace"))
  File "C:\Python30\lib\io.py", line 1485, in write
    b = encoder.encode(s)
UnicodeEncodeError: 'gbk' codec can't encode character '\ue104' in position 4: illegal multibyte sequence
>python3 test.py
A
??íé“øô{??‰ã°„˜z
B
>python3 test.py
A
’é??????ñqÀÕèŸ
B
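
Note that the traceback above ends in io.py's write(), i.e. inside print(),
not inside the str.encode() call.  The following is a minimal sketch of that
distinction, not the attached test program.  It makes two assumptions: it uses
the 'gbk' codec in place of 'mbcs' (which exists only on Windows), and it uses
an in-memory io.TextIOWrapper as a stand-in for a console stream on code page
936.  It does not reproduce the mbcs round trip itself.

```python
import io

# U+E104 is the private-use character named in the error message.
text = "A\ue104B"

# With errors="replace", str.encode() substitutes '?' instead of raising.
encoded = text.encode("gbk", "replace")
print(encoded)  # b'A?B'

# Stand-in for a console stream (assumption): gbk encoding, strict errors.
strict_stream = io.TextIOWrapper(io.BytesIO(), encoding="gbk", errors="strict")
try:
    strict_stream.write(text)
    strict_stream.flush()
    write_failed = False
except UnicodeEncodeError:
    # The stream's own encoder rejects the character, as in the traceback.
    write_failed = True

# The same kind of stream with errors="replace" accepts the text.
lenient_buffer = io.BytesIO()
lenient_stream = io.TextIOWrapper(lenient_buffer, encoding="gbk", errors="replace")
lenient_stream.write(text)
lenient_stream.flush()
lenient_output = lenient_buffer.getvalue()
print(write_failed, lenient_output)  # True b'A?B'
```

If this reading is right, the "replace" handler passed to encode() and
decode() has no effect on the separate encoding step that print() performs
when it writes the resulting string to the output stream.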

Thanks,

Decheng (AKA Robbie Mosaic) Fan
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: test.py.txt
URL: <http://mail.python.org/pipermail/python-3000/attachments/20081120/7cef2794/attachment-0001.txt>