Python 3.1.1 bytes decode with replace bug
JoeSalmeri at hotmail.com
Sun Oct 25 02:47:41 CEST 2009
> For the reason BK explained, the important difference is that I ran in
> the IDLE shell, which handles screen printing of unicode better ;-)
Something still does not seem right here to me.
In the example above, the bytes were decoded from UTF-8 with the
'replace' error handler, so any bytes that were not valid UTF-8 were
replaced, and the resulting string is '\ufffdabc', as BK explained. I
understand that the replace worked.
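A minimal sketch of that decode step follows. The original input bytes are not shown in this message, so b'\xff' is a hypothetical stand-in for "a byte that is not valid UTF-8". The string is shown with ascii() so that printing it does not depend on the console's encoding.

```python
# Hypothetical input: the byte 0xFF never appears in well-formed UTF-8.
raw = b'\xffabc'

# errors='replace' substitutes U+FFFD (REPLACEMENT CHARACTER) for each
# undecodable byte instead of raising UnicodeDecodeError.
s = raw.decode('utf-8', errors='replace')

# ascii() escapes non-ASCII characters, so this line prints safely anywhere.
print(ascii(s))  # '\ufffdabc'
```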
Now consider this:
Python 3.1.1 (r311:74483, Aug 17 2009, 16:45:59) [MSC v.1500 64 bit
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '\ufffdabc'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>
>>> import sys
This too fails for exactly the same reason (and doesn't involve decode).
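The same exception can be reproduced without a console by encoding the string to cp437 directly. This sketch assumes (based on the traceback above) that cp437 is the codec the console output goes through. The non-strict error handlers at the end are shown only for comparison.

```python
s = '\ufffdabc'

# Strict encoding: cp437 has no mapping for U+FFFD, so this raises.
try:
    s.encode('cp437')
    error = None
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

# Non-strict error handlers avoid the exception.
replaced = s.encode('cp437', errors='replace')          # unmappable -> '?'
escaped = s.encode('cp437', errors='backslashreplace')  # unmappable -> \uXXXX
print(replaced)  # b'?abc'
print(escaped)   # b'\\ufffdabc'
```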
In the original example I decoded from UTF-8, and in this example the
default encoding is UTF-8, so why is cp437 being used?
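A quick way to compare the two encodings involved is to print them side by side. sys.getdefaultencoding() is what str.encode() and bytes.decode() fall back to when no encoding is given. print() writes through sys.stdout, which encodes with sys.stdout.encoding. The stdout value depends on the console and Python version. That a Windows console on code page 437 reports cp437 here is an assumption about this particular setup.

```python
import sys

# Fallback codec for str.encode()/bytes.decode() called without an encoding.
default_enc = sys.getdefaultencoding()

# Codec used by the text stream that print() writes to (may be None if
# stdout has been replaced by an object without an encoding).
stdout_enc = getattr(sys.stdout, 'encoding', None)

print('default encoding:', default_enc)
print('stdout encoding: ', stdout_enc)
```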
Thanks in advance for your assistance!