Python 3.1.1 bytes decode with replace bug

Benjamin Kaplan benjamin.kaplan at case.edu
Sat Oct 24 22:30:18 EDT 2009


On Sat, Oct 24, 2009 at 8:47 PM, Joe <JoeSalmeri at hotmail.com> wrote:
>> For the reason BK explained, the important difference is that I ran in
>> the IDLE shell, which handles screen printing of unicode better ;-)
>
> Something still does not seem right here to me.
>
> In the example above the bytes were decoded to 'UTF-8' with the
> replace option so any characters that were not UTF-8 were replaced and
> the resulting string is '\ufffdabc' as BK explained.  I understand
> that the replace worked.
>
> Now consider this:
>
> Python 3.1.1 (r311:74483, Aug 17 2009, 16:45:59) [MSC v.1500 64 bit (AMD64)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> >>> s = '\ufffdabc'
> >>> print(s)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
>     return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 0: character maps to <undefined>
> >>> import sys
> >>> sys.getdefaultencoding()
> 'utf-8'
>
> This too fails for the exact same reason (and doesn't involve decode).
>
> In the original example I decoded to UTF-8 and in this example the
> default encoding is UTF-8 so why is cp437 being used?
>
> Thanks in advance for your assistance!
>

Try checking sys.stdout.encoding. Then run the command chcp (at the
command prompt, not in the Python interpreter). You'll probably get 437
from both of those. Just because sys.getdefaultencoding() returns utf-8
doesn't mean the console uses it: print() encodes its output with
sys.stdout.encoding, which follows the console code page. Nobody really
uses cp437 anymore (it was replaced years ago by cp1252), but Microsoft
is scared to do anything to cmd.exe because it might break somebody's
20-year-old DOS script.
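
If you just want print() to stop raising, one option is to encode for
the console yourself with the 'replace' error handler. The following is
only a rough sketch: encode_for_console is a made-up helper name, not
something in the standard library, and the fallback to 'ascii' when
sys.stdout.encoding is None is my own assumption.

```python
import sys

def encode_for_console(text, encoding=None):
    # Hypothetical helper: round-trip the text through the target
    # encoding so anything it cannot represent becomes '?'.
    # Assumption: fall back to ASCII if stdout reports no encoding.
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)

s = "\ufffdabc"

# These two values are independent of each other.
print("default encoding:", sys.getdefaultencoding())
print("stdout encoding: ", sys.stdout.encoding)

# cp437 has no mapping for U+FFFD, so it is replaced rather than raising.
print(encode_for_console(s, "cp437"))
```

Calling encode_for_console(s) with no second argument uses whatever
sys.stdout.encoding reports, which is usually what you want on a
console stuck in cp437.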


