Python 3.1.1 bytes decode with replace bug

Benjamin Kaplan benjamin.kaplan at case.edu
Sat Oct 24 16:43:39 EDT 2009


On Sat, Oct 24, 2009 at 1:09 PM, Joe <JoeSalmeri at hotmail.com> wrote:
> The Python 3.1.1 documentation has the following example:
>
>>>> b'\x80abc'.decode("utf-8", "strict")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
>                    unexpected code byte
>>>> b'\x80abc'.decode("utf-8", "replace")
> '\ufffdabc'
>>>> b'\x80abc'.decode("utf-8", "ignore")
> 'abc'
>
> Strict and Ignore appear to work as per the documentation but replace
> does not.  Instead of replacing the values it fails:
>
>>>> b'\x80abc'.decode('utf-8', 'replace')
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
>    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
> UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 1: character maps to <undefined>
>
> Is this a known bug in 3.1.1?
>

It's not a bug. The problem isn't even the decode statement. Python
successfully creates the Unicode string '\ufffdabc' and then tries to
print it to the screen, so it has to convert it to cp437 (your console
encoding). That conversion fails because cp437 has no character for
U+FFFD ("character maps to <undefined>"). That's why the traceback
mentions the cp437 codec file and an encode step, not the UTF-8 decoder.
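Here is a minimal sketch of what happens in that session, assuming Python 3. It calls `str.encode('cp437')` explicitly to stand in for the console's output encoding, so a Windows console using code page 437 is assumed rather than detected. The two display workarounds at the end, `ascii()` and encoding with `errors='replace'`, are added suggestions and not part of the original reply.

```python
# Step 1: decoding with errors='replace' succeeds; byte 0x80 becomes U+FFFD.
text = b'\x80abc'.decode('utf-8', 'replace')

# Step 2: simulate what printing to a cp437 console has to do first.
failed_char = None
try:
    text.encode('cp437')
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character cp437 cannot represent
    failed_char = text[exc.start]

# Display workarounds (suggestions, not from the thread):
escaped = ascii(text)                         # pure-ASCII repr with \ufffd escaped
safe_bytes = text.encode('cp437', 'replace')  # unmappable characters become b'?'

print(escaped)
```

On a console that can't show U+FFFD, `print(ascii(text))` prints the escaped form instead of raising. The decoded string itself is unchanged either way.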




More information about the Python-list mailing list