Python 3.1.1 bytes decode with replace bug

Joe JoeSalmeri at hotmail.com
Sat Oct 24 19:09:31 CEST 2009


The Python 3.1.1 documentation has the following example:

>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
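The three documented handlers can also be run from a script instead of the interactive prompt, so nothing has to be echoed to the console. This is a minimal sketch that uses only `bytes.decode` and the standard handler names. The variable names are illustrative.

```python
data = b'\x80abc'  # 0x80 is a UTF-8 continuation byte, so it cannot start a sequence

# "strict" (the default) raises UnicodeDecodeError on the invalid byte.
try:
    data.decode("utf-8", "strict")
    strict_raised = False
except UnicodeDecodeError:
    strict_raised = True

# "replace" substitutes U+FFFD REPLACEMENT CHARACTER for the invalid byte.
replaced = data.decode("utf-8", "replace")

# "ignore" silently drops the invalid byte.
ignored = data.decode("utf-8", "ignore")

# ascii() escapes non-ASCII characters, so these lines print on any console.
print("strict raised:", strict_raised)
print("replace:", ascii(replaced))
print("ignore:", ascii(ignored))
```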

The strict and ignore handlers appear to work as documented, but replace does
not. Instead of returning the string with the bad byte replaced, it fails:

>>> b'\x80abc'.decode('utf-8', 'replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 1: character maps to <undefined>
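The traceback points at `encodings\cp437.py` and an encode step, not at the UTF-8 decoder. One plausible reading, offered here as an assumption, is that `decode` returns `'\ufffdabc'` correctly. The error would then be raised afterwards, when the interactive prompt writes the result's repr to a console whose encoding is cp437, which has no mapping for U+FFFD. The sketch below reproduces that failure without a console by encoding the repr to cp437 explicitly. Index 0 of the repr is the opening quote and index 1 is `'\ufffd'`, which matches "position 1" in the traceback.

```python
text = b'\x80abc'.decode('utf-8', 'replace')  # the decode itself succeeds

# Assumption: the console encoding is cp437, as the cp437.py frame suggests.
# The prompt echoes repr(text); encoding that repr to cp437 reproduces the error.
try:
    repr(text).encode('cp437')
    echo_error = None
except UnicodeEncodeError as exc:
    echo_error = exc
    # exc.object is the string being encoded; exc.start indexes the bad character
    print("cannot encode", ascii(exc.object[exc.start]), "at position", exc.start)

# Two ways to display the result without triggering the encode failure:
print(ascii(text))                      # escapes U+FFFD as \ufffd
safe = text.encode('cp437', 'replace')  # maps U+FFFD to b'?' instead of raising
```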

Is this a known bug in 3.1.1?





More information about the Python-list mailing list