Python 3.1.1 bytes decode with replace bug
Joe
JoeSalmeri at hotmail.com
Sat Oct 24 13:09:31 EDT 2009
The Python 3.1.1 documentation has the following example:
>>> b'\x80abc'.decode("utf-8", "strict")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: unexpected code byte
>>> b'\x80abc'.decode("utf-8", "replace")
'\ufffdabc'
>>> b'\x80abc'.decode("utf-8", "ignore")
'abc'
The "strict" and "ignore" handlers appear to work as documented, but
"replace" does not. Instead of substituting the invalid byte, the call fails:
>>> b'\x80abc'.decode('utf-8', 'replace')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "p:\SW64\Python.3.1.1\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 1: character maps to <undefined>
Is this a known bug in 3.1.1?
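The traceback mentions cp437.py and an encode step rather than the utf-8 decoder, so it may be worth checking whether the decode succeeds and only the console echo fails. A minimal sketch of that check, assuming the console encoding is cp437 as the traceback suggests (the names decoded and encode_failed are just for illustration):

```python
# Decode with the "replace" handler, but do not echo the raw result,
# so the console encoding is not involved at this step.
decoded = b'\x80abc'.decode('utf-8', 'replace')

# ascii() escapes non-ASCII characters, so this prints safely on any console.
print(ascii(decoded))

# Reproduce the suspected failing step in isolation: encoding the
# replacement character U+FFFD with the cp437 codec (assumed console encoding).
try:
    decoded.encode('cp437')
    encode_failed = False
except UnicodeEncodeError as exc:
    encode_failed = True
    print("encode to cp437 failed:", ascii(str(exc)))
```

If the first print shows the escaped replacement character and the encode step raises, then the decode itself works and the error comes from writing the result to the console.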