[issue24214] Exception with utf-8, surrogatepass and incremental decoding
STINNER Victor
report at bugs.python.org
Wed Jul 27 12:33:22 EDT 2016
STINNER Victor added the comment:
The attached patch fixes the UTF-8 decoder so that incremental decoding works correctly with the surrogatepass error handler.
The bug occurs when b'\xed\xa4\x80' is decoded in two parts: the first two bytes (b'\xed\xa4'), and then the last byte (b'\x80').
It works as expected if we decode the first byte (b'\xed') and then the last two bytes (b'\xa4\x80').
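The two cases above can be compared with a short script. This is only a sketch for illustration and is not part of the patch: the helper name decode_in_parts is hypothetical, and the result of the 2+1 split depends on whether the interpreter running it already contains the fix. The helper returns the exception instead of raising it so that both splits can be inspected side by side.

```python
import codecs

# b'\xed\xa4\x80' is the UTF-8-style encoding of the lone surrogate U+D900.
DATA = b'\xed\xa4\x80'


def decode_in_parts(data, split_at):
    """Feed data to an incremental UTF-8/surrogatepass decoder in two chunks.

    Hypothetical helper: returns the decoded text, or the UnicodeDecodeError
    instance if the decoder rejects one of the chunks.
    """
    decoder = codecs.getincrementaldecoder('utf-8')(errors='surrogatepass')
    try:
        text = decoder.decode(data[:split_at])
        text += decoder.decode(data[split_at:], final=True)
    except UnicodeDecodeError as exc:
        return exc
    return text


# Reference result: decoding all three bytes in one call.
one_shot = DATA.decode('utf-8', 'surrogatepass')

# First byte, then the last two bytes (the case reported as working).
split_1_2 = decode_in_parts(DATA, 1)

# First two bytes, then the last byte (the case reported as failing).
split_2_1 = decode_in_parts(DATA, 2)

print(ascii(one_shot))
print(ascii(split_1_2))
print(ascii(split_2_1))
```

On an interpreter without the fix, the last line is expected to show a UnicodeDecodeError rather than the decoded surrogate.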
My patch tries to preserve the best possible performance for the UTF-8/strict decoder.
@Serhiy: Would you mind reviewing my patch, since you helped design the fast UTF-8 decoder?
----------
keywords: +patch
Added file: http://bugs.python.org/file43911/surrogatepass.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue24214>
_______________________________________