[issue8092] utf8, backslashreplace and surrogates
report at bugs.python.org
Tue Apr 20 23:59:52 CEST 2010
STINNER Victor <victor.stinner at haypocalc.com> added the comment:
Oops, I forgot to remove the reallocation in the unicode case in patch version 2.
Patch version 3:
- micro-optimization: group both surrogate cases in the same if to avoid checking 0xD800 <= ch twice
- check for integer overflow
- (remove the duplicate reallocation introduced by version 2)
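To illustrate the first two points, here is a rough Python sketch of the control flow, not the C code from the patch. The names encode_utf8_sketch, on_surrogate, backslashreplace, PY_SSIZE_T_MAX and MAX_BYTES_PER_CHAR are made up for this example, and the 64-bit limit is an assumption. It also assumes a str of code points, so every surrogate it sees is treated as a lone surrogate.

```python
# Illustrative sketch only; this is not the code from the patch.
PY_SSIZE_T_MAX = 2**63 - 1   # assumed 64-bit limit, stands in for the C constant
MAX_BYTES_PER_CHAR = 4       # worst case for one code point in UTF-8


def backslashreplace(cp):
    """Hypothetical handler: render a surrogate as an ASCII \\uXXXX escape."""
    return ("\\u%04x" % cp).encode("ascii")


def encode_utf8_sketch(text, on_surrogate=None):
    """Encode text to UTF-8, passing surrogates to on_surrogate if given."""
    # Overflow guard: the worst-case output size must stay representable.
    if len(text) > PY_SSIZE_T_MAX // MAX_BYTES_PER_CHAR:
        raise MemoryError("string too long to encode")
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # 1 byte: ASCII
            out.append(cp)
        elif cp < 0x800:
            # 2 bytes
            out.append(0xC0 | (cp >> 6))
            out.append(0x80 | (cp & 0x3F))
        elif 0xD800 <= cp <= 0xDFFF:
            # One range test covers both high and low surrogates.
            if on_surrogate is None:
                raise ValueError("surrogate U+%04X not allowed" % cp)
            out.extend(on_surrogate(cp))
        elif cp < 0x10000:
            # 3 bytes
            out.append(0xE0 | (cp >> 12))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
        else:
            # 4 bytes
            out.append(0xF0 | (cp >> 18))
            out.append(0x80 | ((cp >> 12) & 0x3F))
            out.append(0x80 | ((cp >> 6) & 0x3F))
            out.append(0x80 | (cp & 0x3F))
    return bytes(out)
```

With this sketch, encode_utf8_sketch("\udc80", backslashreplace) gives the same bytes as "\udc80".encode("utf-8", "backslashreplace") in Python 3, and calling it without a handler raises an error for the surrogate.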
I think that PyUnicode_EncodeUTF8() is more readable after my patch: the maximum if depth is 2 instead of 3, and I removed the goto.
It shouldn't change performance for characters < 0x800 (which includes ASCII and Latin-1), and I expect similar performance for characters >= 0x800.
Added file: http://bugs.python.org/file17013/utf8_surrogate_error-3.patch
Python tracker <report at bugs.python.org>