[issue8092] utf8, backslashreplace and surrogates

STINNER Victor report at bugs.python.org
Tue Apr 20 23:59:52 CEST 2010

STINNER Victor <victor.stinner at haypocalc.com> added the comment:

Oops, I forgot to remove the reallocation in the unicode case in patch version 2.

Patch version 3:
 - micro-optimization: group both surrogate cases in the same if to avoid checking 0xD800 <= ch twice
 - check for integer overflow
 - (remove the duplicate reallocation introduced by version 2)
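
To illustrate the first point, here is a rough Python sketch of an encoding loop in which a single range test covers both the high and the low surrogate case. It is not the patched C code: the function name encode_utf8_units is made up for this example, the input is assumed to be a list of UTF-16 code units (as on a narrow build), the error handler is replaced by a plain ValueError, and the integer overflow check is left out because Python integers do not overflow.

```python
def encode_utf8_units(units):
    """Encode a list of UTF-16 code units (ints 0..0xFFFF) to UTF-8 bytes.

    Hypothetical helper for illustration only. A lone surrogate raises
    ValueError, standing in for the call to the error handler.
    """
    out = bytearray()
    i = 0
    n = len(units)
    while i < n:
        ch = units[i]
        i += 1
        if ch < 0x80:
            # 1 byte: ASCII
            out.append(ch)
        elif ch < 0x800:
            # 2 bytes
            out.append(0xC0 | (ch >> 6))
            out.append(0x80 | (ch & 0x3F))
        elif 0xD800 <= ch <= 0xDFFF:
            # One range test for both kinds of surrogate.
            if ch <= 0xDBFF and i < n and 0xDC00 <= units[i] <= 0xDFFF:
                # High surrogate followed by a low one: combine the pair.
                ch = 0x10000 + ((ch - 0xD800) << 10) + (units[i] - 0xDC00)
                i += 1
                out.append(0xF0 | (ch >> 18))
                out.append(0x80 | ((ch >> 12) & 0x3F))
                out.append(0x80 | ((ch >> 6) & 0x3F))
                out.append(0x80 | (ch & 0x3F))
            else:
                # Lone surrogate: the real encoder would call the error handler.
                raise ValueError("lone surrogate 0x%04X at index %d" % (ch, i - 1))
        else:
            # 3 bytes: the rest of the BMP
            out.append(0xE0 | (ch >> 12))
            out.append(0x80 | ((ch >> 6) & 0x3F))
            out.append(0x80 | (ch & 0x3F))
    return bytes(out)
```

In this sketch the deepest nesting is one if inside another, and there is no jump out of the loop body.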

I think that PyUnicode_EncodeUTF8() is more readable after my patch: the maximum if depth is 2 instead of 3, and I removed the goto.

It shouldn't change performance for characters < 0x800 (the 1- and 2-byte cases, which include ASCII and Latin-1), and I expect similar performance for characters >= 0x800.
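
As a quick reminder of where the 0x800 boundary comes from, the following small Python 3 snippet reports how many UTF-8 bytes a code point needs. The helper name utf8_len is invented for this example, and it says nothing about timing.

```python
def utf8_len(code_point):
    """Return the number of bytes Python's UTF-8 codec emits for a code point."""
    return len(chr(code_point).encode("utf-8"))


# Code points on both sides of the 0x80 and 0x800 boundaries.
for cp in (0x7F, 0x80, 0xFF, 0x7FF, 0x800, 0xFFFF):
    print(hex(cp), utf8_len(cp))
```

Code points below 0x800 take one or two bytes and never reach the surrogate test; the surrogate range 0xD800-0xDFFF lies entirely within the 3-byte range.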

Added file: http://bugs.python.org/file17013/utf8_surrogate_error-3.patch

Python tracker <report at bugs.python.org>
