[issue13153] IDLE crashes when pasting non-BMP unicode char on Py3

Serhiy Storchaka report at bugs.python.org
Sun Jan 5 16:13:57 CET 2014


Serhiy Storchaka added the comment:

Yes, it is still the same issue. The root of issue is in converting strings when passed to Python-implemented callbacks. When a text is pasted in IDLE window, the callback is called (for highlighting). The callback is a command created by Tcl_CreateCommand from PythonCmd. PythonCmd is a wrapper which converts arguments (char*) to Python strings and then pass them to Python command. Arguments are encoded in "modified UTF-8" [1], i.e. the NUL character is represented as \xc0\x80, they can contains other invalid UTF-8 sequences (such as encoded surrogates). When decoding arguments to Python strings are failed, main Tcl loop is broken and IDLE silently closed.

When astral character is pasted on Windows, it first encoded to UTF-16 by Windows, then Tcl encodes every 16-bit surrogate to modified UTF-8. The result is not valid UTF-8. On X Window systems the X selection value usually is UTF-8 encoded (the type is UTF8_STRING), but can contains invalid UTF-8 sequences.

I will open separate issue to fix other bugs related to Tcl <-> Python string conversions. The last patch fixes only initial issue which is most important.

[1] http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13153>
_______________________________________


More information about the Python-bugs-list mailing list