[issue30019] IDLE freezes when opening a file with astral characters

Eryk Sun report at bugs.python.org
Sat Apr 8 00:23:47 EDT 2017


Eryk Sun added the comment:

In Windows IDLE 3.x, you should still be able to print a surrogate transcoding, which sneaks the native UTF-16LE encoding around tkinter:

    def transurrogate(s):
        b = s.encode('utf-16le')
        return ''.join(b[i:i+2].decode('utf-16le', 'surrogatepass') 
                       for i in range(0, len(b), 2))

    def print_surrogate(*args, **kwds):
        new_args = []
        for arg in args:
            if isinstance(arg, str):
                new_args.append(transurrogate(arg))
            else:
                new_args.append(arg)
        return print(*new_args, **kwds)


    >>> s = '\U0001f52b \U0001f52a'
    >>> print_surrogate(s)
    🔫 🔪
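
To make the transcoding concrete, here is a minimal sketch that computes the UTF-16 surrogate pair for one astral code point by hand and compares it with what transurrogate() returns. The helper name surrogate_pair is hypothetical and only for illustration; transurrogate is repeated so the snippet runs on its own.

```python
def transurrogate(s):
    # Same helper as above: split the UTF-16LE bytes into 2-byte code
    # units and decode each unit on its own, keeping lone surrogates.
    b = s.encode('utf-16le')
    return ''.join(b[i:i+2].decode('utf-16le', 'surrogatepass')
                   for i in range(0, len(b), 2))

def surrogate_pair(ch):
    # Hypothetical helper: standard UTF-16 surrogate arithmetic
    # for a single non-BMP character.
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError('not an astral (non-BMP) character')
    v = cp - 0x10000
    high = 0xD800 + (v >> 10)    # top 10 bits
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits
    return chr(high) + chr(low)

pair = surrogate_pair('\U0001f52b')
# Print the code points rather than the string itself, since lone
# surrogates generally cannot be encoded for output.
print([hex(ord(c)) for c in pair])
print(pair == transurrogate('\U0001f52b'))
```

The resulting string has length 2, one str element per UTF-16 code unit.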

Pasting non-BMP text into IDLE fails on Windows for a similar reason. Tk naively encodes the surrogate codes in the native Windows UTF-16 text as invalid UTF-8, which I've seen referred to as WTF-8 (Wobbly). I see the following error when I run IDLE using python.exe (i.e. with a console) and paste "🔫 🔪" into the window:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 1: invalid continuation byte

This is the second byte of the WTF-8 encoding of the pasted text (the first byte is the opening quote):

    >>> transurrogate('"\U0001f52b').encode('utf-8', 'surrogatepass')
    b'"\xed\xa0\xbd\xed\xb4\xab'
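
As a rough sketch of why the strict decoder rejects these bytes, and of how they could in principle be turned back into the original text, the snippet below decodes the same byte string with 'surrogatepass' and then round-trips it through UTF-16LE to recombine the surrogate pair. This only illustrates the encodings involved; it is not what Tk or IDLE actually does, and the variable names are made up for the example.

```python
wtf8 = b'"\xed\xa0\xbd\xed\xb4\xab'

# A strict UTF-8 decode rejects the encoded surrogates.
err_start = None
try:
    wtf8.decode('utf-8')
except UnicodeDecodeError as e:
    err_start = e.start
    print(e)

# 'surrogatepass' lets the lone surrogates through as str elements...
s = wtf8.decode('utf-8', 'surrogatepass')

# ...and a round trip through UTF-16LE recombines each pair into
# a single astral character.
repaired = s.encode('utf-16le', 'surrogatepass').decode('utf-16le')
print(len(s), len(repaired))
print(repaired == '"\U0001f52b')
```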

Hackiness aside, I don't think it's worth supporting this just for Windows.

----------
nosy: +eryksun

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30019>
_______________________________________

