[issue30019] IDLE freezes when opening a file with astral characters
Eryk Sun
report at bugs.python.org
Sat Apr 8 00:23:47 EDT 2017
Eryk Sun added the comment:
In Windows IDLE 3.x, you should still be able to print a surrogate transcoding, which sneaks the native UTF-16LE encoding around tkinter:
def transurrogate(s):
    # Re-decode each 2-byte UTF-16 code unit on its own, so every
    # non-BMP character becomes an explicit surrogate pair.
    b = s.encode('utf-16le')
    return ''.join(b[i:i+2].decode('utf-16le', 'surrogatepass')
                   for i in range(0, len(b), 2))

def print_surrogate(*args, **kwds):
    new_args = []
    for arg in args:
        if isinstance(arg, str):
            new_args.append(transurrogate(arg))
        else:
            new_args.append(arg)
    return print(*new_args, **kwds)
>>> s = '\U0001f52b \U0001f52a'
>>> print_surrogate(s)
🔫 🔪
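To make the transcoding visible without IDLE, here is a minimal sketch that repeats the helper so it runs on its own and inspects the result. The names `original`, `pair`, and `units` are only illustrative.

```python
def transurrogate(s):
    # Same helper as above: decode each UTF-16 code unit separately.
    b = s.encode('utf-16le')
    return ''.join(b[i:i+2].decode('utf-16le', 'surrogatepass')
                   for i in range(0, len(b), 2))

original = '\U0001f52b'           # one non-BMP code point
pair = transurrogate(original)    # two lone surrogate code points
units = [hex(ord(c)) for c in pair]
print(len(original), len(pair), units)
```

BMP-only text passes through unchanged, since each of its characters already fits in a single UTF-16 code unit.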
Pasting non-BMP text into IDLE fails on Windows for a similar reason. Tk naively encodes the surrogate codes in the native Windows UTF-16 text as invalid UTF-8, which I've seen referred to as WTF-8 (Wobbly). I see the following error when I run IDLE using python.exe (i.e. with a console) and paste "🔫 🔪" into the window:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xed in position 1: invalid continuation byte
This is the second byte of the WTF-8 encoding, i.e. the lead byte of the encoded high surrogate that follows the quote character:
>>> transurrogate('"\U0001f52b').encode('utf-8', 'surrogatepass')
b'"\xed\xa0\xbd\xed\xb4\xab'
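For illustration only, bytes of this shape can be turned back into proper text by decoding with 'surrogatepass' and then round-tripping through UTF-16 so the surrogate pairs get joined. This is a sketch, not something IDLE or tkinter does; `join_surrogates` is a hypothetical name.

```python
def join_surrogates(data):
    # Accept the surrogate-encoded bytes as lone surrogate code points.
    s = data.decode('utf-8', 'surrogatepass')
    # Re-encode to UTF-16 code units, then decode strictly so that each
    # high/low surrogate pair is combined into one non-BMP character.
    return s.encode('utf-16le', 'surrogatepass').decode('utf-16le')

fixed = join_surrogates(b'"\xed\xa0\xbd\xed\xb4\xab')
print(ascii(fixed))
```

The final strict decode raises UnicodeDecodeError if the input contains an unpaired surrogate, so a caller would need to handle that case.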
Hackiness aside, I don't think it's worth supporting this just for Windows.
----------
nosy: +eryksun
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue30019>
_______________________________________