[issue14200] Idle shell crash on printing non-BMP unicode character
report at bugs.python.org
Thu Mar 15 05:20:02 CET 2012
Andrew Svetlov <andrew.svetlov at gmail.com> added the comment:
I consulted with Martin at the PyCon sprint and he suggested a solution, which I'm following: to split `print` and the REPL (read-eval-print loop).
Output passed to the print() function is encoded with sys.stdout.encoding.
The UTF encodings were invented to support any Unicode character.
Linux is usually set up to use the utf-8 encoding by default (see the LANG environment variable). There are no encoding issues with that.
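To illustrate the point about utf-8, here is a minimal sketch that encodes the same non-BMP character that appears in the traceback further down. The variable names are only for illustration.

```python
# A code point above U+FFFF, i.e. outside the Basic Multilingual Plane.
ch = "\U00010340"

# utf-8 can represent it; non-BMP characters take four bytes.
encoded = ch.encode("utf-8")
print(len(encoded))                    # 4
print(encoded.decode("utf-8") == ch)   # True
```

So with a utf-8 stdout the encoding step itself never fails; whether the glyph is drawn properly is then up to the terminal.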
The xterm you use (a rather old terminal) cannot print non-BMP characters and replaces them with question marks.
Modern gnome-terminal prints those symbols very well.
Let's return to non-UTF terminal encodings.
If a character cannot be encoded, Python raises UnicodeEncodeError.
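The same failure can be reproduced without touching the locale. The sketch below assumes that an in-memory io.TextIOWrapper with encoding="ascii" is a fair stand-in for sys.stdout under an ASCII locale; the helper name print_to_ascii_stream is hypothetical.

```python
import io


def print_to_ascii_stream(text):
    """Print text to an in-memory stream that encodes as ASCII.

    Stand-in (an assumption) for sys.stdout under an ASCII locale.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


print(print_to_ascii_stream("hello"))  # plain ASCII text encodes fine

try:
    print_to_ascii_stream("\U00010340")
except UnicodeEncodeError as exc:
    # The 'ascii' codec cannot represent the non-BMP character.
    print("print failed:", exc.reason)
```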
andrew at tiktaalik ~/p/cpython> bash -c "LANG=C; ./python"
Python 3.3.0a1+ (qbase qtip tip tk:c3ce8a8e6c9c+, Mar 14 2012, 15:54:55)
[GCC 4.6.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\U00010340' in position 0: ordinal not in range(128)
As you can see, I have switched LANG to the C locale (an alias for ASCII).
Eval prints the value with unicode escaping, but the `print` call raises an error.
This happens because Python's REPL calls sys.displayhook.
See http://docs.python.org/dev/library/sys.html#sys.displayhook for details.
That code escapes unicode characters if the terminal encoding doesn't support them.
The same holds for Windows, OS X and any other platform.
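The escaping fallback described above can be sketched roughly as follows. This is a simplified illustration modeled on the behaviour documented for sys.displayhook, not the actual implementation; the function name display_escaped and the in-memory ASCII stream are assumptions made for the example.

```python
import io


def display_escaped(value, stream):
    """Write repr(value) to stream, escaping characters it cannot encode.

    Hypothetical helper: a simplified take on the fallback described for
    sys.displayhook, assuming stream has .encoding and .buffer.
    """
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Re-encode with backslash escapes instead of failing.
        data = text.encode(stream.encoding, "backslashreplace")
        stream.flush()
        stream.buffer.write(data)
    stream.write("\n")
    stream.flush()


# Assumed stand-in for a terminal that only accepts ASCII.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii", newline="\n")

display_escaped("\U00010340", ascii_stream)
print(raw.getvalue())  # b"'\\U00010340'\n"
```

This is why evaluating the string at the prompt shows an escaped form, while print(), which writes the text directly, has no such fallback and raises.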
Python tracker <report at bugs.python.org>