[issue14200] Idle shell crash on printing non-BMP unicode character

Andrew Svetlov report at bugs.python.org
Thu Mar 15 05:20:02 CET 2012

Andrew Svetlov <andrew.svetlov at gmail.com> added the comment:

I consulted with Martin at the PyCon sprint and he suggested a solution, which I'm following: split `print` from the REPL (read-eval-print loop).

Output passed to the print() function is encoded with sys.stdout.encoding.

The UTF encodings were designed to support any character.
Linux is usually set up to use the utf-8 encoding by default (see the LANG environment variable). There are no encoding issues with that.

The xterm you are using (a fairly old terminal) cannot print non-BMP characters and replaces them with question marks.
A modern gnome-terminal prints those symbols just fine.

Let's return to non-UTF terminal encodings.
If a character cannot be encoded, Python raises UnicodeEncodeError.
Here is an example:

andrew at tiktaalik ~/p/cpython> bash -c "LANG=C; ./python"
Python 3.3.0a1+ (qbase qtip tip tk:c3ce8a8e6c9c+, Mar 14 2012, 15:54:55) 
[GCC 4.6.1] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> '\U00010340'
>>> print('\U00010340')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\U00010340' in position 0: ordinal not in range(128)

As you can see, I switched LANG to the C locale (an alias for ASCII).
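
The same failure can be reproduced without touching the locale by encoding the string by hand. This is only a rough sketch: try_encode is a hypothetical helper name of mine, not something from the standard library, and it just wraps str.encode with the default strict error handling.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent text.

    Hypothetical helper for illustration; str.encode() with the default
    'strict' error handler raises UnicodeEncodeError on failure.
    """
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


ch = '\U00010340'  # a non-BMP character (code point above U+FFFF)

print(try_encode(ch, 'utf-8'))  # UTF-8 can represent it (4 bytes)
print(try_encode(ch, 'ascii'))  # ASCII cannot, so the helper returns None
```

With a utf-8 stdout the character is encodable; with an ascii stdout the strict encode fails, which corresponds to the traceback above.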

The evaluated expression is displayed with Unicode escaping, but the `print` call raises an error.
This happens because Python's REPL calls sys.displayhook.
See http://docs.python.org/dev/library/sys.html#sys.displayhook for details.
That code escapes Unicode characters if the terminal encoding doesn't support them.
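
Roughly, the escaping step amounts to encoding with the 'backslashreplace' error handler instead of the strict one. The sketch below is a simplification and not the actual displayhook code; escape_for_encoding is a hypothetical name I made up for illustration.

```python
def escape_for_encoding(text, encoding):
    """Return text with characters unsupported by encoding backslash-escaped.

    Hypothetical helper: encodes with the 'backslashreplace' error handler,
    then decodes back so the result is a str that is safe to write out.
    """
    return text.encode(encoding, 'backslashreplace').decode(encoding)


ch = '\U00010340'

# The unsupported character becomes a literal backslash escape sequence.
print(escape_for_encoding(ch, 'ascii'))

# With an encoding that supports the character nothing is changed.
print(escape_for_encoding(ch, 'utf-8') == ch)
```

print() has no such fallback: it writes to sys.stdout with that stream's own error handling, so the strict encode fails.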

The same applies to Windows, OS X and any other platform.


Python tracker <report at bugs.python.org>
