[New-bugs-announce] [issue10601] sys.displayhook: use backslashreplace error handler if repr(value) is not encodable to sys.stdout

STINNER Victor report at bugs.python.org
Thu Dec 2 02:43:27 CET 2010


New submission from STINNER Victor <victor.stinner at haypocalc.com>:

On Windows, the Python interpreter fails to display a result if the stdout encoding is unable to encode it. This problem has existed since Python 3.0. See for example issue #1602.

This problem is not specific to Windows. Even if the stdout encoding is UTF-8 (which is the default encoding on Mac OS X and most Linux distributions), it fails on surrogate characters (because the UTF-8 encoder refuses surrogate characters in Python 3). See for example issue #5110.
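A minimal sketch of the failures described above, assuming the default 'strict' error handler. The sample strings and the `results` list are illustrative choices, not taken from the issues cited:

```python
# Each pair is (stdout-like encoding, text that encoding cannot represent).
samples = [
    ("ascii", "\xe9"),       # non-ASCII character
    ("latin-1", "\u20ac"),   # euro sign, absent from ISO-8859-1
    ("utf-8", "\udc80"),     # lone surrogate, refused by the UTF-8 encoder
]

results = []
for encoding, text in samples:
    try:
        text.encode(encoding)  # error handler defaults to 'strict'
        results.append((encoding, "ok"))
    except UnicodeEncodeError as exc:
        results.append((encoding, exc.reason))

for encoding, outcome in results:
    print(encoding, "->", outcome)
```

All three encodings raise UnicodeEncodeError, which is the exception the interpreter hits when it tries to write repr(value) to such a stdout.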

Even if a Python (core? :-)) developer may regard this behaviour as expected, it looks like various users (including me) don't like it and would prefer to see the result instead of a Unicode exception. The problem is that, except for simple commands, we cannot tell directly whether the error comes from the command itself or from printing its result.

This issue is specific to sys.displayhook, the callback used by the Python interpreter to display the result of a command. It doesn't concern print() or sys.stdout.write().

--

The best solution would be to check whether the terminal is able to render a character, but this is not possible for technical reasons. The best that we can do is to catch the UnicodeEncodeError and retry with an error handler other than sys.stdout.errors, one that cannot fail. 'backslashreplace' is a good candidate.
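The fallback can be sketched in pure Python as follows. This is only an illustration of the idea, not the patch itself (which is written in C). The name `display_with_fallback` is hypothetical, and the explicit flush() before writing to the binary buffer is an addition of this sketch:

```python
import io

def display_with_fallback(value, stream):
    """Hypothetical helper: write repr(value), escaping what stream cannot encode."""
    if value is None:
        return
    text = repr(value)
    try:
        stream.write(text)
    except UnicodeEncodeError:
        # Re-encode with a handler that never fails.
        data = text.encode(stream.encoding, "backslashreplace")
        buffer = getattr(stream, "buffer", None)
        if buffer is not None:
            stream.flush()  # keep earlier text ahead of the raw bytes
            buffer.write(data)
        else:
            # No binary layer: the escaped bytes are now safe to decode and write.
            stream.write(data.decode(stream.encoding, "strict"))
    stream.write("\n")

# Demo with an ASCII-only stream standing in for sys.stdout.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", errors="strict", newline="\n")
display_with_fallback("caf\xe9 \u20ac", out)
out.flush()
print(raw.getvalue())
```

The result is shown with escapes for the two characters ASCII cannot encode, instead of a UnicodeEncodeError.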

Ezio Melotti implemented this solution and attached a patch to issue #9198.

I wrote a new version of his patch. Changes:

 - Create a subfunction (for better readability)
 - Clear the UnicodeEncodeError before calling sys_displayhook_unencodable() (the exception would be lost on the next error anyway, e.g. if PyObject_GetAttrString() fails)
 - Clear the (AttributeError) exception if PyObject_GetAttrString(outf, "buffer") fails
 - Add a unit test: test ASCII, ISO-8859-1 and UTF-8 with non-ASCII, surrogate and non-BMP (printable or not) characters
 - Complete and update sys.displayhook documentation
 - Fix a refleak if stdout_encoding_str == NULL
 - Use PyObject_CallMethod() instead of PyTuple_Pack() + PyEval_CallObject() for shorter (and more readable) code

--

I don't know how to test the case where sys.stdout.write(repr(value)) fails and sys.stdout has no buffer attribute. Maybe a mock object should be used in the unit test?
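One possible shape for such a mock, as a rough sketch. The class name `NoBufferStream` and its `written` list are hypothetical, and the check at the end assumes an interpreter whose displayhook already implements the backslashreplace fallback proposed here:

```python
import sys

class NoBufferStream:
    """Hypothetical test double: a text stream with an encoding but no 'buffer'."""

    def __init__(self, encoding):
        self.encoding = encoding
        self.errors = "strict"
        self.written = []

    def write(self, text):
        # Emulate a real stream: raise UnicodeEncodeError on unencodable text.
        text.encode(self.encoding, self.errors)
        self.written.append(text)
        return len(text)

    def flush(self):
        pass

stream = NoBufferStream("ascii")
old_stdout = sys.stdout
sys.stdout = stream
try:
    # sys.__displayhook__ is the built-in hook; note it also stores the value in builtins._
    sys.__displayhook__("\xe9")
finally:
    sys.stdout = old_stdout

output = "".join(stream.written)
print(ascii(output))
```

Because the stub has no buffer attribute, the hook has to take the path that decodes the escaped bytes and writes them back as text.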

----------
components: Unicode
files: displayhook_unencodable.patch
keywords: patch
messages: 123031
nosy: belopolsky, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: sys.displayhook: use backslashreplace error handler if repr(value) is not encodable to sys.stdout
versions: Python 3.2
Added file: http://bugs.python.org/file19897/displayhook_unencodable.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10601>
_______________________________________

