[New-bugs-announce] [issue18713] Enable surrogateescape on stdin and stdout when appropriate

Nick Coghlan report at bugs.python.org
Mon Aug 12 17:19:34 CEST 2013


New submission from Nick Coghlan:

One problem with Unicode in 3.x is that surrogateescape isn't normally enabled on stdin and stdout. This means the following code will fail with UnicodeEncodeError in the presence of invalid filesystem metadata:

    print(os.listdir())
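
The underlying failure can be reproduced without touching the filesystem. The following is a minimal sketch, assuming a UTF-8 filesystem encoding and a hypothetical filename (`raw_name`) containing a byte that is not valid UTF-8; the variable names are illustrative only:

```python
# Hypothetical filename bytes that are not valid UTF-8 (Latin-1 "cafe" with
# an accented final letter, byte 0xE9).
raw_name = b"caf\xe9"

# OS-facing APIs decode names with surrogateescape, so the undecodable
# byte becomes the lone surrogate U+DCE9 instead of raising an error.
name = raw_name.decode("utf-8", errors="surrogateescape")

# A stream using errors='strict' cannot encode that lone surrogate.
try:
    name.encode("utf-8", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With surrogateescape on the output side, the original bytes round-trip.
round_tripped = name.encode("utf-8", errors="surrogateescape")
```

Here the strict encode raises UnicodeEncodeError, while the surrogateescape encode reproduces the original bytes unchanged.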

We don't really want to enable surrogateescape on sys.stdin or sys.stdout unilaterally, as it increases the chance of data corruption errors when the filesystem encoding and the IO encodings don't match.

Last night, Toshio and I thought of a possible solution: enable surrogateescape by default for sys.stdin and sys.stdout on non-Windows systems if (and only if) they're using the same codec as that returned by sys.getfilesystemencoding() (allowing for codec aliases rather than doing a simple string comparison).
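
One way the alias-aware comparison might look is sketched below. It relies on codecs.lookup() returning a CodecInfo whose name attribute is the normalized codec name; the helper names `same_codec` and `should_enable_surrogateescape` are hypothetical and not part of any existing API:

```python
import codecs
import sys


def same_codec(first, second):
    """Return True if two encoding names resolve to the same codec."""
    # codecs.lookup() normalizes aliases such as "utf8" and "UTF-8".
    return codecs.lookup(first).name == codecs.lookup(second).name


def should_enable_surrogateescape(stream):
    """Hypothetical check: non-Windows, and stream codec matches the FS codec."""
    if sys.platform == "win32":
        return False
    encoding = getattr(stream, "encoding", None)
    if encoding is None:
        return False
    try:
        return same_codec(encoding, sys.getfilesystemencoding())
    except LookupError:
        # Unknown encoding name: leave the stream's error handler alone.
        return False
```

For example, same_codec("UTF-8", "utf8") is true, while a plain string comparison of the two names would not be.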

This means that for full UTF-8 systems (which includes most modern Linux installations), roundtripping will be enabled by default between the standard streams and OS facing APIs, while systems where the encodings don't match will still fail noisily.

A more general alternative is also possible: default to errors='surrogateescape' for *any* text stream that uses the filesystem encoding. It's primarily the standard streams we're interested in fixing, though.
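
A rough sketch of that more general alternative, written as a wrapper rather than a change to the io module itself. The function `wrap_text` is a hypothetical name used only for illustration; it picks the default error handler based on whether the requested encoding resolves to the filesystem codec:

```python
import codecs
import io
import sys


def wrap_text(buffer, encoding, errors=None):
    """Wrap a binary stream, defaulting to surrogateescape for the FS codec."""
    if errors is None:
        fs_codec = codecs.lookup(sys.getfilesystemencoding()).name
        if codecs.lookup(encoding).name == fs_codec:
            errors = "surrogateescape"
        else:
            # Mismatched encodings keep failing noisily.
            errors = "strict"
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
```

An explicit errors argument still takes precedence, so callers that want strict behaviour can ask for it.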

----------
messages: 194968
nosy: abadger1999, benjamin.peterson, ezio.melotti, haypo, lemburg, ncoghlan, pitrou
priority: normal
severity: normal
stage: needs patch
status: open
title: Enable surrogateescape on stdin and stdout when appropriate
type: enhancement
versions: Python 3.4

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue18713>
_______________________________________

