[New-bugs-announce] [issue19977] Use "surrogateescape" error handler for sys.stdout on UNIX for the C locale

STINNER Victor report at bugs.python.org
Fri Dec 13 17:40:20 CET 2013


New submission from STINNER Victor:

When LANG=C is used to get English-language output (which is a mistake: LC_CTYPE=C should be used instead) or when Python is started with an empty environment (no environment variables), Python gets the POSIX locale (aka "C locale") for the LC_CTYPE (encoding) locale category.

Standard streams use the locale encoding, which is usually ASCII with the POSIX locale on most platforms (an exception is AIX, which uses ISO 8859-1). In this case, data read from the OS (environment variables, command line arguments, filenames, etc.) may contain surrogate characters, because Python decodes such data internally with the surrogateescape error handler (see PEP 383 for the rationale).
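To make the surrogate mechanism concrete, here is a minimal sketch of what PEP 383 decoding does to a byte that is not valid ASCII. The filename bytes are a made-up example (a Latin-1 encoded name), not something taken from this report:

```python
# Hypothetical filename as raw bytes, containing one non-ASCII byte (0xE9).
raw = b"caf\xe9.txt"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to the
# lone surrogate U+DCE9 instead of raising UnicodeDecodeError.
name = raw.decode("ascii", errors="surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("ascii", errors="surrogateescape")
print(roundtrip == raw)  # True
```

The round trip only works when the same error handler is used in both directions, which is why the error handler of the output stream matters.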

The problem is that standard output uses the strict error handler, and so print() fails to display OS data like filenames.

Example, "ls" command in Python:
---
import os
for name in sorted(os.listdir()):
    print(name)
---

Try it with "LANG=C python ls.py" in a directory containing files with non-ASCII names and you will get a UnicodeEncodeError.
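The failure can also be reproduced without changing the locale or creating files. The sketch below assumes an in-memory stream as a stand-in for sys.stdout under LANG=C (ASCII encoding, strict error handler), and a made-up filename:

```python
import io

# Hypothetical filename, as Python would decode it under the C locale.
name = b"caf\xe9.txt".decode("ascii", errors="surrogateescape")

# Stand-in for sys.stdout under LANG=C: ASCII encoding, strict errors.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="ascii", errors="strict")

try:
    print(name, file=fake_stdout)
    failed = False
except UnicodeEncodeError as exc:
    # The lone surrogate U+DCE9 cannot be encoded by the strict handler.
    failed = True
    print("print() failed with", type(exc).__name__)
```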

Issues #19846 and #19847 are examples of this annoyance.

I propose to also use the surrogateescape error handler for sys.stdout if the POSIX locale is used for LC_CTYPE at startup. The attached patch implements this idea.

With the patch, "LANG=C python ls.py" almost works as if filenames and stdout were byte streams, even though the Unicode type is used.
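The idea can be approximated in pure Python. This is only a sketch of the proposed behaviour, not the attached patch: the helper names stdout_error_handler and write_name are hypothetical, and an in-memory ASCII stream stands in for sys.stdout:

```python
import io

def stdout_error_handler(lc_ctype):
    """Hypothetical helper: choose the stdout error handler from the
    LC_CTYPE locale name, as the proposal describes."""
    return "surrogateescape" if lc_ctype in ("C", "POSIX") else "strict"

def write_name(name, lc_ctype):
    """Write name to an in-memory ASCII stream and return the bytes written."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(
        raw,
        encoding="ascii",
        errors=stdout_error_handler(lc_ctype),
        newline="\n",  # keep "\n" as-is on every platform
    )
    stream.write(name + "\n")
    stream.flush()
    return raw.getvalue()

# Hypothetical filename decoded the way Python decodes OS data.
name = b"caf\xe9.txt".decode("ascii", errors="surrogateescape")

# Under the C locale, the original bytes reach the output unchanged.
print(write_name(name, "C"))  # b'caf\xe9.txt\n'
```

In a real program, the current locale name could be read with locale.setlocale(locale.LC_CTYPE, None), and on Python 3.7 or later sys.stdout.reconfigure(errors="surrogateescape") can switch the handler of the real stream.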

----------
components: Unicode
files: c_locale_surrogateescape.patch
keywords: patch
messages: 206111
nosy: Sworddragon, a.badger, ezio.melotti, haypo, loewis, ncoghlan
priority: normal
severity: normal
status: open
title: Use "surrogateescape" error handler for sys.stdout on UNIX for the C locale
versions: Python 3.4
Added file: http://bugs.python.org/file33122/c_locale_surrogateescape.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue19977>
_______________________________________

