[Python-Dev] Python 3.5 now uses surrogateescape for the POSIX locale

Victor Stinner victor.stinner at gmail.com
Tue Mar 18 11:13:29 CET 2014


2014-03-18 10:48 GMT+01:00 Nick Coghlan <ncoghlan at gmail.com>:
> Well, the concern has always been the risk of silently generating bad
> data if there is a mismatch between the OS encoding and the stream
> encodings.

Data can be loaded from OS functions, from files and from stdin. These
3 sources may use different and incompatible encodings.
surrogateescape is used by OS functions, and now also by stdin when
the POSIX locale is used.
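To make the surrogateescape behaviour concrete, here is a small sketch. It assumes the POSIX locale's encoding is ASCII and uses a made-up filename (`raw`) standing in for bytes returned by an OS function. Undecodable bytes become lone surrogates, round-trip back to the original bytes, and are rejected by a strict encoder:

```python
# Hypothetical bytes, e.g. a UTF-8 encoded filename returned by the OS
# while the locale is POSIX, whose encoding is assumed to be ASCII.
raw = b"caf\xc3\xa9.txt"

# Decoding with errors="surrogateescape" never fails: each undecodable
# byte 0x80..0xFF is mapped to a lone surrogate U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")
print(repr(text))  # 'caf\udcc3\udca9.txt'

# Encoding back with the same error handler restores the exact bytes.
assert text.encode("ascii", errors="surrogateescape") == raw

# A strict encoder refuses the lone surrogates, so the escaped data
# cannot silently leak into output that expects real text.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)
```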

When the POSIX locale is used, OS functions and stdin can use
different encodings if the PYTHONIOENCODING environment variable is
set. Since we are consenting adults, I assume that you know what
you are doing when you set PYTHONIOENCODING.
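One way to observe the divergence is to start a child interpreter under `LC_ALL=C`, with and without PYTHONIOENCODING, and compare the OS ("filesystem") codec with the stdin codec. This is a sketch, not a test suite: the helper name `probe` is hypothetical, `sys.getfilesystemencodeerrors()` needs Python 3.6 or later, and the exact values printed depend on the Python version and platform (later versions added C locale coercion and UTF-8 mode). Only the explicit `:strict` part of PYTHONIOENCODING is documented to take effect regardless:

```python
import os
import subprocess
import sys

# Child program: print the OS codec and the stdin codec, one per line.
CHILD = (
    "import sys\n"
    "print(sys.getfilesystemencoding(), sys.getfilesystemencodeerrors())\n"
    "print(sys.stdin.encoding, sys.stdin.errors)\n"
)


def probe(extra_env):
    """Hypothetical helper: run a child interpreter in the "C" locale with
    extra environment variables and return its two output lines."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a clean baseline
    env["LC_ALL"] = "C"
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print("default:          ", probe({}))
    print("PYTHONIOENCODING: ", probe({"PYTHONIOENCODING": "utf-8:strict"}))
```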

On Windows, the encoding of the standard streams is the OEM code page,
or the ANSI code page if a stream is redirected; it is unrelated to the
LC_CTYPE locale. So surrogateescape can be used even if the encoding of
the standard streams is not ASCII.

We may handle Windows differently and use strict even if the LC_CTYPE
locale is "C".

Note: On FreeBSD, Solaris and OpenIndiana, nl_langinfo(CODESET)
reports an alias of the ASCII encoding when the LC_CTYPE locale is
POSIX, whereas the mbstowcs() and wcstombs() functions actually use the
ISO-8859-1 encoding. Python 3 now uses the ASCII encoding for its
"filesystem" (OS) encoding.
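Why the mismatch matters can be sketched with plain codecs. This does not call mbstowcs() itself: it assumes, for illustration, that decoding with "iso-8859-1" stands in for what mbstowcs() does on those platforms, and that "ascii" with surrogateescape stands in for what Python does. The filename bytes are made up:

```python
# Hypothetical filename bytes containing one non-ASCII byte (0xE9).
raw = b"caf\xe9"

# Stand-in for a C library whose mbstowcs() really uses ISO-8859-1.
via_latin1 = raw.decode("iso-8859-1")

# Stand-in for Python trusting the ASCII CODESET and escaping the rest.
via_ascii = raw.decode("ascii", errors="surrogateescape")

print(repr(via_latin1))  # 'café'
print(repr(via_ascii))   # 'caf\udce9'

# Each decoding round-trips to the same bytes with its own codec...
assert via_latin1.encode("iso-8859-1") == raw
assert via_ascii.encode("ascii", errors="surrogateescape") == raw

# ...but the two views disagree: the same file gets two different names.
assert via_latin1 != via_ascii

# And text decoded one way cannot be encoded back the other way.
try:
    via_latin1.encode("ascii", errors="surrogateescape")
except UnicodeEncodeError:
    print("ISO-8859-1 text cannot be re-encoded by the ASCII codec")
```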

Victor

