[Python-ideas] changing sys.stdout encoding

Nick Coghlan ncoghlan at gmail.com
Fri Jun 8 03:01:26 CEST 2012

On Fri, Jun 8, 2012 at 10:14 AM, Rurpy <rurpy at yahoo.com> wrote:
> On 06/07/2012 03:45 PM, Nick Coghlan wrote:
>> If user level code doesn't want those streams, it needs to
>> replace them with something else.
> Yes, this is what the code I googled up does:
>  import codecs
>  import sys
>  sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
> But that code is not obvious to someone who has been able to do
> all his encoded IO (with the exception of sys.stdout) using just
> the encoding parameter of open().  Hence my question if something
> like a set_encoding() method/function that would work on
> sys.stdout is feasible.  I don't see an answer to that in your
> statement above.

Right, I was only trying to explain why the standard streams are a
special case: they're also used by the interpreter, and the startup
process is much simpler if the interpreter retains complete control
over the way they're initialised. (Startup is already complicated by
the fact that we need to get something half-usable in place as
sys.stderr, so that error reporting is possible while the streams are
being initialised properly.) Replacing them, if desired, then becomes
an application level operation.

We can (and do) make the internal standard stream initialisation
configurable, but it then becomes a UI design problem to get something
that balances flexibility against complexity. PYTHONIOENCODING (in
association with OS utilities that make it possible to set an
environment variable for a specific process invocation, as well as
support in the subprocess module for passing a tailored environment to
subprocesses) is our current solution.
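
As a rough illustration of the subprocess side of this, the sketch
below passes a tailored environment to a child interpreter. The
helper name run_with_io_encoding and the one-line child program are
made up for the example; the POSIX shell equivalent would be along
the lines of "PYTHONIOENCODING=latin-1 python script.py".

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set.

    Returns the raw bytes the child wrote to its stdout.
    (Hypothetical helper, for illustration only.)
    """
    # Copy the parent's environment and override one variable,
    # so the setting applies to this child process only.
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.check_output([sys.executable, "-c", code], env=env)


# The child writes a single non-ASCII character (U+00E9) to stdout.
child_code = "import sys; sys.stdout.write('\\u00e9')"

latin1_bytes = run_with_io_encoding(child_code, "latin-1")
utf8_bytes = run_with_io_encoding(child_code, "utf-8")
```

The same character should come back as the single byte b'\xe9' in
the first case and as the two bytes b'\xc3\xa9' in the second,
without the child program doing anything encoding-specific itself.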

The interpreter design aims, first and foremost, to provide a simple
and straightforward experience in POSIX environments that use UTF-8
everywhere (since that's the most sane approach available for
migrating from a previously ASCII-based computing world). Windows is a
bit trickier (due to the internal use of UTF-16 APIs and the lack of
POSIX-style support for temporarily setting an environment variable
when invoking a process from the shell), but correctly supporting that
environment is also a very high priority. The fallback behaviours when
these situations do not apply are designed to work best on systems
that are at least somewhat *locally* consistent.

The real world is complex. Eventually, our answer has to be "handle it
at the application level, there are too many variations for us to
support it directly at the interpreter level". Currently, any standard
stream encoding related problem that can't be handled with
PYTHONIOENCODING is just such a situation. We know it sucks for
multi-encoding environments, but those are a nightmare for a lot of
reasons and are the main drivers behind the industry-wide effort to
standardise on Unicode text handling, including universal encodings
like UTF-8.

So now we're down to the question of how much complexity we're willing
to tolerate in the interpreter specifically for the sake of
environments where:
1. The automatic standard stream encoding calculation gives the wrong answer
2. The PYTHONIOENCODING override is insufficient
3. The application being executed isn't already handling the problem
4. A -m executable helper module (or directly executable helper
script) can't be used to initialise the standard streams correctly
before continuing on to execute the requested application via the
runpy module

And the answer is "not much". About the only likely way forward I can
see for streamlining this situation would be to treat this as another
use case for http://bugs.python.org/issue14803, which proposes the
ability to run snippets of Python code prior to execution of __main__.

I do agree that "create a new IO object that is like this old IO
object but with these settings changed" could probably do with a
better official API, but such an API needs to be designed with a
respect for the issues associated with changing encodings "on the fly"
and ask serious questions about whether or not we should be
encouraging that practice by making it easier than it is already. I
thought I had posted a tracker issue to that effect, but I can't find
it now.
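
To make the "like this old IO object but with these settings
changed" idea concrete, here is a minimal sketch built on the
existing io.TextIOWrapper API. The function name with_changes is
invented for the example and is not a proposed API. Note that it
detaches the old wrapper, so applying it to sys.stdout would also
leave sys.__stdout__ unusable.

```python
import io


def with_changes(stream, encoding=None, errors=None, line_buffering=None):
    """Return a new text wrapper over stream's buffer, with some settings changed.

    Settings left as None are copied from the old wrapper. The old wrapper
    is flushed and detached, so it must not be used afterwards.
    (Hypothetical helper, for illustration only.)
    """
    # Text already written must reach the buffer in the *old* encoding.
    stream.flush()
    # Read the old settings before detaching; they are unavailable afterwards.
    new_encoding = stream.encoding if encoding is None else encoding
    new_errors = stream.errors if errors is None else errors
    if line_buffering is None:
        line_buffering = stream.line_buffering
    buffer = stream.detach()
    # The old newline setting is not exposed, so the default is used here.
    return io.TextIOWrapper(buffer, encoding=new_encoding,
                            errors=new_errors,
                            line_buffering=line_buffering)


raw = io.BytesIO()
old = io.TextIOWrapper(raw, encoding="utf-8")
old.write("\u00e9")                       # encoded as UTF-8
new = with_changes(old, encoding="latin-1")
new.write("\u00e9")                       # encoded as Latin-1
new.flush()
mixed = raw.getvalue()
```

Here mixed ends up holding the same character in two different
encodings, which is exactly the kind of "on the fly" hazard any
official API would have to take seriously.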


Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
