[Python-ideas] changing sys.stdout encoding

Guido van Rossum guido at python.org
Wed Jun 13 07:21:45 CEST 2012

On Tue, Jun 12, 2012 at 9:58 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Oscar Benjamin writes:
>  > I also think I missed something in this thread. At the beginning of the
>  > original thread it seemed that everyone was agreed that
>  >
>  >   writer = codecs.getwriter(desired_encoding)
>  >   sys.stdout = writer(sys.stdout.buffer)
>  >
>  > was a reasonable solution (with the caveat that it should happen before any
>  > output is written). Is there some reason why this is not a good
>  > approach?
> It's undocumented and unobvious, but it's needed for standard stream
> filtering in some environments -- where a lot of coding is done by
> people who otherwise never need to understand streams at anything but
> a superficial level -- and the analogous case of a newly opened file,
> pipe, or socket is documented and obvious, and usable by novices.
> It's a damn shame that we can't say the same about the stdin, stdout,
> and stderr streams (even if I too have been at pains to explain why
> that's hard to fix).

I'm probably missing something, but in all my naivete I have what
feels like a simple solution, and I can't seem to see what's wrong
with it.

In C there used to be a function to set the buffer size on an open
stream that could only be called when the stream hadn't been used yet.
ISTM the OP's use case would be covered by a similar function on an
open TextIOWrapper to set the encoding that can only be used when it
hasn't been used to write (or read) anything yet. When called under
any other circumstances it should raise an error. The TextIOWrapper
should maintain a "used" flag so that it can raise this exception.
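
To make the idea concrete, here is a rough sketch of the "used" flag.
It is not a patch to TextIOWrapper: OneShotEncodingWriter is a
hypothetical stand-in class, write-only, with no buffering or newline
handling, and the ValueError and its message are assumptions rather
than a settled design. A BytesIO plays the role of sys.stdout.buffer.

```python
import codecs
import io


class OneShotEncodingWriter:
    """Hypothetical text writer: the encoding may change only before first use."""

    def __init__(self, buffer, encoding="utf-8"):
        self._buffer = buffer      # underlying binary stream
        self._encoding = encoding
        self._used = False         # the "used" flag from the proposal

    @property
    def encoding(self):
        return self._encoding

    def set_encoding(self, encoding):
        if self._used:
            # Assumed error type and wording; the proposal only says "an error".
            raise ValueError(
                "set_encoding() can only be called before the stream "
                "has been used"
            )
        codecs.lookup(encoding)    # raises LookupError for unknown names
        self._encoding = encoding

    def write(self, text):
        self._used = True
        self._buffer.write(text.encode(self._encoding))
        return len(text)


raw = io.BytesIO()                 # stand-in for sys.stdout.buffer
out = OneShotEncodingWriter(raw)
out.set_encoding("latin-1")        # allowed: nothing written yet
out.write("caf\u00e9\n")

try:
    out.set_encoding("utf-8")      # too late: the stream has been used
except ValueError as exc:
    late_error = exc
else:
    late_error = None
```

A read side would set the same flag on the first read().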

This ought to work for stdin and stdout when used at the start of the
program, assuming nothing is written by code run before main starts.
(This should normally be fine, otherwise you couldn't use a Python
program as a filter at all.) It won't work for stderr if connected to
a tty-ish device (since the version stuff is written there) but that
should be okay, and it should still be okay with stderr if it's not a
tty, since then it starts silent. (But I don't think the use case is
very strong for stderr anyway.)

I'm not sure about a name, but it might well be called set_encoding().
The error message when misused should clarify to people who
misunderstand the name that it can only be called when the stream
hasn't been used yet; I don't think it's necessary to encode that
information in the name. (C's setbuf() wasn't called
set_buffer_on_virgin_stream() either. :-)

I don't care about the integrity of the underlying binary stream. It's
a binary stream, you can write whatever bytes you want to it. But if a
TextIOWrapper is used properly, it won't write a mixture of encodings
to the underlying binary stream, since you can only set the encoding
before reading/writing a single byte. (And the TextIOWrapper is
careful not to use the binary stream before the first actual read() or
write() call -- it just tries to call tell(), if it's seekable, which
should be safe.)
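
One way to check that claim is to wrap a binary stream that records
which methods get called. This is only a sketch: RecordingBytesIO is a
hypothetical helper, and what exactly gets called while wrapping is an
implementation detail, so it only looks for read and write calls.

```python
import io


class RecordingBytesIO(io.BytesIO):
    """Hypothetical helper: a BytesIO that logs which I/O methods are called."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def read(self, *args):
        self.calls.append("read")
        return super().read(*args)

    def read1(self, *args):
        self.calls.append("read1")
        return super().read1(*args)

    def write(self, data):
        self.calls.append("write")
        return super().write(data)

    def tell(self):
        self.calls.append("tell")
        return super().tell()


raw = RecordingBytesIO()
text = io.TextIOWrapper(raw, encoding="utf-8")
calls_after_wrap = list(raw.calls)     # what wrapping alone triggered

text.write("hi")
text.flush()                           # push the encoded bytes down to raw
calls_after_write = list(raw.calls)
```

If the description above holds, calls_after_wrap contains no read or
write entries, and the first write shows up only after the explicit
write() and flush().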

--Guido van Rossum (python.org/~guido)
