[Python-ideas] changing sys.stdout encoding
rurpy at yahoo.com
Tue Jun 5 19:20:01 CEST 2012
In my first foray into Python3 I've encountered this problem:
I work in a multi-language environment. I've written a number
of tools, mostly command-line, that generate output on stdout.
Because these tools and their output are used by various people
in varying environments, the tools all have an --encoding option
to provide output that meets the needs and preferences of the
output's ultimate consumers.
In converting them to Python3, I found that the best (if not very
pleasant) way to do this was to put something like this near the
top of each tool [*1]:
    sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
What I want to be able to put there instead is:
    sys.stdout.set_encoding(opts.encoding)
The former I found on the internet -- there is zero probability
I could have figured that out from the Python docs. It is obscure
to anyone who, like me, has generally only needed to deal with
.encode() and .decode() and has not encountered it before or
dealt much with the codecs module. It is excessively complex
for what is conceptually a simple and straightforward operation.
It requires the import of the codecs module in programs that
otherwise don't need it [*2], and the reading of the codecs docs
(not a shining example of clarity themselves) to understand it.
In short it is butt ugly relative to what I generally get in Python.
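For anyone who has not run into the codecs idiom before, here is
a minimal sketch of what it does. To keep it runnable anywhere,
an io.BytesIO named raw stands in for sys.stdout.buffer, and
"latin-1" stands in for opts.encoding; both are assumptions made
only for illustration.

```python
import codecs
import io

raw = io.BytesIO()        # stand-in for sys.stdout.buffer (assumption)
encoding = "latin-1"      # stand-in for opts.encoding (assumption)

# codecs.getwriter() returns a StreamWriter class for the encoding;
# calling that class wraps a binary stream in a text-accepting writer.
out = codecs.getwriter(encoding)(raw)

# print() only needs a .write(str) method, so the writer can be used
# as a drop-in replacement for sys.stdout.
print("café", file=out)

encoded = raw.getvalue()  # bytes as they would reach the terminal or pipe
```

In a real tool the wrapped object would be assigned to sys.stdout,
as in the one-liner above.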
Would it be feasible to provide something like .set_encoding()
on textio streams? (Or make .encoding a writable property? It
seems to be intentionally read-only for some reason, but is that
reason really unavoidable?) If doing this for textio in general is
too hard, then what about encapsulating the codecs stuff above in
a sys.set_encoding() function?
Changing the encoding of a sys.std* stream is not an uncommon
need, and a user should not have to go through the codecs dance
above to do so, IMO.
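To make the request concrete, here is a rough sketch of what such
a helper might look like if written in user code today. The name
set_encoding and its signature are hypothetical, not an existing
API, and it uses io.TextIOWrapper rather than codecs.getwriter as
one possible way to do the wrapping.

```python
import io

def set_encoding(stream, encoding, errors="strict"):
    """Hypothetical helper: return a new text stream that writes to the
    same binary buffer as `stream`, using a different encoding."""
    # Push out anything already written under the old encoding.
    stream.flush()
    # Caveat: the old wrapper still refers to the same buffer and will
    # close it if it is garbage collected. For sys.stdout the original
    # object stays alive via sys.__stdout__.
    return io.TextIOWrapper(
        stream.buffer,
        encoding=encoding,
        errors=errors,
        line_buffering=stream.line_buffering,
    )
```

Intended use would be something like
sys.stdout = set_encoding(sys.stdout, opts.encoding). Python 3.7
and later also provide sys.stdout.reconfigure(encoding=...), which
covers much the same ground without replacing the stream object.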
[*1] There are other ways to change stdout's encoding but they
all have problems AFAICT. PYTHONIOENCODING can't easily be
changed dynamically within a program. Reopening stdout as binary,
or using the binary interface to text stdout, requires an explicit
encode call at each write site. Overloading print() is obscure
because it requires the reader to notice that print was overloaded.
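As an illustration of the binary-interface alternative, here is a
small sketch of what each write site ends up looking like. As
before, an io.BytesIO named raw stands in for sys.stdout.buffer
and "latin-1" for the --encoding value; both are assumptions.

```python
import io

raw = io.BytesIO()      # stand-in for sys.stdout.buffer (assumption)
encoding = "latin-1"    # stand-in for opts.encoding (assumption)

# Every write site has to encode by hand; forgetting one .encode()
# call raises a TypeError because the stream only accepts bytes.
raw.write("café\n".encode(encoding))
raw.write("naïve\n".encode(encoding))

written = raw.getvalue()
```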
[*2] I don't mean the actual import of the codecs module which
occurs anyway; I mean the extra visual and cognitive noise
introduced by the presence of the import statement in the source.