
2012/6/5 Rurpy <rurpy@yahoo.com>:
In my first foray into Python3 I've encountered this problem: I work in a multi-language environment. I've written a number of tools, mostly command-line, that generate output on stdout. Because these tools and their output are used by various people in varying environments, the tools all have an --encoding option to provide output that meets the needs and preferences of the output's ultimate consumers.
What happens if the specified encoding is different from the encoding of the console? Mojibake? If the output is used as the input of another program, does that program use the same encoding? In my experience, using an encoding different from the locale encoding for input/output (stdout, environment variables, command-line arguments, etc.) causes various issues. So I'm curious about your use cases.
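A minimal illustration of the mojibake asked about above, assuming a producer that encodes its output as UTF-8 and a consumer that decodes the same bytes as cp1252 (both encodings are chosen here only as an example):

```python
# Producer: a tool run with --encoding=utf-8 writes these bytes.
produced = "café".encode("utf-8")      # b'caf\xc3\xa9'

# Consumer: another program decodes the same bytes with its own,
# different, encoding (cp1252 in this example).
consumed = produced.decode("cp1252")   # 'cafÃ©': each UTF-8 byte became a character
```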
In converting them to Python 3, I found that the best (if not very pleasant) way to do this was to put something like this near the top of each tool [*1]:
import codecs
sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
In Python 3, you should use io.TextIOWrapper instead of codecs.StreamWriter: it's more efficient and has fewer bugs.
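A minimal sketch of that advice. One design choice here is an assumption, not part of the advice itself: the old wrapper is detach()ed, which flushes it and hands over its binary buffer, so the old wrapper can neither emit late output nor close the buffer when it is garbage-collected. The function name `rewrap_text_stream` is hypothetical:

```python
import io


def rewrap_text_stream(stream, encoding, errors="strict"):
    """Return a new TextIOWrapper over the binary buffer of `stream`.

    detach() flushes any text still pending in `stream` and returns its
    binary buffer; `stream` itself is unusable afterwards.
    """
    line_buffering = stream.line_buffering  # read it before detaching
    buffer = stream.detach()
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors,
                            line_buffering=line_buffering)
```

In a tool this would be `sys.stdout = rewrap_text_stream(sys.stdout, opts.encoding)`. If sys.stdout has not been replaced before, sys.__stdout__ is the same object, so it is detached too.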
What I want to be able to put there instead is:
sys.stdout.set_encoding(opts.encoding)
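For comparison, Python 3.7 later added io.TextIOWrapper.reconfigure(), which covers this request for text streams such as sys.stdout. A minimal sketch using an in-memory buffer as a stand-in for sys.stdout.buffer (it requires Python 3.7 or later):

```python
import io

raw = io.BytesIO()                         # stand-in for sys.stdout.buffer
out = io.TextIOWrapper(raw, encoding="utf-8")

out.write("a")
# reconfigure() flushes pending text first, then switches the encoding;
# changing the encoding after writing (but not after reading) is allowed.
out.reconfigure(encoding="latin-1")
out.write("\u00e9")
out.flush()
```

On a real tool the equivalent call would be `sys.stdout.reconfigure(encoding=opts.encoding)`.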
I don't think that your use case merits a new method on io.TextIOWrapper: replacing sys.stdout works and should be used instead. TextIOWrapper is generic, whereas your use case is specific to the sys.std* streams. It would be surprising to change the encoding of an arbitrary file after it has been opened; at least, I don't see the use case. For example, tokenize.open() opens a Python source file with the right encoding: it first reads the file in binary mode to detect the encoding, then uses TextIOWrapper to get a text file without having to reopen it. It would be possible to start with a text file and then change its encoding, but that would be less elegant.
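A minimal sketch of the detect-then-wrap pattern described above, roughly what tokenize.open() does. The function name `open_source_text` and the choice to accept an already-open binary stream (rather than a path) are assumptions made here so the example runs without a file:

```python
import io
import tokenize


def open_source_text(binary):
    """Wrap an already-open binary stream of Python source as text.

    The encoding is detected from the bytes (coding cookie or BOM,
    defaulting to UTF-8); the same stream is then rewound and wrapped,
    so it never has to be reopened.
    """
    encoding, _ = tokenize.detect_encoding(binary.readline)
    binary.seek(0)
    return io.TextIOWrapper(binary, encoding=encoding, line_buffering=True)
```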
sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
You should also flush sys.stdout (and maybe also sys.stdout.buffer) before replacing it.
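A small sketch of why the flush matters, assuming two text wrappers that share one in-memory buffer standing in for sys.stdout.buffer. Text still pending in the old wrapper reaches the buffer only when that wrapper is eventually flushed, so without an early flush it lands after the new wrapper's output. The helper name `replace_wrapper` is hypothetical:

```python
import io


def replace_wrapper(flush_first):
    """Write via an old wrapper, replace it, then write via the new one."""
    raw = io.BytesIO()                      # stands in for sys.stdout.buffer
    old = io.TextIOWrapper(raw, encoding="utf-8")
    old.write("first\n")                    # pending inside `old`, not yet in raw
    if flush_first:
        old.flush()                         # what the advice above recommends
    new = io.TextIOWrapper(raw, encoding="utf-8")
    new.write("second\n")
    new.flush()
    old.flush()                             # any text still pending arrives now
    return raw.getvalue()
```

With `flush_first=False` the result is `b"second\nfirst\n"`; with `flush_first=True` it is `b"first\nsecond\n"`.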
It requires importing the codecs module in programs that otherwise don't need it [*2], and reading the codecs docs (not a shining example of clarity themselves) to understand it.
Maybe it's difficult to change the encoding of sys.stdout at runtime because it is NOT a good idea :-)
Changing the encoding of a sys.std* stream is not an uncommon need, and IMO a user should not have to go through the codecs dance above to do it.
Replacing sys.std* works but has issues: for example, output written before the replacement is encoded with a different encoding. The best way is to change your locale encoding (using the LC_ALL, LC_CTYPE or LANG environment variables on UNIX), or simply to set the PYTHONIOENCODING environment variable.
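A small sketch of the PYTHONIOENCODING route, assuming the tool is started by a parent process that sets the environment. The `-c` child program and the choice of latin-1 are only for illustration; the child contains no encoding-specific code of its own:

```python
import codecs
import os
import subprocess
import sys

# Run a child interpreter with PYTHONIOENCODING set in its environment.
env = dict(os.environ, PYTHONIOENCODING="latin-1")
child = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(sys.stdout.encoding); print('\\u00e9')"],
    env=env, stdout=subprocess.PIPE, check=True)

name_line, text_line = child.stdout.splitlines()
# The child reports its stdout encoding (possibly under a normalized name)
# and writes "é" as the single latin-1 byte 0xE9.
same_codec = (codecs.lookup(name_line.decode("ascii")).name
              == codecs.lookup("latin-1").name)
```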
[*1] There are other ways to change stdout's encoding, but they all have problems AFAICT. PYTHONIOENCODING can't easily be changed dynamically within a program.
Ah? Detect whether PYTHONIOENCODING is set (or whether sys.stdout.encoding is already the requested encoding); if not, restart the program with PYTHONIOENCODING=encoding.
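A minimal sketch of the restart approach, under stated assumptions: the helper name `rerun_with_io_encoding` is hypothetical, it only checks the environment variable (not sys.stdout.encoding), and it runs the program again as a child process rather than replacing the current one:

```python
import os
import subprocess
import sys


def rerun_with_io_encoding(encoding, argv=None, environ=None):
    """Re-run this program with PYTHONIOENCODING=encoding if it is not set.

    Returns None when the current process already has the requested
    setting (the caller just continues); otherwise returns the child's
    exit status, which the caller would normally pass to sys.exit().
    """
    environ = os.environ if environ is None else environ
    if environ.get("PYTHONIOENCODING") == encoding:
        return None
    argv = sys.argv if argv is None else argv
    env = dict(environ, PYTHONIOENCODING=encoding)
    return subprocess.call([sys.executable] + list(argv), env=env)
```

In a tool this would run right after option parsing: if it returns a status, the parent exits with it; otherwise the tool carries on with the requested encoding. Re-running with sys.argv assumes the tool was started as a script, not via `-m` or `-c`.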
Overloading print() is obscure because it requires the reader to notice that print was overloaded.
Why not write the output to a file instead of stdout?

Victor