[Python-ideas] changing sys.stdout encoding
MRAB
python at mrabarnett.plus.com
Wed Jun 6 12:09:06 EDT 2012
On 06/06/2012 08:09, Rurpy wrote:
> On 06/05/2012 05:56 PM, MRAB wrote:
>> On 06/06/2012 00:34, Victor Stinner wrote:
>>> 2012/6/5 Rurpy<rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org>:
>>>> In my first foray into Python3 I've encountered this problem:
>>>> I work in a multi-language environment. I've written a number
>>>> of tools, mostly command-line, that generate output on stdout.
>>>> Because these tools and their output are used by various people
>>>> in varying environments, the tools all have an --encoding option
>>>> to provide output that meets the needs and preferences of the
>>>> output's ultimate consumers.
>>>
>>> What happens if the specified encoding is different than the encoding
>>> of the console? Mojibake?
>>>
>>> If the output is used as the input of another program, does the
>>> other program use the same encoding?
>>>
>>> In my experience, using an encoding different than the locale encoding
>>> for input/output (stdout, environment variables, command line
>>> arguments, etc.) causes various issues. So I'm curious about your use
>>> cases.
>>>
>>>> In converting them to Python3, I found the best (if not very
>>>> pleasant) way to do this in Python3 was to put something like
>>>> this near the top of each tool[*1]:
>>>>
>>>> import codecs
>>>> sys.stdout = codecs.getwriter(opts.encoding)(sys.stdout.buffer)
>>>
>>> In Python 3, you should use io.TextIOWrapper instead of
>>> codecs.StreamWriter. It's more efficient and has fewer bugs.
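A minimal sketch of the io.TextIOWrapper replacement recommended here. The helper name rewrap is hypothetical; in a real tool the encoding would come from an option such as opts.encoding:

```python
import io


def rewrap(stream, encoding):
    """Hypothetical helper: return a new TextIOWrapper over `stream`'s
    binary buffer that encodes with `encoding`.

    detach() flushes `stream` and disconnects it from the buffer, so the
    old wrapper cannot close the shared buffer when it is garbage-collected.
    """
    line_buffering = stream.line_buffering  # read before detaching
    return io.TextIOWrapper(stream.detach(), encoding=encoding,
                            line_buffering=line_buffering)
```

Near the top of a tool this would be used as sys.stdout = rewrap(sys.stdout, opts.encoding). Because of detach(), the original wrapper (also reachable as sys.__stdout__) is unusable afterwards; that is the cost of not letting two wrappers share one buffer.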
>>>
>>>> What I want to be able to put there instead is:
>>>>
>>>> sys.stdout.set_encoding (opts.encoding)
>>>
>>> I don't think that your use case merits a new method on
>>> io.TextIOWrapper: replacing sys.stdout does work and should be used
>>> instead. TextIOWrapper is generic and your use case is specific to
>>> sys.std* streams.
>>>
>>> It would be surprising to change the encoding of an arbitrary file
>>> after it is opened. At least, I don't see the use case.
>>>
>> [snip]
>>
>> And if you _do_ want multiple encodings in a file, it's clearer to open
>> the file as binary and then explicitly encode to bytes and write _that_
>> to the file.
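A minimal sketch of the binary-mode approach. write_encoded is a hypothetical helper that accepts any binary stream, such as a file opened with "wb" or sys.stdout.buffer:

```python
import io


def write_encoded(out, template, encodings):
    """Hypothetical helper: write `template % enc` once per encoding
    to the binary stream `out`.

    Each line is encoded explicitly, so one binary stream can hold several
    encodings, and the trailing newline is encoded along with the text.
    """
    for enc in encodings:
        out.write((template % enc + "\n").encode(enc))
```

With a regular file this is `with open(path, "wb") as f: write_encoded(f, text, ["sjis", "euc-jp"])`, which avoids reopening the file. For standard output, calling sys.stdout.flush() first keeps earlier text output from being reordered behind the bytes written to sys.stdout.buffer.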
>
> But is it really?
>
> The following is very simple and the level of Python
> expertise required is minimal. It would work fine
> with redirection. One could substitute any other ordinary
> open (for write) text file for sys.stdout.
>
> [off the top of my head]
> text = 'This is %s text: 世界へ、こんにちは!'
> sys.stdout.set_encoding ('sjis')
> print (text % 'sjis')
> sys.stdout.set_encoding ('euc-jp')
> print (text % 'euc-jp')
> sys.stdout.set_encoding ('iso2022-jp')
> print (text % 'iso2022-jp')
>
> As for your suggestion, how do I reopen sys.stdout in
> binary mode? I don't need to do that often and don't
> know off the top of my head. (And it's too late for
> me to look it up.) And what happens to redirected output
> when I close and reopen the stream? I can open a regular
> filename instead. But remember to make the last two
> opens with "a" rather than "w". And don't forget the
> "\n" at the end of the text line.
>
> Could you show me a code example of your suggestion
> for comparison?
>
> Disclaimer: As I said before, I am not particularly
> advocating for a set_encoding() method -- my
> primary suggestion is a programmatic way to change the
> sys.std* encodings prior to first use. Here I am just
> questioning the claim that a set_encoding() method
> would not be clearer than existing alternatives.
>
This example accesses the underlying binary output stream:
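
# -*- coding: utf-8 -*-
import sys

class Writer:
    """Wrap a text stream, encoding each write() into its binary buffer."""

    def __init__(self, output):
        output.flush()  # push out text already buffered by the old wrapper
        self.output = output
        self.encoding = output.encoding

    def write(self, string):
        # Encode explicitly and bypass the text layer.
        self.output.buffer.write(string.encode(self.encoding))

    def flush(self):
        # Needed because print(..., flush=True) and interpreter shutdown
        # call sys.stdout.flush().
        self.output.buffer.flush()

    def set_encoding(self, encoding):
        # Flush bytes already written before switching encodings.
        self.output.buffer.flush()
        self.encoding = encoding

sys.stdout = Writer(sys.stdout)
initial_encoding = sys.stdout.encoding
text = 'This is %s text: 世界へ、こんにちは!'
sys.stdout.set_encoding('utf-8')
print(text % 'utf-8')
sys.stdout.set_encoding('sjis')
print(text % 'sjis')
sys.stdout.set_encoding('euc-jp')
print(text % 'euc-jp')
sys.stdout.set_encoding('iso2022-jp')
print(text % 'iso2022-jp')
sys.stdout.set_encoding(initial_encoding)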
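For comparison, Python 3.7 and later provide io.TextIOWrapper.reconfigure(), which flushes the stream and may change its encoding even after data has been written, so sys.stdout.reconfigure(encoding=...) covers the set_encoding() use case directly. A minimal sketch, with print_in_encodings as a hypothetical helper:

```python
import io

TEXT = 'This is %s text: 世界へ、こんにちは!'


def print_in_encodings(stream, encodings):
    """Hypothetical helper: print TEXT once per encoding, switching the
    encoding of the TextIOWrapper `stream` via reconfigure() (Python 3.7+)."""
    original = stream.encoding
    for enc in encodings:
        stream.reconfigure(encoding=enc)  # implicit flush, then switch
        print(TEXT % enc, file=stream)
    stream.reconfigure(encoding=original)  # restore the starting encoding
```

Calling print_in_encodings(sys.stdout, ['sjis', 'euc-jp', 'iso2022-jp']) works while sys.stdout is an ordinary TextIOWrapper; it fails if stdout has been replaced by an object without reconfigure(), such as the Writer class above or io.StringIO.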