[Python-ideas] changing sys.stdout encoding

Rurpy rurpy at yahoo.com
Thu Jun 7 22:48:24 CEST 2012


On 06/07/2012 01:12 AM, Stephen J. Turnbull wrote:
> Rurpy writes:
> 
>  > I took the time to post here because it took an inordinate
>  > amount of effort to find a solution to a legitimate need 
>  > (your opinion to the contrary notwithstanding)
> 
> I don't think I said the need was illegitimate, if I did I apologize,
> and I certainly don't believe it is (I'm an economist by trade -- de
> gustibus non est disputandum).
> 
> I just don't think it's necessary for Python to try to address the
> problem, because the problem is somebody else's bad design at root.

I don't understand that argument.  The world is full of 
bad design that Python has to address: daylight saving 
time, calendars, floating-point (according to some).  
Good/bad design is not even constant and changes with 
time.  There is still a telnetlib module in stdlib despite
the existence of ssh.  I suspect the vast majority of 
programmers are interested in a language that allows 
them to *effectively* get done what they need to, whether 
they are working on the latest agile TDD REST server, or 
modifying some legacy text files.  What I for one *don't*
need is to have my programming language enforcing its 
idea of CS political correctness on me.

Secondly, the disparity in ease of use of an alternate
encoding on sys.stdout is not really between utf8
and non-utf8, it is between a default encoding (which
may be non-utf8), and the encoding I wish to use.  So
one can't really attribute it to a desire to improve 
the world by making non-utf8 harder to use!

And even were I to accept your argument, Python is 
inconsistent: when I open a file explicitly there is 
only a slight penalty for opening a non-default-encoded 
file (the need to explicitly give an encoding):

  f = open ("myfile", "w")   # my default utf8 encoding
  print ("text string", file=f)
vs
  f = open ("myfile", "w", encoding="sjis")  # non-utf8
  print ("text string", file=f)

But for sys.stdout, the penalty for using an alternate
encoding is to google around for a solution (which may 
not be optimal as Victor Stinner pointed out) and then 
read about codecs and the StreamWriter wrapper, textio 
wrappers and the .buffer attribute.  And the reading part 
is then repeated by all those (at the same level of python 
expertise) who read the program.
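
For concreteness, here is a minimal sketch of the kind of
incantation one ends up with.  It assumes the stream exposes its
underlying byte stream as .buffer (true for the standard streams
when they are ordinary text wrappers, but not for every
replacement object).  The helper name reencoded() is hypothetical
and is not anything in the stdlib.

```python
import io


def reencoded(stream, encoding, errors="strict"):
    """Return a new text wrapper around stream's underlying byte buffer.

    Hypothetical helper: intended to be called before any output has
    been written, so no text in the old encoding is already in flight.
    """
    # Flush first, in case something was already written through the
    # old wrapper.
    stream.flush()
    return io.TextIOWrapper(
        stream.buffer,                 # the raw byte stream underneath
        encoding=encoding,
        errors=errors,
        line_buffering=stream.line_buffering,
    )
```

At program start one would then write something like
sys.stdout = reencoded(sys.stdout, "sjis") and use print() as usual.
It works, but nothing in the sys.stdout documentation leads you to it.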

All I can do is repeat what I said before: non-utf8
encodings exist and are widely used.  That's a simple
fact.  Sample some .jp web sites and look at the ratio
of shift-jis web pages to utf-8 web pages for example.

utf-8 is an encoding.  shift-jis is an encoding.  Sure,
I understand that utf-8 is preferable and I will use it
when possible.  The fact that I am writing shift-jis means
that utf-8 *isn't* possible in this case.

Since utf-8 and shift-jis are both encodings and are equivalent 
from a coding viewpoint (a simple choice of which codec to use) 
the discrepancy in ease of use between the two in the case of 
writing to the standard streams is not justifiable and should 
be corrected if possible. 
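
To illustrate the "simple choice of which codec to use" point, here
is a small sketch.  The function name byte_lengths() and the sample
string are made up for the example.  The only thing that changes
between the two encodings is the codec name passed in.

```python
def byte_lengths(text, codecs):
    """Encode text with each named codec and report the encoded size."""
    sizes = {}
    for codec in codecs:
        data = text.encode(codec)
        # Each codec round-trips the same text; only the bytes differ.
        assert data.decode(codec) == text
        sizes[codec] = len(data)
    return sizes


sizes = byte_lengths("\u65e5\u672c\u8a9e", ["utf-8", "shift_jis"])
```

From the program's side the two cases are symmetrical, which is why
the asymmetry on the standard streams seems unjustified to me.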

> And I don't think it would be wise to try to do it in a very general
> way, because it's very hard to do that at the general level of the
> language.

But is it?  Or are you referring to switching encoding
on-the-fly?  (see below).

>  > I understand there is no support here for providing a non-
>  > obscure, programmatic way of changing the encoding of the 
>  > standard streams at program startup 
> 
> You're wrong.  There is *some* support for that.
> 
> It just has to be done safely, and that means that a generic
> .set_encoding() method that can be called after I/O has been performed
> probably isn't going to happen.

There are two sub-threads in this discussion:

 1) Providing a more convenient and discoverable way to
 programmatically change the encoding of std* streams
 before first use.

 2) Changing the encoding used on the std* stream or
 any textio stream on the fly as a generalization of (1).

I thought I made clear I was advocating for (1) and 
not (2) when I earlier wrote in reply to you:
  > You are correct that my current concern is reinitializing 
  > the encoding(s) of the sys.std* streams prior to doing any
  > operations with them.
and to MRAB:
  > Disclaimer: As I said before, I am not particularly 
  > advocating for a set_encoding() method -- my 
  > primary suggestion is a programmatic way to change the
  > sys.std* encodings prior to first use. 

As for (2), you have pointed out some potential issues with
switching encodings midstream.  I don't understand how codecs 
work in Python sufficiently yet to either agree or disagree 
with you.  I have however questioned some of the statements 
made regarding its difficulty (and am holding my opinion 
open until I understand the issues better), but I am not 
(as I've stated) advocating for it now.

Sorry if I failed to make the distinction clearer.  My use
of .set_encoding() as a placeholder for both ideas probably 
contributed to the confusion.

> And it might not happen at the core level, since a 3-line function can
> do the job, it might make just as much sense to put up a package on
> PyPI.

I wasn't suggesting a change to the core level (if by that 
you mean to the interpreter).  I was asking if some way could 
be provided (easier and more reliable than googling 
around for a magic incantation) to change the encoding of one 
or more of the already-open-when-my-program-starts sys.std* 
streams.  I presume that would be a standard library change
(in either the io or sys modules) and offered a .set_encoding() 
method as a placeholder for discussion.

I hardly think it is worth the effort, for either the producer 
or consumers, of putting a 3-line function on PyPI.  Nor would 
such a solution address the discoverability and ease-of-use 
problems I am complaining about.

An inferior and bare minimum way to address this would be to 
at least add a note about how to change the encoding to the 
sys.std* documentation.  That encourages cargo-cult programming 
and doesn't address the WTF effect but it is at least better 
than the current state of affairs.



