[Python-ideas] changing sys.stdout encoding
Rurpy
rurpy at yahoo.com
Thu Jun 7 22:48:24 CEST 2012
On 06/07/2012 01:12 AM, Stephen J. Turnbull wrote:
> Rurpy writes:
>
> > I took the time to post here because it took an inordinate
> > amount of effort to find a solution to a legitimate need
> > (your opinion to the contrary notwithstanding)
>
> I don't think I said the need was illegitimate, if I did I apologize,
> and I certainly don't believe it is (I'm an economist by trade -- de
> gustibus non est disputandum).
>
> I just don't think it's necessary for Python to try to address the
> problem, because the problem is somebody else's bad design at root.
I don't understand that argument. The world is full of
bad design that Python has to address: daylight savings
time, calendars, floating-point (according to some).
Good/bad design is not even constant and changes with
time. There is still a telnetlib module in stdlib despite
the existence of ssh. I suspect the vast majority of
programmers are interested in a language that allows
them to *effectively* get done what they need to, whether
they are working on the latest agile TDD REST server, or
modifying some legacy text files. What I for one *don't*
need is to have my programming language enforcing its
idea of CS political correctness on me.
Secondly, the disparity in ease of use of an alternate
encoding on sys.stdout is not really between utf8
and non-utf8, it is between a default encoding (which
may be non-utf8), and the encoding I wish to use. So
one can't really attribute it to a desire to improve
the world by making non-utf8 harder to use!
And even were I to accept your argument, Python is
inconsistent: when I open a file explicitly there is
only a slight penalty for opening a non-default-encoded
file (the need to explicitly give an encoding):
    f = open("myfile", "w")  # my default utf8 encoding
    print("text string", file=f)
vs
    f = open("myfile", "w", encoding="sjis")  # non-utf8
    print("text string", file=f)
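To make the explicit-file case concrete, here is a minimal sketch that writes the same string with two encodings and reads the raw bytes back. The helper names write_text and demo, the temporary file names, and the sample string are assumptions made for illustration only.

```python
import os
import tempfile


def write_text(path, text, encoding=None):
    """Write text plus a newline to path; encoding=None means the default."""
    with open(path, "w", encoding=encoding) as f:
        print(text, file=f)


def demo(text="日本語"):
    """Return the raw bytes produced for each encoding, keyed by codec name."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        for enc in ("utf-8", "shift_jis"):
            path = os.path.join(tmpdir, enc + ".txt")
            write_text(path, text, encoding=enc)
            # Read back in binary mode to see what actually hit the disk.
            with open(path, "rb") as f:
                results[enc] = f.read()
    return results
```

The only difference between the two writes is the encoding argument, which is the "slight penalty" referred to above.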
But for sys.stdout, the penalty for using an alternate
encoding is to google around for a solution (which may
not be optimal as Victor Stinner pointed out) and then
read about codecs and the StreamWriter wrapper, textio
wrappers and the .buffer attribute. And the reading part
is then repeated by all those (at the same level of python
expertise) who read the program.
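For comparison, one commonly found workaround looks roughly like the sketch below: wrap the stream's underlying binary buffer in a new io.TextIOWrapper. The function name reopen_with_encoding is hypothetical, and this is only one of several incantations in circulation, not necessarily the best one. It assumes nothing has been written through the old wrapper that still matters, which is the "before first use" case.

```python
import io


def reopen_with_encoding(stream, encoding, errors="strict"):
    """Return a new text wrapper over stream's underlying binary buffer.

    Hypothetical helper: `stream` is assumed to be an io.TextIOWrapper
    (as sys.stdout normally is), so it has .buffer and .line_buffering.
    """
    stream.flush()  # push out anything the old wrapper still holds
    return io.TextIOWrapper(
        stream.buffer,
        encoding=encoding,
        errors=errors,
        line_buffering=stream.line_buffering,
    )


def demo():
    """Exercise the helper on an in-memory stream instead of real stdout."""
    raw = io.BytesIO()
    old = io.TextIOWrapper(raw, encoding="utf-8")
    new = reopen_with_encoding(old, "shift_jis")
    print("日本語", file=new)
    new.flush()
    return raw.getvalue()
```

Applied to the standard stream, the call would be sys.stdout = reopen_with_encoding(sys.stdout, "shift_jis") at program startup. The original wrapper stays referenced as sys.__stdout__, so the shared buffer is not closed behind the new wrapper's back. Python 3.7 and later also provide sys.stdout.reconfigure(encoding=...) for this purpose.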
All I can do is repeat what I said before: non-utf8
codings exist and are widely used. That's a simple
fact. Sample some .jp web sites and look at the ratio
of shift-jis web pages to utf-8 web pages for example.
utf-8 is an encoding. shift-jis is an encoding. Sure,
I understand that utf-8 is preferable and I will use it
when possible. The fact that I am writing shift-jis means
that utf-8 *isn't* possible in this case.
Since utf-8 and shift-jis are both encodings and are equivalent
from a coding viewpoint (a simple choice of which codec to use)
the discrepancy in ease of use between the two in the case of
writing to the standard streams is not justifiable and should
be corrected if possible.
> And I don't think it would be wise to try to do it in a very general
> way, because it's very hard to do that at the general level of the
> language.
But is it? Or are you referring to switching encoding
on-the-fly? (see below).
> > I understand there is no support here for providing a non-
> > obscure, programmatic way of changing the encoding of the
> > standard streams at program startup
>
> You're wrong. There is *some* support for that.
>
> It just has to be done safely, and that means that a generic
> .set_encoding() method that can be called after I/O has been performed
> probably isn't going to happen.
There are two sub-threads in this discussion:
1) Providing a more convenient and discoverable way to
programmatically change the encoding of std* streams
before first use.
2) Changing the encoding used on the std* stream or
any textio stream on the fly as a generalization of (1).
I thought I made clear I was advocating for (1) and
not (2) when I earlier wrote in reply to you:
> You are correct that my current concern is reinitializing
> the encoding(s) of the sys.std* streams prior to doing any
> operations with them.
and to MRAB:
> Disclaimer: As I said before, I am not particularly
> advocating for a set_encoding() method -- my
> primary suggestion is a programmatic way to change the
> sys.std* encodings prior to first use.
As for (2), you have pointed out some potential issues with
switching encodings midstream. I don't understand how codecs
work in Python sufficiently yet to either agree or disagree
with you. I have however questioned some of the statements
made regarding its difficulty (and am holding my opinion
open until I understand the issues better), but I am not
(as I've stated) advocating for it now.
Sorry if I failed to make the distinction clearer. My use
of .set_encoding() as a placeholder for both ideas probably
contributed to the confusion.
> And it might not happen at the core level, since a 3-line function can
> do the job, it might make just as much sense to put up a package on
> PyPI.
I wasn't suggesting a change to the core level (if by that
you mean to the interpreter). I was asking if some way could
be provided (one that is easier and more reliable than googling
around for a magic incantation) to change the encoding of one
or more of the already-open-when-my-program-starts sys.std*
streams. I presume that would be a standard library change
(in either the io or sys modules) and offered a .set_encoding()
method as a placeholder for discussion.
I hardly think it is worth the effort, for either the producer
or consumers, of putting a 3-line function on PyPI. Nor would
such a solution address the discoverability and ease-of-use
problems I am complaining about.
An inferior and bare minimum way to address this would be to
at least add a note about how to change the encoding to the
sys.std* documentation. That encourages cargo-cult programming
and doesn't address the WTF effect but it is at least better
than the current state of affairs.