[Python-ideas] changing sys.stdout encoding

Sat Jun 9 05:39:34 CEST 2012

On 06/07/2012 03:00 PM, Mike Meyer wrote:
> On Thu, Jun 7, 2012 at 4:48 PM, Rurpy <rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org> wrote:
>> I suspect the vast majority of
>> programmers are interested in a language that allows
>> them to *effectively* get done what they need to, whether
>> they are working on the latest agile TDD REST server, or
>> modifying some legacy text files.
> 
> Others have raised the question this begs to have answered: how do
> other programming languages deal with wanting to change the encoding
> of the standard IO streams? Can you show us how they do things that's
> so much easier than what Python does?

This is how it seems to be done in Perl: 

 binmode(STDOUT, ":encoding(sjis)");

which seems quite a bit simpler than Python.  I don't
know if it meets your "so much easier" criterion.
A quick trial showed that it works as advertised when
called before any output.  The description of binmode() 
in "man perlfunc" sounds like encoding can be changed 
on-the-fly but my attempt to do so had no effect, so I 
don't know if I'm misinterpreting the text or wrote bad 
Perl code (I haven't used it in ages and am not interested 
in relearning it right now).
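
For comparison, the Python 3 idiom that usually gets passed around
is to re-wrap the stream's underlying binary buffer.  A minimal
sketch, assuming Python 3; the helper name rewrap() is made up for
illustration and is not part of the standard library, and an
in-memory buffer stands in for the real stdout:

```python
import io

def rewrap(stream, encoding, errors="strict"):
    """Hypothetical helper: new text stream over stream's binary buffer."""
    stream.flush()            # push out text already buffered in the old wrapper
    raw = stream.detach()     # take the buffer; the old wrapper is unusable afterwards
    return io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                            line_buffering=True)

# Stand-in for sys.stdout so the sketch can run anywhere.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="utf-8")

out = rewrap(out, "shift_jis")
out.write("日本語")
out.flush()

# For the real stream the call would presumably be:
#     sys.stdout = rewrap(sys.stdout, "shift_jis")
```

That is four or five lines one has to know about, against one
line of Perl.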

Tcl appears to have on-the-fly encoding changes:

 | encoding system ?encoding?
 |  Set the system encoding to encoding. If encoding is omitted
 |  then the command returns the current system encoding. The system
 |  encoding is used whenever Tcl passes strings to system calls. 
 http://www.tcl.tk/man/tcl8.4/TclCmd/encoding.htm

I'll see if I can find out about some other languages 
if there continues to be any interest.

>> And even were I to accept your argument, Python is
>> inconsistent: when I open a file explicitly there is
>> only a slight penalty for opening a non-default-encoded
>> file (the need to explicitly give an encoding):
> 
> The proper encoding for the standard IO streams is generally a
> property of the environment, and hence is set in the environment.

"Proper encoding"?  If you said, "Proper default encoding" 
I'd agree with you.  And I'd buy your claim if no one had 
ever invented output redirection and if print output always 
went to a console with a (relatively) fixed encoding.  But 
that is not the case.
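
To make the asymmetry concrete, here is a rough Python 3 sketch.
The file name and the sample text are arbitrary choices of mine;
the point is only that an explicitly opened file takes its
encoding as an argument, while sys.stdout arrives already open:

```python
import os
import sys
import tempfile

text = "日本語"

# An explicitly opened file: the encoding is one keyword argument.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="euc_jp") as f:
    f.write(text)

# Read the raw bytes back to see what was actually written.
with open(path, "rb") as f:
    data = f.read()

# sys.stdout is open before the program runs; its encoding was picked
# from the environment (locale, PYTHONIOENCODING), not by the program.
default_stdout_encoding = getattr(sys.stdout, "encoding", None)
```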

> You
> have a use case where that's not the case. The argument is that your
> use case isn't common enough to justify changing the standard library.
> Can you provide evidence to the contrary? 

How exactly do you suggest one accurately quantify 
"commonness"?  And what is the threshold for justification?
It seems to me the strongest argument is the credibility
one that I already made:

1) Programs that accept data input on stdin and write
 data on stdout have a long history and are widely used.
 I hope this is self-evident.

2) Encodings other than utf-8 are widely used.  I pointed
 to the commonness of non-utf8 encoding in Japanese web
 pages.  Additionally, a Google search for "ftp readme の site:.jp"
 turns up lots of text files.  Once past the first few 
 pages of Google results (where the web pages are mostly
 utf8) hardly any utf8 files are to be found.

3) One effect of globalization is that many more
 programmers today are dealing with files that have 
 non-native encoding that come from or go to customers,
 vendors, partners and colleagues in other parts of the
 world.  The number of encodings in wide use even within
 a single country (again Japan: utf8, sjis, euc-jp,
 iso2022jp) implies pretty strongly that tools for use
 only in that region will often need multi-encoding
 capabilities.

I think connecting the dots above leads to a pretty
high-probability conclusion.

> Other languages that make
> setting the encoding on the standard streams easy, or applications
> outside of those built for your system that have a "--encoding" type
> flag?

iconv, recode and their ilk are obvious examples of 
applications.
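
As an illustration of what such a filter looks like when written
in Python 3, here is a small sketch of a stdin-to-stdout
transcoder.  The flag names --from-encoding and --to-encoding are
my own invention for this example (they are not iconv's or
recode's options), and it works on the binary buffers precisely
because the text-mode sys.std* streams already have an encoding
baked in:

```python
import argparse
import io
import sys

def transcode(src, dst, from_enc, to_enc, chunk_size=64 * 1024):
    """Copy binary stream src to binary stream dst, re-encoding the text."""
    # newline="" turns off newline translation so line endings pass through.
    reader = io.TextIOWrapper(src, encoding=from_enc, newline="")
    writer = io.TextIOWrapper(dst, encoding=to_enc, newline="")
    while True:
        chunk = reader.read(chunk_size)   # reads up to chunk_size characters
        if not chunk:
            break
        writer.write(chunk)
    writer.flush()
    # Detach so the wrappers do not close the caller's streams later.
    reader.detach()
    writer.detach()

def main(argv=None):
    parser = argparse.ArgumentParser(
        description="stdin-to-stdout transcoder (sketch)")
    parser.add_argument("--from-encoding", default="utf-8")
    parser.add_argument("--to-encoding", default="utf-8")
    args = parser.parse_args(argv)
    transcode(sys.stdin.buffer, sys.stdout.buffer,
              args.from_encoding, args.to_encoding)
```

Calling main() from a script entry point would give something
usable as "prog --from-encoding sjis --to-encoding euc_jp < in > out".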

>> I wasn't suggesting a change to the core level (if by that
>> you mean to the interpreter).  I was asking if some way could
>> be provided (that is easier and more reliable than googling
>> around for a magic incantation) to change the encoding of one
>> or more of the already-open-when-my-program-starts sys.std*
>> streams.  I presume that would be a standard library change
>> (in either the io or sys modules) and offered a .set_encoding()
>> method as a placeholder for discussion.
> 
> Why presume that this needs a change in the library? The method is
> straightforward, if somewhat ugly. Is there any reason it can't just
> be documented, instead of added to the library? Changing the library
> would require a similar documentation change.

Did you miss the paragraph right below the one you quote?
The one in which I said, 

  >> An inferior and bare minimum way to address this would be to 
  >> at least add a note about how to change the encoding to the 
  >> sys.std* documentation.  That encourages cargo-cult programming 
  >> and doesn't address the WTF effect but it is at least better 
  >> than the current state of affairs.