[Python-ideas] changing sys.stdout encoding
rurpy at yahoo.com
Sat Jun 9 05:39:34 CEST 2012
On 06/07/2012 03:00 PM, Mike Meyer wrote:
> On Thu, Jun 7, 2012 at 4:48 PM, Rurpy <rurpy-/E1597aS9LQAvxtiuMwx3w at public.gmane.org> wrote:
>> I suspect the vast majority of
>> programmers are interested in a language that allows
>> them to *effectively* get done what they need to, whether
>> they are working on the latest agile TDD REST server, or
>> modifying some legacy text files.
> Others have raised the question this begs to have answered: how do
> other programming languages deal with wanting to change the encoding
> of the standard IO streams? Can you show us how they do things that's
> so much easier than what Python does?
This is how it seems to be done in Perl:
which seems quite a bit simpler than Python. I don't
know if it meets your "so much easier" criterion.
A quick trial showed that it works as advertised when
called before any output. The description of binmode()
in "man perlfunc" sounds like encoding can be changed
on-the-fly but my attempt to do so had no effect, so I
don't know if I'm misinterpreting the text or wrote bad
Perl code (haven't used it in ages and not interested
in relearning it right now.)
TCL appears to have on-the-fly encoding changes:
| encoding system ?encoding?
| Set the system encoding to encoding. If encoding is omitted
| then the command returns the current system encoding. The system
| encoding is used whenever Tcl passes strings to system calls.
I'll see if I can find out about some other languages
if there continues to be any interest.
>> And even were I to accept your argument, Python is
>> inconsistent: when I open a file explicitly there is
>> only a slight penalty for opening a non-default-encoded
>> file (the need to explicitly give an encoding):
> The proper encoding for the standard IO streams is generally a
> property of the environment, and hence is set in the environment.
"Proper encoding"? If you said, "Proper default encoding"
I'd agree with you. And I'd buy your claim if no one had
ever invented output redirection and if print output always
went to a console with a (relatively) fixed encoding. But
that is not the case.
> have a use case where that's not the case. The argument is that your
> use case isn't common enough to justify changing the standard library.
> Can you provide evidence to the contrary?
How exactly do you suggest one accurately quantify
"commonness"? And what is the threshold for justification?
It seems to me the strongest argument is the credibility
one that I already made:
1) Programs that accept data input on stdin and write
data on stdout have a long history and are widely used.
I hope this is self-evident.
2) Encodings other than utf-8 are widely used. I pointed
to the commonness of non-utf8 encoding in Japanese web
pages. Additionally, Googling for "ftp readme の site:.jp"
turns up lots of text files. Once past the first few
pages of Google results (where the web pages are mostly
utf8) hardly any utf8 files are to be found.
3) An effect of globalization is that many more
programmers today are dealing with files in
non-native encodings that come from or go to customers,
vendors, partners and colleagues in other parts of the
world. The number of encodings in wide use even within
a single country (again Japan: utf8, sjis, euc-jp,
iso2022jp) implies pretty strongly that tools for use
only in that region will often need multi-encoding
I think connecting the dots above leads to a pretty
> Other languages that make
> setting the encoding on the standard streams easy, or applications
> outside of those built for your system that have a "--encoding" type
iconv, recode and their ilk are obvious examples of
>> I wasn't suggesting a change to the core level (if by that
>> you mean to the interpreter). I was asking if some way could
>> be provided that is easier and more reliable (than googling
>> around for a magic incantation) to change the encoding of one
>> or more of the already-open-when-my-program-starts sys.std*
>> streams. I presume that would be a standard library change
>> (in either the io or sys modules) and offered a .set_encoding()
>> method as a placeholder for discussion.
> Why presume that this needs a change in the library? The method is
> straightforward, if somewhat ugly. Is there any reason it can't just
> be documented, instead of added to the library? Changing the library
> would require a similar documentation change.
Did you miss the paragraph right below the one you quote?
The one in which I said,
>> An inferior and bare minimum way to address this would be to
>> at least add a note about how to change the encoding to the
>> sys.std* documentation. That encourages cargo-cult programming
>> and doesn't address the WTF effect but it is at least better
>> than the current state of affairs.
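
For reference, here is a minimal sketch of the kind of "magic
incantation" being discussed: re-wrapping an already-open text stream
with a different encoding in Python 3. The helper name rewrap() is
hypothetical (it is not an existing or proposed library function), and
the sketch assumes the stream is an io.TextIOWrapper, as sys.stdout
normally is.

```python
import io


def rewrap(stream, encoding, errors="strict"):
    """Return a new text stream over the same underlying binary buffer.

    Hypothetical helper for illustration only; assumes `stream` is an
    io.TextIOWrapper such as sys.stdout.
    """
    # Push out anything already written under the old encoding.
    stream.flush()
    # Remember the setting before detaching; the old wrapper is
    # unusable once detach() has been called.
    line_buffering = stream.line_buffering
    # detach() hands over the binary buffer underneath the old wrapper.
    raw = stream.detach()
    return io.TextIOWrapper(
        raw,
        encoding=encoding,
        errors=errors,
        line_buffering=line_buffering,
    )


# Intended use in a script, before the first print():
#     import sys
#     sys.stdout = rewrap(sys.stdout, "euc-jp")
```

Note that after this call the original object (still reachable as
sys.__stdout__) is detached and can no longer be written to, which is
part of what makes the approach ugly.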