Python 3 is killing Python
steve at pearwood.info
Thu Jul 17 04:51:56 CEST 2014
On Wed, 16 Jul 2014 19:20:14 +0300, Marko Rauhamaa wrote:
> Chris Angelico <rosuav at gmail.com>:
>> The only thing that might be an issue is that you can't use open(fn) to
>> read your files, but you have to explicitly state the encoding. That
>> would be an understandable problem, especially for someone who develops
>> on a single platform and forgets that the default differs. As long as
>> you always explicitly say encoding="utf-8", and document that you do
>> so, any problems are someone else's.
> Yes. I don't like open() guessing the encoding:
It doesn't *guess*. It has a sensible default encoding which, for most
users most of the time, does the right thing. Ultimately though, the
encoding is under your control: you can specify it if you think you know
better.
> The default encoding is platform dependent (whatever
> locale.getpreferredencoding() returns)
Right. Most text files will be written using the preferred encoding,
unless the user explicitly uses something else when writing the file, or
has got the file from another system with a different encoding. In those
cases it's the user's responsibility. But even then, the most common
encodings are ASCII-compatible, which means that the lowest common
denominator case (reading and writing ASCII files) will Just Work.
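
To make that concrete, here is a minimal sketch (not anything from the standard library docs, just an illustration) that prints the platform default and checks that a pure-ASCII byte string decodes to the same text under a handful of common ASCII-compatible encodings. The particular list of encodings is my own choice for the example.

```python
import locale

# The platform default discussed above. The value is platform and
# locale dependent, so no expected output is shown for it.
default_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_encoding)

# Pure-ASCII bytes decode to the same string under common
# ASCII-compatible encodings (list chosen for illustration only).
ascii_bytes = b"Hello World"
ascii_compatible = ["ascii", "utf-8", "latin-1", "cp1252"]
decoded = {name: ascii_bytes.decode(name) for name in ascii_compatible}

all_same = len(set(decoded.values())) == 1
print(all_same)  # True
```

So for the lowest common denominator case, picking the "wrong" one of these encodings makes no difference to the result.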
From a purity stand-point, no, open() shouldn't have a default encoding,
and the user should have to specify it. But what makes you imagine that
the user will know the correct encoding better than Python does? The
average coder shouldn't have to care about encodings just to do
file.write("Hello World"), and on the average computer they don't have to
because Python sets a sensible default.
But you know what? From a purity stand-point, *even binary mode* assumes
an encoding of sorts. How do you know that binary files on your platform
use eight-bit bytes? Some DSPs use 9-bit bytes, and historically
computers had as few as 6 or as many as 60 bits per byte. This is why the
C standard requires that a byte is *at least* 8 bits.
But, having said that, the assumption that binary files are based on
8-bit bytes is pretty safe. It would be foolish to force the majority of
people, who don't need to care about these sorts of details, to care
about them just to suit the one in ten thousand who do.
Likewise with text files. Python makes sensible defaults which will suit
most people, rather than force people to guess the wrong encoding. But
it's only a default; you can explicitly set it if you believe the file in
question uses a different encoding.
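
For anyone who does need to override the default, a rough sketch of what that looks like follows. The sample text and the temporary file are made up for the example; the only thing it relies on is the encoding argument to open(). It also shows why being explicit matters once you leave ASCII: reading with the wrong encoding need not raise an error, it can just quietly give you the wrong text.

```python
import os
import tempfile

# Non-ASCII sample text (illustrative), written with escapes so the
# example does not depend on the source file's own encoding.
text = "na\u00efve caf\u00e9"

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    # Write with an explicit encoding instead of relying on the default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the same explicit encoding round-trips the text.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

    # Reading with a different encoding does not fail here, but the
    # non-ASCII characters come back wrong.
    with open(path, encoding="latin-1") as f:
        misread = f.read()
finally:
    os.remove(path)

print(round_trip == text)  # True
print(misread == text)     # False
```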
> In each case, it would have been better to default to bytes just like
> subprocess does.
Better for whom? You? Maybe. For the typical programmer that Python is
designed for? Hell no.
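
For reference, this is roughly what the subprocess behaviour being held up as a model looks like in practice. It is only a sketch: the child command is a throwaway one invented for the example, and it assumes the child's output is decodable in the default encoding.

```python
import subprocess
import sys

# A trivial child process (hypothetical, for illustration only).
cmd = [sys.executable, "-c", "print('hello')"]

# By default, subprocess hands back raw bytes and leaves decoding to you.
raw = subprocess.check_output(cmd)

# With universal_newlines=True, the output is decoded to str for you.
decoded = subprocess.check_output(cmd, universal_newlines=True)

print(type(raw).__name__)      # bytes
print(type(decoded).__name__)  # str
```

With the bytes default, every caller who just wants text has to opt in or decode by hand, which is the burden I am arguing open() should not impose.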
Let's be honest, there still is a bias towards English and ASCII in
computing, and probably this will remain the case until English ceases to
be a de facto lingua franca. Most programming languages are written for
J. Random Hacker, not Jランダムハッカー.