[Python-ideas] Fix default encodings on Windows

Steve Dower steve.dower at python.org
Thu Aug 18 09:23:16 EDT 2016


"You consistently ignore Makefiles, .ini, etc."

Do people really do open('makefile', 'rb'), extract filenames and try to use them without ever decoding the file contents?

I've honestly never seen that, and it certainly looks like the sort of thing Python 3 was intended to discourage. (As soon as you open(..., 'r') you're only affected by this change if you explicitly encode again with mbcs.)
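
To make the two cases concrete, here is a minimal sketch. The `SRC = ...` line format and both helper functions are made up for illustration; they are assumptions, not anything taken from a real project.

```python
import os
import tempfile


def read_name_as_text(makefile_path, encoding="utf-8"):
    """Hypothetical helper: decode the file, return the name as str."""
    with open(makefile_path, "r", encoding=encoding) as f:
        return f.readline().split("=", 1)[1].strip()


def read_name_as_bytes(makefile_path):
    """Hypothetical helper: never decode, return the name as bytes."""
    with open(makefile_path, "rb") as f:
        return f.readline().split(b"=", 1)[1].strip()


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "hello.c")
    makefile = os.path.join(tmp, "makefile")
    with open(source, "w", encoding="utf-8") as f:
        f.write("int main(void) { return 0; }\n")
    with open(makefile, "w", encoding="utf-8") as f:
        f.write("SRC = " + source + "\n")

    name_str = read_name_as_text(makefile)
    name_bytes = read_name_as_bytes(makefile)

    # str path: handed to the str file system APIs.
    text_ok = os.path.exists(name_str)
    # bytes path: handed to the bytes file system APIs, the only case
    # where the result depends on how Python interprets bytes paths.
    bytes_ok = os.path.exists(name_bytes)
```

Only the second helper produces a value whose interpretation depends on the encoding used for bytes paths; the first has already committed to an encoding when the file was opened.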

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J. Turnbull" <turnbull.stephen.fw at u.tsukuba.ac.jp>
Sent: 8/17/2016 19:43
To: "Steve Dower" <steve.dower at python.org>
Cc: "Paul Moore" <p.f.moore at gmail.com>; "Python-Ideas" <python-ideas at python.org>
Subject: Re: [Python-ideas] Fix default encodings on Windows

Steve Dower writes:
 > On 17Aug2016 0235, Stephen J. Turnbull wrote:

 > > So a full statement is, "How do we best represent Windows file
 > > system paths in bytes for interoperability with systems that
 > > natively represent paths in bytes?"  ("Other systems" refers to
 > > both other platforms and existing programs on Windows.)
 > 
 > That's incorrect, or at least possible to interpret correctly as
 > the wrong thing. The goal is "code compatibility with systems ...",
 > not interoperability.

You're right, I stated that incorrectly.  I don't have anything to add
to your corrected version.

 > > In a properly set up POSIX locale[1], it Just Works by design,
 > > especially if you use UTF-8 as the preferred encoding.  It's
 > > Windows developers and users who suffer, not those who wrote the
 > > code, nor their primary audience which uses POSIX platforms.
 > 
 > You mentioned "locale", "preferred" and "encoding" in the same sentence, 
 > so I hope you're not thinking of locale.getpreferredencoding()? Changing 
 > that function is orthogonal to this discussion,

You consistently ignore Makefiles, .ini, etc.  It is *not* orthogonal,
it is *the* reason for all the opposition to your proposal and for the
requests that it be delayed.  Filesystem names *are* text in part
because they are *used as filenames in text*.
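
The two settings at issue can be inspected directly. This is a minimal sketch, assuming a Python 3 interpreter of this era; the variable names are only illustrative. A file name read from a Makefile in text mode (without an explicit encoding) is decoded with the first value, while the second governs how str and bytes paths are converted to each other.

```python
import locale
import sys

# Default encoding for text-mode open() when no encoding= is given
# (assumption: an interpreter where open() consults the locale).
text_default = locale.getpreferredencoding(False)

# Encoding used to convert between str paths and bytes paths.
fs_default = sys.getfilesystemencoding()

print("text files: ", text_default)
print("file system:", fs_default)
```

When the two values differ, a name that round-trips through a text file and a bytes path may not come back as the same name.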

 > When Windows developers and users suffer, I see it as my responsibility 
 > to reduce that suffering. Changing Python on Windows should do that 
 > without affecting developers on Linux, even though the Right Way is to 
 > change all the developers on Linux to use str for paths.

I resent that.  If I were a partisan Linux fanboy, I'd be cheering you
on because I think your proposal is going to hurt an identifiable and
large class of *Windows* users.  I know about and fear this possibility
because they use a language I love (Japanese) and an encoding I hate
but have achieved a state of peaceful coexistence with (Shift JIS).

And on the general principle, *I* don't disagree.  I mentioned earlier
that I use only the str interfaces in my own code on Linux and Mac OS
X, and that I suspect that there are no real efficiency implications
to using str rather than bytes for those interfaces.

On the other hand, the programming convenience of reading the
occasional "text" filename (or other text, such as XML tags) out of a
binary stream and passing it directly to filesystem APIs cannot be
denied.  I think that the kind of usage you propose (a fixed,
universal codec, universally accepted; i.e., 'utf-8') is the best way to
handle that in the long run.  But as Grandmaster Lasker said, "Before
the end game, the gods have placed the middle game."  (Lord Keynes
isn't relevant here; Python will outlive all of us. :-)
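
A minimal sketch of why the codec has to be agreed on before bytes from a binary stream can be used as a file name. The file name is an arbitrary Japanese example chosen for illustration; the codecs are Python's standard 'utf-8' and 'shift_jis'.

```python
# "Nihongo.txt" written in kanji, spelled with escapes so that the
# example does not depend on the encoding of this message.
name = "\u65e5\u672c\u8a9e.txt"

utf8_bytes = name.encode("utf-8")
sjis_bytes = name.encode("shift_jis")

# The same name is a different byte string under each codec.
same_bytes = utf8_bytes == sjis_bytes

# A consumer that assumes UTF-8 cannot decode the Shift JIS bytes.
try:
    sjis_bytes.decode("utf-8")
    sjis_readable_as_utf8 = True
except UnicodeDecodeError:
    sjis_readable_as_utf8 = False

# With one agreed codec on both sides, the round trip is lossless.
round_trip_ok = utf8_bytes.decode("utf-8") == name
```

This is the middle game in miniature: existing files written in Shift JIS keep their bytes, whatever codec the file system APIs are later told to assume.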

 > I don't think there's any reasonable way to noisily deprecate these
 > functions within Python, but certainly the docs can be made
 > clearer. People who explicitly encode with
 > sys.getfilesystemencoding() should not get the deprecation message,
 > but we can't tell whether they got their bytes from the right
 > encoding or a RNG, so there's no way to discriminate.

I agree with you within Python; the custom is for DeprecationWarnings
to be silent by default.
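
A minimal sketch of what "silent by default" means in practice. The function `open_with_bytes_path` is hypothetical and stands in for any API that warns about bytes paths; only the `warnings` module calls are real.

```python
import warnings


def open_with_bytes_path(path):
    """Hypothetical API that deprecates bytes paths."""
    if isinstance(path, bytes):
        warnings.warn(
            "bytes paths are deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return path


# The default filters usually hide DeprecationWarning from end users,
# so a caller has to opt in to see it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    open_with_bytes_path(b"spam.txt")

seen = [w.category.__name__ for w in caught]
```

Running a program with `python -W default` (or `-W error::DeprecationWarning` in a test suite) is the usual way to opt in, which is why a warning alone reaches few practitioners.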

As for "making noise", how about announcing the deprecation as the
top headline for 3.6, postponing the actual change to 3.7, and in
the meantime you and Nick do a keynote duet at PyCon?  (Your partner
could be Guido, too, but Nick has been the most articulate proponent
for this particular aspect of "inclusion".  I think having a
representative from the POSIX world explaining the importance of this
for "all of us" would greatly multiply the impact.)  Perhaps, given my
proposed timing, a discussion at the language summit in '17 and the
keynote in '18 would be the best schedule.

(OT, political: I've been strongly influenced in this proposal by
recently reading http://blog.aurynn.com/contempt-culture.  There's not
as much of it in Python as in other communities I'm involved in, but I
think this would be a good symbolic opportunity to express our
opposition to it.  "Inclusion" isn't just about gender and race!)

 > I'm going to put together a summary post here (hopefully today) and get 
 > those who have been contributing to basically sign off on it, then I'll 
 > take it to python-dev. The possible outcomes I'll propose will basically 
 > be "do we keep the status quo, undeprecate and change the functionality, 
 > deprecate the deprecation and undeprecate/change in a couple releases, 
 > or say that it wasn't a real deprecation so we can deprecate and then 
 > change functionality in a couple releases".

FWIW, of those four, I dislike 'status quo' the most, and like 'say it
wasn't real, deprecate and change' the best.  Although I lean toward
phrasing that as "we deprecated it, but we realize that practitioners
are by and large not aware of the deprecation, and nobody expects the
Spanish Inquisition".

@Nick, if you're watching: I wonder if it would be possible to expand
the "in the file system, bytes are UTF-8" proposal to POSIX as well,
perhaps for 3.8?