
"You consistently ignore Makefiles, .ini, etc."

Do people really do open('makefile', 'rb'), extract filenames and try to use them without ever decoding the file contents? I've honestly never seen that, and it certainly looks like the sort of thing Python 3 was intended to discourage. (As soon as you open(..., 'r') you're only affected by this change if you explicitly encode again with mbcs.)

Top-posted from my Windows Phone

-----Original Message-----
From: "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp>
Sent: 8/17/2016 19:43
To: "Steve Dower" <steve.dower@python.org>
Cc: "Paul Moore" <p.f.moore@gmail.com>; "Python-Ideas" <python-ideas@python.org>
Subject: Re: [Python-ideas] Fix default encodings on Windows

Steve Dower writes:
On 17Aug2016 0235, Stephen J. Turnbull wrote:
So a full statement is, "How do we best represent Windows file system paths in bytes for interoperability with systems that natively represent paths in bytes?" ("Other systems" refers to both other platforms and existing programs on Windows.)
That's incorrect, or at least easy to interpret as the wrong thing. The goal is "code compatibility with systems ...", not interoperability.
You're right, I stated that incorrectly. I don't have anything to add to your corrected version.
In a properly set up POSIX locale[1], it Just Works by design, especially if you use UTF-8 as the preferred encoding. It's Windows developers and users who suffer, not those who wrote the code, nor their primary audience which uses POSIX platforms.
You mentioned "locale", "preferred" and "encoding" in the same sentence, so I hope you're not thinking of locale.getpreferredencoding()? Changing that function is orthogonal to this discussion,
You consistently ignore Makefiles, .ini, etc. It is *not* orthogonal, it is *the* reason for all opposition to your proposal or request that it be delayed. Filesystem names *are* text in part because they are *used as filenames in text*.
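A minimal sketch of the concern about filenames embedded in text files. Everything in it is hypothetical and chosen only for illustration: the .ini contents, the `log` key, and the choice of Shift JIS as the file's encoding. It compares the filename obtained by decoding the file with the same filename pulled out as undecoded bytes, then checks whether those raw bytes could be reinterpreted as UTF-8.

```python
import configparser
import os
import tempfile

# Hypothetical .ini content that names a file with a non-ASCII filename.
INI_TEXT = "[paths]\nlog = 日本語.txt\n"

with tempfile.TemporaryDirectory() as tmp:
    ini_path = os.path.join(tmp, "settings.ini")
    # Assumption: the config file itself is stored in Shift JIS.
    with open(ini_path, "w", encoding="shift_jis") as f:
        f.write(INI_TEXT)

    # Text route: decode the file with its own encoding, get a str filename.
    parser = configparser.ConfigParser()
    parser.read(ini_path, encoding="shift_jis")
    name_str = parser["paths"]["log"]

    # Bytes route: extract the same filename without ever decoding it.
    with open(ini_path, "rb") as f:
        raw = f.read()
    name_bytes = raw.split(b"= ")[1].strip()

# The raw bytes are Shift JIS; check whether they are also valid UTF-8.
try:
    name_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(repr(name_str), name_bytes, valid_utf8)
```

The str obtained through the text route is independent of how bytes paths are interpreted. The undecoded bytes only name the intended file if the filesystem API assumes the same encoding the file was written in, which is the case the two sides of this thread disagree about.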
When Windows developers and users suffer, I see it as my responsibility to reduce that suffering. Changing Python on Windows should do that without affecting developers on Linux, even though the Right Way is to change all the developers on Linux to use str for paths.
I resent that. If I were a partisan Linux fanboy, I'd be cheering you on because I think your proposal is going to hurt an identifiable and large class of *Windows* users. I know about and fear this possibility because they use a language I love (Japanese) and an encoding I hate but have achieved a state of peaceful coexistence with (Shift JIS).

And on the general principle, *I* don't disagree. I mentioned earlier that I use only the str interfaces in my own code on Linux and Mac OS X, and that I suspect that there are no real efficiency implications to using str rather than bytes for those interfaces. On the other hand, the programming convenience of reading the occasional "text" filename (or other text, such as XML tags) out of a binary stream and passing it directly to filesystem APIs cannot be denied. I think that the kind of usage you propose (a fixed, universal codec, universally accepted; i.e., 'utf-8') is the best way to handle that in the long run.

But as Grandmaster Lasker said, "Before the end game, the gods have placed the middle game." (Lord Keynes isn't relevant here, Python will outlive all of us. :-)
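The "programming convenience" mentioned above can be sketched roughly as follows. The length-prefixed record layout, the `read_name` helper, and the file name `payload.txt` are assumptions made up for this example and do not come from any real format. The filename is sliced out of a binary record and handed to the filesystem API as bytes, with no decode step.

```python
import os
import struct
import tempfile


def read_name(record: bytes) -> bytes:
    """Return the filename bytes from a hypothetical length-prefixed record."""
    (length,) = struct.unpack_from("<H", record, 0)
    return record[2:2 + length]


with tempfile.TemporaryDirectory() as tmp:
    # Create the file that the binary record refers to.
    with open(os.path.join(tmp, "payload.txt"), "w") as f:
        f.write("hello")

    # A binary record: 2-byte little-endian length, the name, then other data.
    record = struct.pack("<H", 11) + b"payload.txt" + b"\x00\x01"

    name = read_name(record)
    # Bytes in, bytes out: the path never becomes a str.
    path = os.path.join(os.fsencode(tmp), name)
    with open(path, "rb") as f:
        content = f.read()

print(name, content)
```

With an ASCII name this behaves the same under any of the encodings discussed here. The disagreement only shows up once the name contains non-ASCII bytes, because the API then has to assume some encoding for them.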
I don't think there's any reasonable way to noisily deprecate these functions within Python, but certainly the docs can be made clearer. People who explicitly encode with sys.getfilesystemencoding() should not get the deprecation message, but we can't tell whether they got their bytes from the right encoding or a RNG, so there's no way to discriminate.
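A small sketch of why the two cases cannot be told apart. The file name `report.txt` is hypothetical. Bytes produced by explicitly encoding with sys.getfilesystemencoding(), bytes produced by os.fsencode(), and bytes that simply arrived from somewhere else are all plain bytes objects with no record of where they came from.

```python
import os
import sys

name = "report.txt"

# Explicit encoding with the filesystem encoding.
explicit = name.encode(sys.getfilesystemencoding())

# The helper that also applies the filesystem error handler.
via_helper = os.fsencode(name)

# Bytes of unknown origin (a socket, a file, a literal, ...).
from_elsewhere = b"report.txt"

# All three are indistinguishable to any function that receives them.
print(type(explicit), explicit == via_helper == from_elsewhere)
```

Any warning attached to the bytes-path functions would therefore fire for all three callers alike.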
I agree with you within Python; the custom is for DeprecationWarnings to be silent by default. As for "making noise", how about announcing the deprecation as the top headline for 3.6, postponing the actual change to 3.7, and in the meantime you and Nick do a keynote duet at PyCon? (Your partner could be Guido, too, but Nick has been the most articulate proponent for this particular aspect of "inclusion". I think having a representative from the POSIX world explaining the importance of this for "all of us" would greatly multiply the impact.) Perhaps, given my proposed timing, a discussion at the language summit in '17 and the keynote in '18 would be the best timing.

(OT, political: I've been strongly influenced in this proposal by recently reading http://blog.aurynn.com/contempt-culture. There's not as much of it in Python as in other communities I'm involved in, but I think this would be a good symbolic opportunity to express our opposition to it. "Inclusion" isn't just about gender and race!)
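To illustrate what "silent by default" means in practice, here is a rough sketch. `legacy_bytes_api` is a hypothetical stand-in, not a real function, and the "ignore" action is used as an assumption to mimic how the default filters treat DeprecationWarning in most code. The same call is made under two filter settings and the number of warnings that surface is counted.

```python
import warnings


def legacy_bytes_api(path):
    """Hypothetical stand-in for a deprecated bytes-path function."""
    warnings.warn(
        "bytes paths are deprecated here", DeprecationWarning, stacklevel=2
    )
    return path


def count_warnings(action):
    """Call the legacy function under the given filter action and count warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter(action, DeprecationWarning)
        legacy_bytes_api(b"spam.txt")
    return len(caught)


silent = count_warnings("ignore")  # stands in for the default behaviour
loud = count_warnings("always")    # what a user sees after opting in

print(silent, loud)
```

Users who never opt in, for instance with `-W always` or a test runner that enables warnings, would not see the message, which is why the announcement would have to happen outside the interpreter.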
I'm going to put together a summary post here (hopefully today) and get those who have been contributing to basically sign off on it, then I'll take it to python-dev. The possible outcomes I'll propose will basically be "do we keep the status quo, undeprecate and change the functionality, deprecate the deprecation and undeprecate/change in a couple releases, or say that it wasn't a real deprecation so we can deprecate and then change functionality in a couple releases".
FWIW, of those four, I dislike 'status quo' the most, and like 'say it wasn't real, deprecate and change' the best. Although I lean toward phrasing that as "we deprecated it, but we realize that practitioners are by and large not aware of the deprecation, and nobody expects the Spanish Inquisition". @Nick, if you're watching: I wonder if it would be possible to expand the "in the file system, bytes are UTF-8" proposal to POSIX as well, perhaps for 3.8?