>There's a long ongoing thread with the subject "Make UTF-8 mode more accessible for Windows users."
>There are two obvious problems with UTF-8 mode.
If you don't think UTF-8 mode is helpful then don't use it -- and maybe join that thread and argue that it should NOT be more accessible.
>First, it applies to entire installations, or at least entire running scripts, including all imported libraries no matter who wrote them, etc., making it a blunt instrument.
Yes, indeed, that is the case, and it is why that other thread has substantial discussion.
>Second, the problem on Windows isn't that Python happens to use the wrong default encoding, it's that multiple encodings coexist, and you really do have to think each time you en/decode something about which encoding you ought to use.
That's the problem with all of Unicode, on all systems -- nothing Windows specific about it.
>UTF-8 mode doesn't solve that, it just changes the default.
Not quite -- what UTF-8 mode does is make Python act like it does on virtually every other operating system. That is, the default encoding is utf-8, everywhere, every time, regardless of how the system the code is running on is configured.
That solves a substantial problem, and it is why the goal is for Python eventually to default to utf-8 on all systems.
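For anyone who wants to see what their own interpreter is doing, here is a minimal sketch. UTF-8 mode can be switched on with `python -X utf8` or by setting `PYTHONUTF8=1`; the function name `describe_text_defaults` is made up for illustration.

```python
import locale
import sys


def describe_text_defaults():
    """Report the settings that decide how open(path) decodes text."""
    return {
        # True when the interpreter was started in UTF-8 mode
        "utf8_mode": bool(sys.flags.utf8_mode),
        # The encoding open() falls back to when none is given
        "preferred_encoding": locale.getpreferredencoding(False),
        # The encoding used for file names
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_text_defaults().items():
        print(f"{name}: {value}")
```

Running it with and without `-X utf8` on a Windows machine should show the preferred encoding change from the local code page to UTF-8.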
Frankly, having Python, which is a programming language / runtime environment, use a system setting for text file encoding is a really bad idea. In this age of the internet, the idea that a text file is most likely to be encoded in the same encoding as the system default of the machine it happens to run on is just plain wrong.
And it leads to real problems, because code that works just fine on one machine may not work right on another -- and not just "tested on Linux, broken on Windows", but "tested on one Windows machine, broken on another".
Water under the bridge, but it will take a long time to change the Python defaults, so UTF-8 mode provides a transition: application developers can say that this code will work the same way on all machines if you use UTF-8 mode.
Yes, the "right" way to achieve that is to specify an encoding for all text files, but if you, as an application developer, are using packages written by others that may be broken in that way -- you're kind of stuck.
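To make the "works here, broken there" point concrete, here is a small sketch. It writes a file as UTF-8 and then reads it back twice: once as UTF-8, and once as cp1252, which stands in for a legacy Windows default. The sample text and file name are made up for illustration.

```python
import os
import tempfile

text = "naïve café"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")

    # Write the file the way most tools on the internet would: as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the matching encoding round-trips cleanly.
    with open(path, encoding="utf-8") as f:
        read_as_utf8 = f.read()

    # Reading with a legacy code page raises no error; it silently
    # produces different characters.
    with open(path, encoding="cp1252") as f:
        read_as_cp1252 = f.read()

print(read_as_utf8)    # naïve café
print(read_as_cp1252)  # naÃ¯ve cafÃ©
```

A plain `open(path)` gives you one result or the other depending on the machine, which is exactly the problem.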
>It seems as though most of those commenting in the other thread don't actually use Python on Windows. I do, and
I'm one of those "don't use Windows (or not much)" people -- but I do write software that I want others to be able to run on Windows.
>I can say it's a royal pain to have to write open(path, encoding='utf-8') all the time.
Indeed -- and EVERYONE should be doing that, on all OSes, if you want your code to be cross platform. And many (most?) don't -- again, that's why UTF-8 mode is useful.
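In your own code, one way to cut down on the typing is a small wrapper. This is just a sketch, and the name `open_utf8` is made up; it does nothing for third-party packages that call plain open() themselves.

```python
import functools
import os
import tempfile

# Hypothetical helper: open() with encoding="utf-8" pre-filled.
# Only meant for text mode; binary mode rejects an encoding argument.
open_utf8 = functools.partial(open, encoding="utf-8")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "notes.txt")

    with open_utf8(path, "w") as f:
        f.write("Grüße")

    with open_utf8(path) as f:
        content = f.read()
```

The mode and other arguments still pass through as usual, so the call sites stay short without relying on the platform default.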
>If you could write open(path, 'r', 'utf-8'), that would be slightly better, but the third parameter is buffering, not encoding, and open(path, 'r', -1, 'utf-8') is not very readable.
>UTF-8 mode is somehow worse, because you now have to decide between writing open(path), and having your script be incompatible with non-UTF-8 Windows installations,
I personally think that using the "system" encoding is probably never the right choice, but if it is for an application, then what we need is a "system" encoding, as proposed in PEP 597. I think we need that before pushing greater use of UTF-8 mode.
I do agree that making it easier to set the encoding would be good in principle, but the most direct way to solve this problem is to make the default utf-8 everywhere, as it already is in most code; wrong as it is, a lot of code is already making that assumption.
>There's a constant temptation to sacrifice portability for convenience - a temptation that Unix users are familiar with, since they omit encoding='utf-8' all the time.
True, but I think many, if not most, folks do not know that they are making that choice. Rather, they are not thinking about it, and when it works most of the time, they're done (I'm sure guilty of that!).
Anyway, I think others have said everything I'd say about your specific suggestions, but in short -- yes, it would have been good to make encoding specification easier, but it's too late now, and if we are making any changes, they should be PEP 597 and ultimately making the default utf-8.
- Chris B.
--