On Tue, 9 Feb 2021 at 16:54, Inada Naoki <songofacandy@gmail.com> wrote:
On Tue, Feb 9, 2021 at 9:31 PM Paul Moore <p.f.moore@gmail.com> wrote:
Personally, I'm not at all keen on the idea of making users always specify encoding in the first place, even if it's "just for the transition".
I agree with you. But as I wrote in the PEP, omitting the encoding has already caused a lot of trouble. Windows users cannot simply `pip install somepkg` because some library authors write `long_description=open("README.md").read()` in setup.py.
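Here is a minimal sketch of the fix recommended for the setup.py case. It assumes the README is saved as UTF-8. The helper name `read_long_description` is hypothetical. The second decode does not call `open()` without an encoding. It only simulates what a reader using the Windows ANSI code page cp1252 would get, so the result is the same on any machine.

```python
import tempfile
from pathlib import Path


def read_long_description(path):
    # Hypothetical helper: always decode as UTF-8, so the result no longer
    # depends on the locale encoding of the machine running setup.py.
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    readme.write_bytes("café".encode("utf-8"))  # README saved as UTF-8

    good = read_long_description(readme)
    # Simulates a locale-dependent read on a Windows machine whose ANSI
    # code page is cp1252: the two UTF-8 bytes for "é" become two characters.
    bad = readme.read_bytes().decode("cp1252")

print(good)  # café
print(bad)   # cafÃ©
```

In setup.py this would be used as `long_description=read_long_description("README.md")`. Some bytes are undefined in cp1252. For those, the same locale-dependent read raises UnicodeDecodeError instead of silently producing mojibake.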
I am trying to fix this situation with two parallel approaches:
* (This PEP) Provide a tool for finding this kind of bug, and recommend `encoding="utf-8"` for cross-platform library authors.
* (Another thread) Make UTF-8 mode more usable for Windows users, especially students.
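Both approaches can be tried from the command line. This sketch assumes the PEP's tool takes the form it was eventually given in Python 3.10. In that form, the `-X warn_default_encoding` option makes a text-mode `open()` without `encoding=` emit an `EncodingWarning`. UTF-8 mode itself (PEP 540, Python 3.7+) is enabled with `-X utf8` or `PYTHONUTF8=1`. The `run_snippet` helper is hypothetical. It starts a fresh interpreter because both settings are read at startup.

```python
import os
import subprocess
import sys


def run_snippet(code, xoptions=(), extra_env=None):
    """Run `code` in a fresh interpreter (hypothetical helper).

    Both settings shown here are read at interpreter startup, so they
    cannot be toggled from inside an already-running process.
    """
    cmd = [sys.executable]
    for opt in xoptions:
        cmd += ["-X", opt]
    cmd += ["-c", code]
    env = dict(os.environ, **(extra_env or {}))
    return subprocess.run(cmd, capture_output=True, text=True, env=env)


# 1. Finding the bug: a text-mode open() with no encoding= argument.
#    On Python 3.10+ this prints an EncodingWarning to stderr; older
#    versions accept the unknown -X option silently and print nothing.
MISSING_ENCODING = "import os; open(os.devnull).close()"
warned = run_snippet(MISSING_ENCODING, xoptions=["warn_default_encoding"])
print(warned.stderr.strip())

# 2. Turning on UTF-8 mode, via the -X utf8 flag or the PYTHONUTF8 variable.
CHECK_MODE = "import sys; print(sys.flags.utf8_mode)"
via_flag = run_snippet(CHECK_MODE, xoptions=["utf8"])
via_env = run_snippet(CHECK_MODE, extra_env={"PYTHONUTF8": "1"})
print(via_flag.stdout.strip())  # 1
print(via_env.stdout.strip())   # 1
```

On Python 3.10+, the warning can also be enabled with the `PYTHONWARNDEFAULTENCODING` environment variable.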
Thanks for explaining (again). There's so much debate, across multiple proposals, that I can barely follow it. I'm impressed that you're managing to keep things straight at all :-)
I guess my views on this PEP come down to:
* I see no harm in having a tool that helps developers spot platform-specific assumptions about encoding.
* Realistically, I'd be surprised if developers actually used such a tool. If they were likely to do so, they could probably just as easily locate all the uses of open() in their code, and check that way. So I'm not sure this proposal is actually worth it, even if the end result would be very beneficial.
* In the setup.py case, why don't those same Windows users complain that the library fails to install? A quick bug report, followed by a simple fix, seems more likely to happen than the developer suddenly deciding to scan their code for encoding issues.
Regarding the wider question of UTF-8 as default, my views can probably be summarised as follows:
* If you want to write correct code to deal with encodings, there is no substitute for carefully considering every bytes/string conversion, deciding how you are going to identify the encoding to use, and then specifying that encoding explicitly. Default values for encodings have no place in such code.
* In reality, though, that's far too much work for many situations. Default encodings are a necessary convenience, particularly for simple scripts, or for people who can't, or don't want to, do the analysis that the "correct" approach implies.
* Picking the right default is *hard*. Changing the default is even harder, unfortunately.
* I feel that we already have a number of mechanisms (PEPs 538 and 540) trying to tackle this issue. Adding yet more suggests to me that we'd be better off pausing and working out why we still have an issue. We should be moving towards *fewer* mechanisms, not more.
* We have UTF-8 mode, and users can set it per-process (via a command-line flag or environment variable), or per-user or per-site (via an environment variable). I honestly don't believe that a user (whatever OS they work on) who is capable of writing Python code can't be shown how to set an environment variable. I see no reason to suggest we need yet another way to set UTF-8 mode, or that a per-interpreter or per-virtualenv setting is particularly crucial (suggestions that have been made in the Python-Ideas threads).
* UTF-8 is likely to be the most appropriate default encoding for Python in the longer term, and I agree that Windows is fast approaching the point where a UTF-8 encoding is more appropriate than the ANSI codepage for "new stuff". But there are a lot of legacy files and applications around, and I suspect that a UTF-8 default will inconvenience a lot of people working with such data. Equally, though, such people may not be in a huge rush to switch to the latest Python version. Whichever way we go, some people will be inconvenienced.
I'm also somewhat bemused by the rather negative view of "Windows beginners" that lies behind a lot of these discussions. People's experiences may well differ, but the people I see using (and learning) Python on Windows are often experienced computer users: maybe developers with significant experience in Java or other "enterprise languages", or data scientists who have a lot of knowledge of computers but are relatively new to programming, or systems admins or database specialists who want to use Python to write scripts on Windows. None of those people fit the picture of people who wouldn't know how to set an environment variable, or configure their environment. On the other hand, in my experience, they often don't really have much knowledge of character encodings, tend to just use whatever default their PC uses, and expect it to work.
They *can*, however, understand when an encoding problem is explained to them, and can set an explicit encoding once they know they need to.
Paul
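Paul's alternative to a dedicated tool is simply to locate every `open()` call and check it. Below is a rough sketch of that manual audit using the standard `ast` module. `find_open_without_encoding` is an illustrative name, not part of any library or of the PEP. As a heuristic, it only recognises calls to a bare name `open`. It therefore misses `io.open`, aliases, `pathlib.Path.read_text()` and similar APIs.

```python
import ast


def find_open_without_encoding(source, filename="<string>"):
    """Return sorted (lineno, col) pairs for open(...) calls lacking encoding.

    Hypothetical heuristic: only calls to a bare name `open` are checked,
    a constant mode containing "b" counts as binary (no encoding needed),
    and a non-constant mode is conservatively reported.
    """
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id == "open"):
            continue
        if "encoding" in {kw.arg for kw in node.keywords}:
            continue
        # open(file, mode, buffering, encoding, ...): a 4th positional
        # argument is the encoding.
        if len(node.args) >= 4:
            continue
        mode = node.args[1] if len(node.args) >= 2 else None
        for kw in node.keywords:
            if kw.arg == "mode":
                mode = kw.value
        if (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
                and "b" in mode.value):
            continue  # binary mode: encoding does not apply
        hits.append((node.lineno, node.col_offset))
    return sorted(hits)


SAMPLE = '''\
long_description = open("README.md").read()
data = open("logo.png", "rb").read()
text = open("notes.txt", encoding="utf-8").read()
'''
print(find_open_without_encoding(SAMPLE))  # [(1, 19)]
```

To audit a project, each source file could be read with `Path(p).read_text(encoding="utf-8")` and passed in. Every reported call still needs a human decision about which encoding is actually correct.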