On Fri, Feb 12, 2021 at 5:18 AM Jim J. Jewett wrote:
Inada Naoki wrote:
Default encoding is used for:
a. Cases that really need the locale-specific encoding
b. UTF-8 (a bug; does not work on Windows)
c. ASCII (not a bug, but slow on Windows)
I assume most usages are (b) and (c). This PEP can reduce them soon.
Is this just an assumption, based on those times being visible to someone who installs a lot of packages, or has the use of any locale other than UTF-8 and ASCII really gone down a lot? Have browsers stopped using charset sniffing?
Using "most" is my fault. I am not good at English. I should have used "many" here.
You can see many bugs caused by not specifying `encoding="utf-8"` on Q&A sites. I wrote some numbers about these common bugs in the PEP.
UTF-8 is used by 96.3% of web sites [1], although browsers still use charset sniffing. But how does that relate to this PEP?
[1] https://w3techs.com/technologies/details/en-utf8
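To make the kind of bug mentioned above concrete, here is a minimal sketch. It assumes a file written as UTF-8 and uses cp1252 only as a stand-in for a typical non-UTF-8 default on Windows; the file name is hypothetical.

```python
import tempfile
from pathlib import Path

TEXT = "h\u00e9llo"  # "hello" with an e-acute

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "greeting.txt"  # hypothetical file name
    # Written explicitly as UTF-8, as most modern tools would do.
    path.write_text(TEXT, encoding="utf-8")

    # Explicit encoding: behaves the same on every platform.
    with open(path, encoding="utf-8") as f:
        good = f.read()

    # cp1252 stands in for a locale default that is not UTF-8.
    # There is no error here, only silently garbled text.
    with open(path, encoding="cp1252") as f:
        garbled = f.read()
```

The second read does not raise; it returns mojibake, which is why this bug is easy to miss on the developer's own machine.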
Additionally, encoding="locale" will be backward/forward compatible
What would be the problem with changing the default from None to locale?
It doesn't work on Python 3.9 and earlier. So using `encoding="locale"` is not recommended until users drop Python 3.9 support.
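For code that must keep running on Python 3.9 and earlier, one possible workaround is to choose the argument by version. This is only a sketch; `LOCALE_ENCODING` and `read_locale_text` are hypothetical names, not part of the PEP.

```python
import sys
import tempfile
from pathlib import Path

# "locale" is accepted as an encoding name only on Python 3.10+.
# On older versions, None selects the same locale-dependent default.
LOCALE_ENCODING = "locale" if sys.version_info >= (3, 10) else None


def read_locale_text(path):
    """Read a file with the locale encoding on old and new Python versions."""
    with open(path, encoding=LOCALE_ENCODING) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_bytes(b"hello\n")  # plain ASCII bytes
    text = read_locale_text(sample)
```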
(I think you mentioned that they are the same 99% of the time; is that other 1% likely to be cases where locale is wrong but None is right? Would there be a better way to represent that 1%?)
`encoding="locale"` and `encoding=None` have the same behavior, except
that `encoding="locale"` doesn't emit EncodingWarning even when the
warning is opted in.
There is little difference between `encoding=None` and
`encoding=locale.getpreferredencoding(False)`. The difference is:
* When Python is running on Windows, and
* When the file is a console, and
* (for open()) When PYTHONLEGACYWINDOWSSTDIO is set
* (for TextIOWrapper()) When the file is not _WindowsConsoleIO
In that case, encoding=None uses the console code page, but
encoding=locale.getpreferredencoding(False) uses the ANSI code page.
Otherwise, encoding=None and
encoding=locale.getpreferredencoding(False) are the same.
So `encoding=locale.getpreferredencoding(False)` can be used to
specify locale-specific encoding explicitly.
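As a rough illustration of the "otherwise" case, the sketch below compares the two spellings for a regular file (not a console). The `canonical` helper is a hypothetical name, used only to normalize codec names such as 'UTF-8' and 'utf-8' before comparing.

```python
import codecs
import locale
import tempfile
from pathlib import Path


def canonical(name):
    """Normalize a codec name so that aliases compare equal."""
    return codecs.lookup(name).name


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"
    path.write_bytes(b"")

    # Implicit default: what encoding=None resolves to for a regular file.
    # The encoding is omitted on purpose for this comparison.
    with open(path) as f:
        implicit = canonical(f.encoding)

    # Explicit spelling of the locale-specific encoding.
    explicit_name = locale.getpreferredencoding(False)
    with open(path, encoding=explicit_name) as f:
        explicit = canonical(f.encoding)
```

For a regular file, `implicit` and `explicit` are expected to match; the Windows console cases listed above are where they can differ.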
But this PEP doesn't recommend it. This PEP recommends using
EncodingWarning just for finding missing `encoding="utf-8"` (or any
other specific encoding).
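A small sketch of that workflow, assuming Python 3.10 or later: run the code with `-X warn_default_encoding` and look for EncodingWarning on stderr. The child snippet and the helper name `run_with_encoding_warning` are made up for illustration.

```python
import subprocess
import sys

# Child program that opens a file without specifying an encoding.
CHILD = "import os; open(os.devnull).close()"


def run_with_encoding_warning(code):
    """Run code in a child interpreter with the opt-in warning enabled."""
    result = subprocess.run(
        # "-W always" makes sure the warning is not hidden by other filters.
        [sys.executable, "-X", "warn_default_encoding", "-W", "always", "-c", code],
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return result.stderr


stderr = run_with_encoding_warning(CHILD)
# On Python 3.10+, stderr is expected to mention EncodingWarning.
found = "EncodingWarning" in stderr
```

Setting the PYTHONWARNDEFAULTENCODING environment variable is the other way to opt in. Each reported location is a candidate for adding `encoding="utf-8"`.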
--
Inada Naoki