On Fri, Jan 29, 2021 at 12:54 PM Ben Rudiak-Gould wrote:
On Wed, Jan 27, 2021 at 11:36 PM Inada Naoki wrote:
* UnicodeDecodeError is raised when trying to open a text file written in UTF-8, such as JSON.
* UnicodeEncodeError is raised when trying to save text data retrieved from the web, etc.
* A user runs `pip install`, and `setup.py` reads a README.md or LICENSE file written in UTF-8 without `encoding="UTF-8"`.
Users can use UTF-8 mode to solve these problems.
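For comparison, the failures listed above can also be avoided per call, without UTF-8 mode, by passing the encoding explicitly. The following is only a minimal sketch: the helper name `read_readme`, the temporary file, and its sample contents are made up for illustration and do not come from any real `setup.py`.

```python
import tempfile
from pathlib import Path


def read_readme(path: Path) -> str:
    """Read a text file as UTF-8 regardless of locale or UTF-8 mode.

    Hypothetical helper: it stands in for the README/LICENSE read
    that a setup.py might perform.
    """
    # The explicit encoding makes the result independent of the
    # platform's default text encoding.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    # Write non-ASCII sample text as UTF-8, then read it back.
    readme.write_text("Naïve café ☕\n", encoding="utf-8")
    content = read_readme(readme)
```

This fixes only code that the author controls, which is why it does not help with third-party scripts that omit the argument.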
They can use it to solve *those* problems, but probably at the cost of creating different problems.
There's a selection bias here, because you aren't seeing cases where a script worked because the default encoding was the correct one. If you switch a lot of ordinary users (not power users who already use it) to UTF-8 mode, I think a lot of scripts that currently work will start failing, or worse, silently producing bogus output that won't be understood by a downstream tool. I'm not convinced this wouldn't be a bigger problem than the problem you're trying to solve.
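The two failure modes mentioned here (a script that starts failing, and output that is silently wrong) can be reproduced in a few lines. This is an illustrative sketch only: cp1252 is assumed as the legacy default encoding, and the sample word and the helper `try_decode` are invented for the example.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning a marker string instead of raising."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<{type(exc).__name__}>"


text = "café"
# Bytes as an existing tool might have written them under a legacy
# default encoding (cp1252 is an assumption for this example).
legacy_bytes = text.encode("cp1252")
# Bytes as a script running in UTF-8 mode would write them.
utf8_bytes = text.encode("utf-8")

# Case 1: a legacy file read with UTF-8 as the default fails outright.
failure = try_decode(legacy_bytes, "utf-8")

# Case 2: UTF-8 output read by a downstream tool that still assumes
# cp1252 decodes without any error, but the text is wrong.
mojibake = try_decode(utf8_bytes, "cp1252")

# Matching encodings on both sides round-trip correctly.
round_trip = try_decode(legacy_bytes, "cp1252")
```

Case 2 is the harder one to notice, because nothing raises and the damage shows up only in whatever consumes the output.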
I understand that, which is why I proposed a per-install UTF-8 mode. For now, users can set the PYTHONUTF8=1 user environment variable, but that may break existing applications. My proposal is a per-environment UTF-8 mode: when users install a new Python to learn the language, they can enable UTF-8 mode for that new environment only, without breaking existing applications.
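To make the scoping question concrete, here is a small sketch that sets PYTHONUTF8 for a single child interpreter only, leaving the user's environment untouched, and reports whether UTF-8 mode is active there. It is not the per-environment mechanism being proposed; the function names are hypothetical, and only `sys.flags.utf8_mode`, `locale.getpreferredencoding`, and the `PYTHONUTF8` variable are existing Python features.

```python
import locale
import os
import subprocess
import sys


def describe_text_defaults() -> dict:
    """Report the current interpreter's text-encoding defaults."""
    return {
        "utf8_mode": bool(sys.flags.utf8_mode),
        "preferred_encoding": locale.getpreferredencoding(False),
    }


def utf8_mode_in_child(enabled: bool) -> str:
    """Run a child interpreter with PYTHONUTF8 set for it alone.

    Returns what the child prints for sys.flags.utf8_mode.
    """
    # Copy the environment so the parent process and the user's
    # persistent environment variables are not modified.
    env = dict(os.environ, PYTHONUTF8="1" if enabled else "0")
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


defaults = describe_text_defaults()
child_on = utf8_mode_in_child(True)
child_off = utf8_mode_in_child(False)
```

Setting the variable per process like this affects one run at a time; a user-level PYTHONUTF8=1 applies to every Python on the machine, which is the compatibility risk described above.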
* Put a checkbox in the installer?
Do I want Python to assume that everything is UTF-8? Probably not.
Even if you don't want that, many developers already assume the default is always UTF-8.
And if UTF-8 mode could be enabled via pyvenv.cfg, you could enable it
in just one venv to run such code.
--
Inada Naoki