Thanks for working so hard to move this forward!

The "real" solution is to change the defaults not to use the system encoding at all -- which, of course, we are moving towards with PEP 597. So first a plug to do that as fast as possible! I myself would love to see PEP 597 implemented tomorrow -- for all supported versions of Python.

However, the real trick here is that Python is a programming language/library/runtime -- not an application. So the folks starting up the interpreter are very often NOT the same as the folks writing the code.

And this is why this is such an issue -- folks write code on *nix systems, or maybe on Windows with UTF-8 as the system encoding, or only test with ASCII data, or ... -- then someone else runs the code on Windows, and it doesn't work. Even if the person running it technically wrote the code, they may have copied and pasted it, or who knows what. Think about it -- of all the Python code you run (libraries, etc.), how much of it did you write yourself?

(I myself have been highly negligent with my teaching materials in this regard -- so I have personally unleashed dozens of folks writing buggy code on the world.)

Anyway -- I'm afraid any combination of start-up flags, environment variables, etc. will not be enough -- is there a way to enable UTF-8 mode in the code, e.g. with a __future__ import?
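For reference, here is a minimal sketch of the existing per-process switches (the `-X utf8` option and the `PYTHONUTF8` environment variable, both available since Python 3.7). It starts fresh child interpreters with and without them and reports `sys.flags.utf8_mode`; the helper name `utf8_mode_of_child` is just for illustration. Note that in every case the setting is chosen by whoever launches the interpreter, not by the code being run.

```python
import os
import subprocess
import sys

# Tiny program run in a child interpreter: report whether UTF-8 mode is on.
CHECK = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_of_child(extra_args=(), env_overrides=None):
    """Start a fresh interpreter and return its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if env_overrides:
        env.update(env_overrides)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", CHECK],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("-X utf8:     ", utf8_mode_of_child(["-X", "utf8"]))
    print("PYTHONUTF8=1:", utf8_mode_of_child(env_overrides={"PYTHONUTF8": "1"}))
    print("-X utf8=0:   ", utf8_mode_of_child(["-X", "utf8=0"]))
```

The no-flag case is deliberately left out: whether UTF-8 mode is on by default depends on the locale and the Python version, which is exactly the problem.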

This may be impossible, as UTF-8 mode is an interpreter-global setting, and it could get very messy if a __future__ import in one library changed the behavior of all the other code -- but maybe there's some way to accomplish something similar?

from __future__ import utf8_mode

We could monkey-patch open() for that module, but would there be any way to make it work, on a per-module basis, for all other uses of TextIOWrapper?

Maybe one workaround would be for the __future__ import (or something) to set the mode, and then trigger warnings for all uses of TextIOWrapper that don't use UTF-8 -- that is, turn on PEP 597.

So you'd use one library that had the __future__ import, and it wouldn't break any other code, but it would turn on warnings.

Anyway, this is a very hard problem, but what I'm trying to get at is that we don't want the exact same code to run differently depending on what environment it's running in. Currently, it depends on the system encoding; we'd just be switching to depending on whether UTF-8 mode is turned on. That is better, I suppose (e.g. Jupyter could choose to turn UTF-8 mode on by default), but it would still have the same fundamental problem.

Imagine someone runs some code in Jupyter, and it's fine, and then they run it in plain Python, on the same machine, and it breaks -- ouch!

BTW: is there a way at runtime to check for UTF-8 mode? Then at least I could raise a warning in my code. Or maybe simply check whether locale.getpreferredencoding() returns UTF-8, and raise a warning if not. That wouldn't be hard to do, but it might be worth having a small utility that does it in a __future__ import:

from __future__ import warn_if_not_utf8
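As a plain-function stand-in for that hypothetical import, here is a sketch: `sys.flags.utf8_mode` (Python 3.7+) reports whether UTF-8 mode is on, and failing that, `locale.getpreferredencoding(False)` gives the default that text files opened without an explicit encoding will use. `warn_if_not_utf8` and `default_text_encoding_is_utf8` are illustrative names, not an existing API.

```python
import codecs
import locale
import sys
import warnings


def default_text_encoding_is_utf8():
    """Return True if text files opened without an encoding should default to UTF-8."""
    if sys.flags.utf8_mode:  # set by -X utf8 or PYTHONUTF8=1 (Python 3.7+)
        return True
    try:
        # codecs.lookup() normalizes aliases such as "UTF8" or "utf_8".
        name = codecs.lookup(locale.getpreferredencoding(False)).name
    except LookupError:
        return False
    return name == "utf-8"


def warn_if_not_utf8():
    """Warn if the default text encoding is not UTF-8; return whether it is."""
    ok = default_text_encoding_is_utf8()
    if not ok:
        warnings.warn(
            "default text encoding is "
            f"{locale.getpreferredencoding(False)!r}, not UTF-8; "
            "pass encoding='utf-8' explicitly or enable UTF-8 mode",
            stacklevel=2,
        )
    return ok
```

A library could call `warn_if_not_utf8()` once at import time; with `stacklevel=2` the warning points at the caller rather than at the helper.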

On Wed, Jan 27, 2021 at 11:35 PM Inada Naoki <songofacandy@gmail.com> wrote:
 
Is it possible to enable UTF-8 mode in a configuration file like `pyvenv.cfg`?

I can't see how that's any more powerful/flexible than an environment variable.

Is it possible to make it easier to configure?

* Put a checkbox in the installer?
* Provide a small tool to allow configuration after installation?
  * python3 -m utf8mode enable|disable?
    * Accessible only for CLI users
      * Add "Enable UTF-8 mode" and "Disable UTF-8 mode" to Start menu?

This is still going to have the same fundamental problem: the same code running differently on different machines, or even on the same machine in different environments or installs -- someone upgrades and forgets to check that box again, etc.

Maybe this would be a good thing to do once there are warnings in place?

-CHB


Any ideas are welcome.

--
Inada Naoki  <songofacandy@gmail.com>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/LQVK2UKPSOI2AHYFUWK6ZII2U6QKK6BP/
Code of Conduct: http://python.org/psf/codeofconduct/


--
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython