On Thu, Jan 28, 2021 at 4:25 PM Inada Naoki <songofacandy@gmail.com> wrote:
> The "real" solution is to change the defaults not to use the system encoding at all -- which, of course, we are moving towards with PEP 597. So first a plug to do that as fast as possible! I myself would love to see PEP 597 implemented tomorrow -- for all supported versions of Python.
>

Note that PEP 597 doesn't change the default encoding. It just adds an
option to emit a warning when the default encoding is used.

I know -- and THAT could be done soon, yes?
 
I think it might take about 10 years to change it.

I hope it's not that long -- having code that runs differently in different environments is not good ...

> However, the real trick here is that Python is a programming language/library/runtime -- not an application. So the folks starting up the interpreter are very often NOT the same as the folks writing the code.
>
> And this is why this is the issue it is -- folks write code on *nix systems, or maybe Windows with utf-8 as a system encoding, or only test with ASCII data, or ...  -- then someone else actually runs the code, on Windows, and it doesn't work. Even if the person is technically writing the code, they may have copy and pasted it or who knows what? Think about it -- of all the Python code you run (libraries, etc) -- how much of it did you write yourself?
>
> (I myself have been highly negligent with my teaching materials in this regard -- so have personally unleashed dozens of folks writing buggy code on the world.)

Much code is written by other people, and it causes
UnicodeDecodeError on Windows.
UTF-8 mode rescues it.

Exactly. But the trick is that UTF-8 mode is under the control of the end user / installer of Python, not the writer of the code.
 
UTF-8 mode is used to decode command-line arguments and environment
variables on Unix. So UTF-8 mode can be enabled only at startup for
now.
This restriction is caused by Unix so I think we can add something
like `sys._enable_utf8_mode()` only on Windows if it is really needed.
But it means codes using `sys._enable_utf8_mode()` are Windows-only.
It doesn't make sense.

well, that would be a no-op on other platforms.
 
Another way is adding a runtime option to change only the default text
encoding (e.g. `io.set_default_encoding("utf-8")`).
This option is worth considering. When we add it at the top of a script or
notebook, it uses UTF-8 to open files on all platforms.

On the other hand, it adds another "xxx encoding" terminology to
Python. Python has too many "xxx encoding"s and it confuses users.
So I am cautious about adding another encoding option.

I appreciate that -- but I do like handing control over to the code-writer, rather than the python-installer.
 
> Maybe one workaround would be for the __future__ import (or something) to set the mode, and then trigger warnings for all uses of TextIOWrapper that don't use utf-8 -- that is, turn on PEP 597.
>
> So you'd use one library that had the __future__ import, and it wouldn't break any other code,  but it would turn on Warnings.

Please don't discuss PEP 597 in this thread. Let's focus on UTF-8 mode.
They are different approaches and they are not mutually exclusive.

Sure, but they are related. But I'll try to find the right thread for PEP 597.
 
> Imagine someone runs some code in Jupyter, and it's fine, and then they run it in plain Python, on the same machine, and it breaks -- ouch!

You are right. UTF-8 mode must be accessible both for Jupyter on
conda Python and for Python installed by the official installer.
If UTF-8 mode is accessible enough, users can fix it by enabling UTF-8 mode.

Sure -- but these days folks may have multiple environments and multiple ways to run code (Jupyter, IDEs), so it's way too easy to have UTF-8 mode on in some but not others -- all on the same machine.
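
To see which situation a given environment is in, something like the following can be run in each one (plain Python, Jupyter, an IDE console). It is only a minimal sketch: `text_encoding_report` is a name made up for illustration, and it relies on nothing beyond `sys.flags.utf8_mode` and `locale.getpreferredencoding(False)`.

```python
import locale
import sys


def text_encoding_report():
    """Collect the settings that decide how open() decodes text by default."""
    return {
        # Nonzero when UTF-8 mode is enabled (e.g. -X utf8 or PYTHONUTF8=1).
        "utf8_mode": sys.flags.utf8_mode,
        # Encoding that open() falls back to when encoding= is omitted.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Encoding used for file names, shown for comparison.
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in text_encoding_report().items():
        print(f"{key}: {value}")
```

If two environments on the same machine print different values, code that calls `open()` without an explicit encoding can behave differently between them.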

I'm not a Windows user (much), but users of my library are, and my students are, and I'm having a hard time figuring out what will make this work for them.

In the case of my students, I can encourage UTF-8 mode for all installations.

In the case of my library users -- it's harder, but I can do the same to some extent -- I do currently suggest a conda environment for my code -- so yes, making it easier to turn it on in an environment would be good.

Hmm -- sorry for thinking as I write here, but if UTF-8 mode could be part of an environment spec -- that would be good. 

So is there a way to have a package installed that turned it on (obviously a no-op on other platforms)? You would specify a dependency on the utf8_mode package. At run time, if the utf8_mode package was installed, then UTF-8 mode would be turned on.

So that wouldn't quite put it in the hands of the coder -- but would put it in the hands of the application developer -- the person writing the requirements file.

So checking `locale.getpreferredencoding(False)` is better.
But note that `locale.getpreferredencoding(False)` may return "utf8",
"utf-8", "utf_8", "UTF-8"...

A good reason to provide a utility for this then -- I know I have no idea of all the ways it could be spelled.
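
One possible shape for such a utility, as a rough sketch rather than a proposal: let `codecs.lookup()` resolve the spelling instead of comparing strings by hand. The names `is_utf8` and `default_encoding_is_utf8` are invented here for illustration.

```python
import codecs
import locale


def is_utf8(encoding_name):
    """Return True if encoding_name is any spelling or alias of UTF-8."""
    try:
        # codecs.lookup() resolves aliases and returns a canonical codec name.
        return codecs.lookup(encoding_name).name == "utf-8"
    except LookupError:
        # Unknown encoding names are treated as "not UTF-8".
        return False


def default_encoding_is_utf8():
    """Check the encoding open() uses when no encoding= is given."""
    return is_utf8(locale.getpreferredencoding(False))
```

With this, "utf8", "utf-8", "utf_8" and "UTF-8" are all recognized, while "utf-8-sig" is deliberately treated as a different codec.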
 
> That wouldn't be hard to do, but it might be worth having a small utility that does it in a `__future__` import:
>
> from __future__ import warn_if_not_utf8

It seems you are misusing __future__ import. __future__ import is for
compilers and parsers. It is not for runtime behavior.

well yes -- but to the "layperson" -- it's a way to say: "make this code act like it will in the future" -- which is the case here.
 
And I don't think we should add `warn_if_not_utf8()` for now.

I've been thinking about this -- on the one hand, if I, as a library or application author, am thinking about this issue, then I can (and should) add the ``encoding="utf-8"`` flag everywhere I open a text file in my code. So why not just do that, rather than adding an extra import or function call, or whatever?

But in fact, I know I (and my dev team) have been lazy, and we have a lot of places where we should be setting the encoding and are not. And sure, I know how to use grep -- I can find all those places. But it would actually be a lot easier and more reliable to have a way to set up the future behavior.
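
For the "find all those places" step, a syntax-aware search is a bit more reliable than grep. The following is a minimal sketch using the standard `ast` module; `find_open_without_encoding` is a hypothetical helper name, and it only recognizes the plain builtin spelling `open(...)`.

```python
import ast


def find_open_without_encoding(source, filename="<string>"):
    """Return line numbers of open(...) calls that omit the encoding."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        # Only the plain builtin spelling open(...) is recognized here.
        if not (isinstance(node.func, ast.Name) and node.func.id == "open"):
            continue
        keywords = {kw.arg: kw.value for kw in node.keywords}
        # encoding is either a keyword or the fourth positional argument.
        if "encoding" in keywords or len(node.args) >= 4:
            continue
        # The mode is the second positional argument or the mode= keyword.
        mode = node.args[1] if len(node.args) > 1 else keywords.get("mode")
        if (isinstance(mode, ast.Constant) and isinstance(mode.value, str)
                and "b" in mode.value):
            continue  # binary mode takes no encoding
        hits.append(node.lineno)
    return sorted(hits)
```

It will miss `io.open`, `pathlib.Path.read_text()` and friends, and calls whose mode is computed at run time are reported even if they turn out to be binary, so treat the result as a list of places to review.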

But maybe a topic for another thread.

>> Is it possible to enable UTF-8 mode in a configuration file like `pyvenv.cfg`?
>
> I can't see how that's any more powerful/flexible than an environment variable.

It is powerful/flexible for power users. But not for beginners.
Imagine users execute Jupyter from the start menu.

* Command-line `-Xutf8` or `set PYTHONUTF8=1` is not accessible.
* A user environment variable is not accessible either, and it may affect
other Python installations.

which is actually what I like about environment variables -- it could apply to all Python installations on the system -- which would be a good thing!
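
To check what the two startup switches actually do on a given installation, one can start a fresh interpreter with each of them and ask it for `sys.flags.utf8_mode`. This is just a sketch; `child_utf8_mode` is a helper name invented for this example.

```python
import os
import subprocess
import sys


def child_utf8_mode(extra_env=None, extra_args=()):
    """Start a fresh interpreter and return its sys.flags.utf8_mode."""
    # Copy the current environment so the child still finds its libraries.
    env = dict(os.environ)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("PYTHONUTF8=1:", child_utf8_mode(extra_env={"PYTHONUTF8": "1"}))
    print("-X utf8:     ", child_utf8_mode(extra_args=("-X", "utf8")))
```

Both settings are read once at interpreter startup, which is why the check has to run in a child process rather than in the current one.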

Where would Python look for a "configuration file like `pyvenv.cfg`" ?

If we use user-wide (or system-wide) setting like `PYTHONUTF8` in user
environment variable, all Python environments use UTF-8 mode
consistently.
But it will break legacy applications running on old Python environment.

not ones old enough not to look for PYTHONUTF8 -- it would only change if the Python were upgraded.

and at least some legacy applications are using py2exe and the like, and those would still be safe.

> If we have a per-environment option, it's easy to recommend that users
> enable UTF-8 mode.

Back to my idea above -- any way to have that be a pip (and conda) installable package? So it could be in a requirements file?

Do you mean that a program that runs only in UTF-8 mode warns if UTF-8
mode is not enabled? e.g.

```
import sys

# Refuse to run on Windows unless UTF-8 mode is enabled.
if sys.platform == "win32" and not sys.flags.utf8_mode:
    sys.exit("This program runs only in UTF-8 mode. Please enable UTF-8 mode.")
```

Then, I don't like it... A Windows-only API to enable UTF-8 mode at
runtime seems better.

```
if sys.platform == "win32":
    sys._win32_enable_utf8mode()
```

I agree -- if that's possible, then it's a better option.

Though I would make it simply: 

``sys._enable_utf8mode()``

and have it be a no-op outside of Windows.
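
Until something like that exists, calling code could guard the call so that it degrades to a no-op everywhere. A sketch, with the loud caveat that `sys._enable_utf8mode()` is only an idea from this thread and not an existing CPython API, and that `enable_utf8_mode_if_possible` is likewise a made-up name:

```python
import sys


def enable_utf8_mode_if_possible():
    """Call the proposed runtime switch when it exists; otherwise do nothing.

    sys._enable_utf8mode() is an assumption taken from this discussion.
    It is not an existing CPython API.
    """
    if sys.flags.utf8_mode:
        return True  # already on, nothing to do
    switch = getattr(sys, "_enable_utf8mode", None)  # hypothetical API
    if switch is None:
        return False  # no such API in this interpreter: behave as a no-op
    switch()
    return True
```

The return value tells the caller whether UTF-8 mode is on afterwards, so a script can still fall back to passing `encoding="utf-8"` explicitly.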

-CHB

--
Christopher Barker, PhD (Chris)

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython