Hi Inada-san,

I followed the discussions on your different PEPs, and overall I like your latest PEP :-) I have some minor remarks.

On Mon, Feb 1, 2021 at 6:55 AM Inada Naoki <songofacandy@gmail.com> wrote:
The warning is disabled by default. New ``-X warn_encoding`` command-line option and ``PYTHONWARNENCODING`` environment variable are used to enable the warnings.
Maybe "warn implicit encoding" or "warn omit encoding" (not sure if it makes sense written like that in English ;-)) would be more explicit.
Options to enable the warning
------------------------------
``-X warn_encoding`` option and the ``PYTHONWARNENCODING`` environment variable are added. They are used to enable the ``EncodingWarning``.
``sys.flags.encoding_warning`` is also added. The flag represents ``EncodingWarning`` is enabled.
Nitpick: I would prefer using the same name for the -X option and the sys.flags attribute (e.g. sys.flags.warn_encoding).
``encoding="locale"`` option
----------------------------
``io.TextIOWrapper`` accepts ``encoding="locale"`` option. It means same to current ``encoding=None``. But ``io.TextIOWrapper`` doesn't emit ``EncodingWarning`` when ``encoding="locale"`` is specified.
Can you please define whether os.device_encoding(fd) is called if encoding="locale" is used? It seems so, but it's not obvious from the PEP.

In Python 3.10, I added the _locale._get_locale_encoding() function, which returns exactly the encoding used by open() when no encoding is specified (encoding=None) and os.device_encoding(fd) returns None. See _Py_GetLocaleEncoding() for the C implementation (Python/fileutils.c).

Maybe we should add a public locale.get_locale_encoding() function? On Unix, this function uses nl_langinfo(CODESET) *without* setting the LC_CTYPE locale to the user preferred locale.

I understand that encoding=locale.get_locale_encoding() would be different from encoding="locale": encoding=locale.get_locale_encoding() doesn't call os.device_encoding(), right?

Maybe the PEP should also explain (in a "How to teach this" section?) when encoding="locale" is better than a specific encoding, like encoding="utf-8" or encoding="cp1252". In my experience, it's mostly for interoperability with other applications which also use the current locale encoding.

By the way, I recently rewrote the documentation about the encodings used by Python:

* https://docs.python.org/dev/glossary.html#term-locale-encoding
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.filesystem_enc...
* https://docs.python.org/dev/c-api/init_config.html#c.PyConfig.stdio_encoding
* https://docs.python.org/dev/library/os.html#utf8-mode
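
To make the os.device_encoding() question above more concrete, here is a rough sketch of how I understand the current encoding=None lookup. The function name default_text_encoding() is made up for this example, and the code only approximates what the io module does; whether encoding="locale" would follow the same path is exactly what I would like the PEP to spell out.

```python
import locale
import os


def default_text_encoding(fd):
    """Approximate the encoding picked for encoding=None on file descriptor fd.

    Hypothetical helper for illustration only; not part of the PEP.
    """
    try:
        # Returns an encoding name only when fd is connected to a terminal,
        # otherwise None.
        encoding = os.device_encoding(fd)
    except OSError:
        encoding = None
    if encoding is None:
        # Fall back to the locale encoding, without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    return encoding
```

For a regular file, os.device_encoding() returns None, so the result is simply the locale encoding; the two only differ when the descriptor is a terminal.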
Add ``io.LOCALE_ENCODING = "locale"`` constant too. This constant can be used to avoid confusing ``LookupError: unknown encoding: locale`` error when the code is run in old Python accidentally.
I'm not sure that it is useful. I like a simple "locale" literal string. If there is a constant in io, people may start to think that it's specific and will add "import io" just to get the string "locale". I don't think that we should care too much about the error message raised by old Python versions.
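
If a project really needs to run the same code on old Python versions, a small shim might be enough. The sketch below is only an assumption about how such a shim could look: the names _supports_locale_encoding(), SUPPORTS_LOCALE and compat_encoding() are invented for this example, and it relies on old versions raising LookupError for the unknown "locale" name, as described in the quoted text.

```python
import io


def _supports_locale_encoding():
    """Probe whether io.TextIOWrapper accepts encoding="locale"."""
    try:
        io.TextIOWrapper(io.BytesIO(), encoding="locale")
    except LookupError:
        # Old Python: "locale" is treated as an unknown codec name.
        return False
    return True


SUPPORTS_LOCALE = _supports_locale_encoding()


def compat_encoding(encoding):
    """Map "locale" to None where "locale" is not understood.

    Hypothetical helper: on old versions, encoding=None already selects
    the locale encoding, so the behavior stays the same.
    """
    if encoding == "locale" and not SUPPORTS_LOCALE:
        return None
    return encoding
```

Usage would look like open(path, encoding=compat_encoding("locale")). Any other encoding name is passed through unchanged.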
Opt-in warning
---------------
Although ``DeprecationWarning`` is suppressed by default, emitting ``DeprecationWarning`` always when ``encoding`` option is omitted would be too noisy.
The PEP is not very clear. Does "-X warn_encoding" only emit the warning, or does it also display it by default? Does it add a warning filter for EncodingWarning?

The PEP has no "Backward compatibility" section. Is it possible to monkey-patch Python to implement this PEP (maybe only partially) on old Python versions? I'm asking in order to prepare existing projects for the future EncodingWarning.

The main question is whether it's possible to use encoding="locale" on Python 3.6-3.9 (maybe using some ugly hacks).

By the way, your PEP has no target Python version ;-) Do you want to get it in Python 3.10 or 3.11?

Victor
--
Night gathers, and now my watch begins. It shall not end until my death.