PEP 597 -- Soft deprecation of omitting encoding
The PEP is here: https://www.python.org/dev/peps/pep-0597/
I created a thread on discuss.python.org
https://discuss.python.org/t/pep-597-raise-a-warning-when-encoding-is-omitte...
but I want more feedback.
Please reply to this mail or on discuss.python.org.
## Summary
This PEP doesn't propose changing the default text encoding yet. Instead, it
introduces a warning, emitted only in dev mode, for code that relies on the
default text encoding.
This opt-in warning is useful for:
* Estimating the impact of changing the default text encoding.
* Finding bugs where code uses the default text encoding but should use UTF-8.
## PEP
Abstract
This PEP proposes:
- TextIOWrapper raises a PendingDeprecationWarning when the encoding option
is not specified and dev mode is enabled.
- Add an encoding="locale" option to TextIOWrapper. It behaves like
encoding=None but doesn't raise a warning.
Motivation
Omitting encoding is a common mistake
Developers using macOS or Linux may forget that the default encoding is not
always UTF-8.
For example, long_description = open("README.md").read() in setup.py is a
common mistake. Many Windows users cannot install the package if there is
at least one non-ASCII character (e.g. emoji) in the README.md file.
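A minimal sketch of the fix for the setup.py case: pass encoding="utf-8" explicitly instead of relying on the locale. The helper name read_long_description and the temporary README are illustrative assumptions, not part of the PEP.

```python
import os
import tempfile


def read_long_description(path):
    """Read a README as UTF-8 regardless of the locale encoding."""
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    readme = os.path.join(tmp, "README.md")
    # Write a README containing a non-ASCII character (an emoji).
    with open(readme, "w", encoding="utf-8") as f:
        f.write("# Demo \U0001F389\n")

    long_description = read_long_description(readme)

    # The same bytes cannot be decoded as ASCII, which is what happens
    # implicitly when the locale encoding is ASCII and encoding is omitted.
    with open(readme, "rb") as f:
        raw = f.read()
    try:
        raw.decode("ascii")
        ascii_ok = True
    except UnicodeDecodeError:
        ascii_ok = False
```

With the explicit encoding, the read succeeds on any platform, while the ASCII decode of the same bytes fails.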
For example, 489 of the 4000 most downloaded packages on PyPI used
non-ASCII characters in their README, and 82 of them cannot be installed
from a source package when the locale encoding is ASCII. [1] They used the
default encoding to read a README or TOML file.
Another example is logging.basicConfig(filename="log.txt"). Some users
expect UTF-8 to be used by default, but the locale encoding is actually
used. [2]
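As a hedged illustration of avoiding the logging pitfall, the sketch below attaches a logging.FileHandler with an explicit encoding="utf-8". The logger name, the helper make_utf8_file_logger, and the temporary log file are assumptions made for the example.

```python
import logging
import os
import tempfile


def make_utf8_file_logger(name, path):
    """Return a logger (and its handler) that always writes UTF-8."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger, handler


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "log.txt")
    logger, handler = make_utf8_file_logger("pep597_demo", log_path)
    logger.info("caf\u00e9 \u2615")

    # Close the handler before reading the file and removing the directory.
    logger.removeHandler(handler)
    handler.close()

    with open(log_path, encoding="utf-8") as f:
        logged = f.read()
```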
Even Python experts assume that the default encoding is UTF-8. This creates
bugs that happen only on Windows. See [3] and [4].
Raising a warning when the encoding option is omitted will help to find
such mistakes.
Prepare to change the default encoding to UTF-8
We chose to use locale encoding for the default text encoding in Python
3.0. But UTF-8 has been adopted very widely since then.
We might change the default text encoding to UTF-8 in the future. But this
change will affect many applications and libraries. If we start raising a
DeprecationWarning by default, a large number of warnings will be raised
and it will be too noisy.
While this PEP doesn't cover that change, it will help to reduce the
number of DeprecationWarnings in the future.
Specification
Raising a PendingDeprecationWarning
TextIOWrapper raises the PendingDeprecationWarning when the encoding option
is omitted, and dev mode is enabled.
encoding="locale" option
When encoding="locale" is passed to TextIOWrapper, it behaves the same
as encoding=None. In detail, the encoding is chosen by:
1. os.device_encoding(buffer.fileno())
2. locale.getpreferredencoding(False)
This option can be used to suppress the PendingDeprecationWarning.
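The lookup order listed above can be sketched in pure Python as follows. This is an approximation under stated assumptions, not the TextIOWrapper implementation: the function name resolve_locale_encoding is hypothetical, and the fallback when the buffer has no file descriptor is a choice made for the example.

```python
import io
import locale
import os


def resolve_locale_encoding(buffer):
    """Approximate how the encoding is chosen for encoding="locale"."""
    try:
        fd = buffer.fileno()
    except (AttributeError, OSError, ValueError):
        # In-memory buffers such as io.BytesIO have no file descriptor.
        fd = None

    # 1. Ask the OS for the encoding of the device (None unless it is a terminal).
    encoding = os.device_encoding(fd) if fd is not None else None

    # 2. Fall back to the locale's preferred encoding.
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
    return encoding


chosen = resolve_locale_encoding(io.BytesIO())
```

For an in-memory buffer, step 1 is skipped and the result is whatever locale.getpreferredencoding(False) reports on the current machine.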
io.text_encoding
TextIOWrapper is used indirectly in most cases. For example, open() and
pathlib.Path.read_text() use it. Emitting the warning from inside these
functions doesn't make sense; their callers should be warned instead.
io.text_encoding(encoding, stacklevel=1) is a helper function for this. A
pure Python implementation would look like this:
    import sys

    def text_encoding(encoding, stacklevel=1):
        """
        Helper function to choose the text encoding.

        When encoding is not None, just return it.
        Otherwise, return the default text encoding ("locale" for now),
        and raise a PendingDeprecationWarning in dev mode.

        This function can be used in APIs having encoding=None option.
        But please consider encoding="utf-8" for new APIs.
        """
        if encoding is None:
            if sys.flags.dev_mode:
                import warnings
                # "+ 2" skips this helper and the API that called it, so the
                # warning is attributed to the caller of that API.
                warnings.warn(
                    "'encoding' option is not specified. The default encoding "
                    "will be changed to 'utf-8' in the future",
                    PendingDeprecationWarning, stacklevel + 2)
            encoding = "locale"
        return encoding
pathlib.Path.read_text() can use this function like this:

    def read_text(self, encoding=None, errors=None):
        """
        Open the file in text mode, read it, and close the file.
        """
        encoding = io.text_encoding(encoding)
        with self.open(mode='r', encoding=encoding, errors=errors) as f:
            return f.read()
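To see the helper pattern in action without enabling dev mode, here is a self-contained sketch. It reimplements the proposed helper locally with a hypothetical dev_mode keyword standing in for sys.flags.dev_mode, and read_config is an invented API used only for illustration.

```python
import warnings


def text_encoding(encoding, stacklevel=1, *, dev_mode=True):
    """Local sketch of the proposed helper (dev_mode replaces sys.flags.dev_mode)."""
    if encoding is None:
        if dev_mode:
            warnings.warn(
                "'encoding' option is not specified. The default encoding "
                "will be changed to 'utf-8' in the future",
                PendingDeprecationWarning, stacklevel + 2)
        encoding = "locale"
    return encoding


def read_config(path, encoding=None):
    """Hypothetical API with an encoding=None option."""
    encoding = text_encoding(encoding)
    # A real implementation would open the file here; the sketch only
    # reports which encoding would be used.
    return path, encoding


def call_and_record(func, *args, **kwargs):
    """Call func and return its result plus any warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func(*args, **kwargs)
    return result, caught


implicit_result, implicit_warnings = call_and_record(read_config, "app.cfg")
explicit_result, explicit_warnings = call_and_record(
    read_config, "app.cfg", encoding="utf-8")
```

Omitting the encoding yields "locale" and one PendingDeprecationWarning; passing encoding="utf-8" (or encoding="locale") yields no warning.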
subprocess module doesn't warn
While the subprocess module uses TextIOWrapper, it doesn't raise
PendingDeprecationWarning. It uses the "locale" encoding by default.
Rationale
"locale" is not a codec alias
We don't add "locale" as a codec alias because the locale can be changed
at runtime.
Additionally, TextIOWrapper checks os.device_encoding() when encoding=None.
This behavior cannot be implemented in a codec.
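One consequence of this choice can be checked directly: because "locale" is not registered as a codec name, it cannot be used with the codecs machinery such as codecs.lookup() or bytes.decode(). The helper name is_registered_codec below is an assumption for the example.

```python
import codecs


def is_registered_codec(name):
    """Return True if the codec registry knows this encoding name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


utf8_known = is_registered_codec("utf-8")
locale_known = is_registered_codec("locale")
```

Here utf8_known is True and locale_known is False, so "locale" only has meaning where an API handles it specially.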
Use a PendingDeprecationWarning
This PEP doesn't make a decision about changing the default text encoding,
so we use PendingDeprecationWarning instead of DeprecationWarning for now.
Raise warning only in dev mode
This PEP will produce a huge number of PendingDeprecationWarnings, which
would be too noisy for most Python developers.
We need to fix the warnings in the standard library, pip, and major dev
tools like pytest before raising this warning by default.
subprocess module doesn't warn
The default encoding for PIPE is related to the encoding of stdio. It
should be discussed later.
Reference Implementation
https://github.com/python/cpython/pull/19481
References
[1] "Packages can't be installed when encoding is not UTF-8" (https://github.com/methane/pep597-pypi-ascii)
[2] "Logging - Inconsistent behaviour when handling unicode" (https://bugs.python.org/issue37111)
[3] Packaging tutorial in packaging.python.org didn't specify encoding to read a README.md (https://github.com/pypa/packaging.python.org/pull/682)
[4] json.tool had used locale encoding to read JSON files (https://bugs.python.org/issue33684)
Copyright
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/master/pep-0597.rst
--
Inada Naoki