[Python-ideas] Re: __init__ in module names

9 Dec 2020

      08.12.20 21:47, Gregory Szorc wrote:
...
PyOxidizer's pure Rust implementation of a meta path importer
(https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_oxidized_finde...)
has been surprisingly effective at finding corner cases and behavior
quirks in Python's importing mechanisms.

It was recently brought to my attention via
https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in
module names is something that exists in Python code in the wild. (See
https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code
for some examples.)

In that GitHub issue and https://bugs.python.org/issue42564, I
discovered that what's happening is the stdlib PathFinder meta path
importer is "dumb" and doesn't treat "__init__" in module names
specially. If someone uses syntax like "import foo.__init__" or "from
.__init__ import foo", PathFinder operates on "__init__" like any other
string value and proceeds to probe the filesystem for the relevant {.py,
.pyc, .so, etc} files. The "__init__" files do exist in probed locations
and PathFinder summarily constructs a new module object, albeit with
"__init__" in its name. The end result is you have 2 module objects and
sys.modules entries referring to the same file, keyed to different names
(e.g. "foo" and "foo.__init__").

There is a strong argument to be made that "__init__" in module names
should be treated specially. It seems wrong to me that you are allowed
to address the same module/file through different names (let's pretend
filesystem path normalization doesn't exist) and that the filesystem
encoding of Python module files/names is addressable through the
importer names. This feels like a bug that inadvertently shipped.

However, code in the wild is clearly relying on "__init__" in module
names being allowed. And changing the behavior is backwards incompatible
and could break this code.

Anyway, I was encouraged by Brett Cannon to email this list to assess
the appetite for introducing a backwards incompatible change to this
behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any
module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.

(A less aggressive proposal would be to normalize "__init__" in module
names to something more reasonable - maybe stripping trailing
".__init__" from module names. But I'll start by proposing the stricter
solution.)

What do others think we should do?
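The duplicated-module behavior described in the quoted message can be reproduced with a throwaway package. This is only a minimal sketch: the package name `initdemo_pkg`, the `state` attribute and the helper `import_package_twice` are made up for illustration, and it assumes the default path-based finders are in use.

```python
import importlib
import os
import sys
import tempfile


def import_package_twice(name="initdemo_pkg"):
    """Import a throwaway package as ``name`` and as ``name.__init__``.

    The package name and the ``state`` attribute are hypothetical; they
    only exist to make the two imports observable.
    """
    init_name = name + ".__init__"
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = os.path.join(tmp, name)
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write("state = []\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(name)
            init = importlib.import_module(init_name)
            return {
                "names": (pkg.__name__, init.__name__),
                # Two module objects are created for one source file.
                "distinct_objects": pkg is not init,
                "same_file": os.path.samefile(pkg.__file__, init.__file__),
                # The file was executed twice, so module state is not shared.
                "separate_state": pkg.state is not init.state,
                "both_in_sys_modules": (
                    name in sys.modules and init_name in sys.modules
                ),
            }
        finally:
            # Undo the side effects of the demonstration.
            sys.path.remove(tmp)
            sys.modules.pop(name, None)
            sys.modules.pop(init_name, None)


if __name__ == "__main__":
    for key, value in import_package_twice().items():
        print(key, value)
```

Because `__init__.py` is executed once per name, any module-level state (registries, caches, singletons) ends up duplicated, which is where the surprising bugs tend to come from.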

Thank you for the good explanation of the problem.

Initially I thought that this problem was not worth our attention. It just
does not happen in normal code. If a newbie writes code like that and gets a
bug because of it, they will learn from the mistake and will not write it
next time. Warning about such code should be a task for linters.

But beginners and non-professionals do not use linters. And given the
confusion your message caused among commenters in this thread, I changed my
mind and am inclined to agree with you. Yes, it may be worth adding a
runtime check to the import machinery.
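As a rough illustration of what such a runtime check could look like, here is a sketch of a meta path finder that only warns and then lets the normal finders do the work. It is not how the import machinery itself would implement the check; the names `has_init_component`, `InitNameWarner` and `install` are hypothetical, and the test expression is the one from the proposal.

```python
import sys
import warnings


def has_init_component(fullname):
    """The check from the proposal: "__init__" as any dotted component."""
    return "__init__" in fullname.split(".")


class InitNameWarner:
    """Hypothetical meta path finder that warns and never claims an import."""

    def find_spec(self, fullname, path=None, target=None):
        if has_init_component(fullname):
            warnings.warn(
                f"'__init__' used as a module name component: {fullname!r}",
                DeprecationWarning,
                stacklevel=2,
            )
        # Returning None defers to the remaining finders on sys.meta_path.
        return None


def install():
    """Put the warner first on sys.meta_path (once)."""
    if not any(isinstance(f, InitNameWarner) for f in sys.meta_path):
        sys.meta_path.insert(0, InitNameWarner())
```

Since finders are consulted only when a name is missing from `sys.modules`, a sketch like this warns on the first import of a given name and stays silent for later cached imports of it.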

There are similar precedents of warnings about obviously wrong code:

* `a is 0` currently works on CPython, and has always worked, but this code
is clearly semantically incorrect. Now you will get a SyntaxWarning.

* `if a.__lt__(b):` may work most of the time, but it can work incorrectly
when the types are not comparable and the result is NotImplemented. Now you
will get a DeprecationWarning.
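Both precedents can be observed from Python itself. The sketch below assumes Python 3.8 or later for the `is` warning; the helper names are made up for illustration. How `NotImplemented` behaves in a boolean context depends on the interpreter version, so the second helper reports what it sees instead of assuming one outcome.

```python
import warnings


def compile_warning_categories(source):
    """Compile an expression and return the warning categories it triggers."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<demo>", "eval")
    return [w.category for w in caught]


def truthiness_outcome(value):
    """Report what happens when ``value`` is used in a boolean context.

    Returns "warning" for a DeprecationWarning, "error" for a TypeError
    (some newer interpreters refuse outright), and "silent" otherwise.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            bool(value)
        except TypeError:
            return "error"
    if any(issubclass(w.category, DeprecationWarning) for w in caught):
        return "warning"
    return "silent"


if __name__ == "__main__":
    # `is` with a literal is flagged when the code is compiled.
    print(compile_warning_categories("a is 0"))
    # Comparing non-comparable types through the dunder method directly
    # yields NotImplemented instead of raising.
    result = (1).__lt__("x")
    print(result is NotImplemented)
    print(truthiness_outcome(result))
```

Writing `a < b` instead of `a.__lt__(b)` avoids the problem, because the operator falls back to the reflected method and raises a TypeError when neither side supports the comparison.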

Serhiy Storchaka
