08.12.20 21:47, Gregory Szorc wrote:
PyOxidizer's pure Rust implementation of a meta path importer (https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_oxidized_finde...) has been surprisingly effective at finding corner cases and behavior quirks in Python's importing mechanisms.
It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild. (See https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code for some examples.)
In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially. If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").
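The duplication described above can be reproduced with a throwaway package. This is only a minimal sketch: the package name `init_dup_demo` and the helper `import_package_twice` are made up for illustration, and it assumes a CPython whose default path-based importer still accepts "__init__" as an ordinary module name.

```python
import importlib
import os
import sys
import tempfile


def import_package_twice(pkg_name="init_dup_demo"):
    """Import a throwaway package as both `pkg` and `pkg.__init__`.

    `pkg_name` is a hypothetical name used only for this demonstration.
    """
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, pkg_name)
        os.mkdir(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            # A mutable object makes it easy to see that the file runs twice.
            f.write("state = []\n")

        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module(pkg_name)
            init = importlib.import_module(pkg_name + ".__init__")
        finally:
            sys.path.remove(root)
    return pkg, init


pkg, init = import_package_twice()
print(pkg.__name__, init.__name__)   # two different names in sys.modules
print(pkg is init)                   # False: two distinct module objects
print(pkg.state is init.state)       # False: __init__.py was executed twice
print(pkg.__file__, init.__file__)   # both point at the same __init__.py
```

Because the file is executed once per name, any module-level state (registries, singletons, class objects) exists twice, which is where the surprising bugs tend to come from.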
There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names (let's pretend filesystem path normalization doesn't exist) and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.
However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.
Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:
1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.
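As a rough illustration of step 1, the check could be prototyped outside the interpreter as a meta path finder that only warns and then defers to the regular finders. This is a sketch under assumptions, not the proposed patch: the names `has_init_component` and `InitNameWarner` are hypothetical, and a real change would presumably live inside the import machinery itself.

```python
import importlib.abc
import warnings


def has_init_component(fullname):
    """Return True if any dotted component of the name is exactly "__init__"."""
    return "__init__" in fullname.split(".")


class InitNameWarner(importlib.abc.MetaPathFinder):
    """Hypothetical finder that warns about "__init__" and finds nothing itself."""

    def find_spec(self, fullname, path=None, target=None):
        if has_init_component(fullname):
            warnings.warn(
                f'"__init__" used as a module name component in {fullname!r}',
                DeprecationWarning,
                stacklevel=2,
            )
        # Returning None lets the remaining sys.meta_path entries do the work.
        return None


# Exercise the finder directly, without installing it globally.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    InitNameWarner().find_spec("foo.__init__")
    InitNameWarner().find_spec("foo.bar")

print([str(w.message) for w in caught])
```

To try it against real imports, one would insert an instance at the front of `sys.meta_path`. Note that the exact-component test does not flag names such as `foo.__init__helpers`, which only contain "__init__" as a substring.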
(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)
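The less aggressive variant might look like the following. The function name `strip_trailing_init` is hypothetical, and leaving a bare top-level "__init__" untouched is an assumption of this sketch; the message above does not say how that case should be handled.

```python
def strip_trailing_init(fullname):
    """Drop trailing ".__init__" components from a dotted module name."""
    parts = fullname.split(".")
    # Keep at least one component so a bare "__init__" is returned unchanged.
    while len(parts) > 1 and parts[-1] == "__init__":
        parts.pop()
    return ".".join(parts)


for name in ("foo.__init__", "foo.bar", "foo.__init__.__init__", "__init__"):
    print(name, "->", strip_trailing_init(name))
```

With this normalization, "foo" and "foo.__init__" would resolve to a single sys.modules key, while "__init__" in the middle of a name (for example "foo.__init__.bar") would still need a separate decision.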
What do others think we should do?
Thank you for the good explanation of the problem.
Initially I thought that this problem was not worth our attention. It just does not happen in normal code. If a newbie writes code like that and gets a bug because of it, they will learn from the mistake and will not write it that way next time. Warning about such code should be a task for linters.
But beginners and non-professionals do not use linters. And given the confusion your message caused among commenters in this thread, I have changed my mind and am inclined to agree with you. Yes, it may be worth adding a runtime check to the import machinery.
There are similar precedents for warnings about obviously wrong code:
* `a is 0` currently works on CPython, and always has, but this code is clearly semantically incorrect. Now you will get a SyntaxWarning.
* `if a.__lt__(b):` may work most of the time, but it can work incorrectly when the types are not comparable and the result is NotImplemented. Now you will get a DeprecationWarning.
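Both precedents can be observed from a script. This is a small sketch assuming Python 3.8 or later for the `is` warning; the exact warning text varies between versions, so it is printed rather than hard-coded here.

```python
import warnings

# Compiling an `is` comparison against a literal triggers a SyntaxWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile("a = 0\nresult = a is 0\n", "<precedent-demo>", "exec")

syntax_warnings = [w for w in caught if issubclass(w.category, SyntaxWarning)]
for w in syntax_warnings:
    print(w.category.__name__, w.message)

# Calling __lt__ directly on non-comparable types does not raise;
# it returns the NotImplemented singleton, which `if` would then truth-test.
comparison = (1).__lt__("a")
print(comparison is NotImplemented)  # True
```

The second case is the subtle one: the `<` operator would raise a TypeError for these operands, but the direct method call hands NotImplemented to the `if` statement, and truth-testing that value is what the interpreter now complains about.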