On Tue, Dec 08, 2020 at 11:47:22AM -0800, Gregory Szorc wrote:
> It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild.
Can we be clear whether you are talking about "__init__" **in** module names (a substring, like "my__init__module.py") or "__init__" **as** a module name (not a substring, "__init__.py" exactly)? My guess is that you are only talking about the second case, can you confirm please?
> In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially.
I would hope and expect that it doesn't. If somebody explicitly asks to do something, Python should do what they ask, and not something different. Analogy: if I explicitly call `someobject.__init__(*args)` then I would expect Python to call that method, and not to translate that into a call to `type(someobject).__new__(*args)` because "__init__ is special". The interpreter should do as it's told and not try to guess what I meant.
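The analogy can be checked with a minimal sketch. The `Widget` class and its counters are hypothetical names used only for illustration; the point is that an explicit `__init__` call runs the initialiser again on the existing object and never goes through `__new__`.

```python
class Widget:
    """Hypothetical class that counts calls to __new__ and __init__."""

    new_calls = 0
    init_calls = 0

    def __new__(cls, *args):
        cls.new_calls += 1
        return super().__new__(cls)

    def __init__(self, *args):
        type(self).init_calls += 1
        self.args = args


w = Widget(1)      # normal construction: __new__ once, then __init__ once
w.__init__(2, 3)   # explicit call: only __init__ runs, on the same object

print(Widget.new_calls, Widget.init_calls, w.args)
```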
> If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").
Right. But given that the caller has *explicitly* asked for "foo.__init__" to be imported, presumably that is exactly the behaviour they want. Are there cases where people inadvertently import "foo.__init__" and are then surprised to get a different module from "foo" alone? Personally, I think this is a case for education. If you are explicitly touching *any* dunder name, it is up to you to know what you are doing.
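For anyone who wants to see the behaviour under discussion, here is a rough sketch that builds a throwaway package in a temporary directory and imports it under both names. The package name `demo_pkg`, the `registry` attribute and the helper `import_init_twice` are assumptions made up for this example; only standard-library calls are used.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_init_twice():
    """Import a throwaway package as "demo_pkg" and as "demo_pkg.__init__"."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("registry = []\n")

        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            pkg = importlib.import_module("demo_pkg")
            init = importlib.import_module("demo_pkg.__init__")
            return {
                "names": sorted(n for n in sys.modules if n.startswith("demo_pkg")),
                "same_object": pkg is init,
                "same_file": pkg.__file__ == init.__file__,
                "shared_state": pkg.registry is init.registry,
            }
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("demo_pkg")]:
                del sys.modules[name]


result = import_init_twice()
print(result)
```

The file is executed once per name, so module-level state such as `registry` is not shared between the two module objects.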
> There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names
Can you make that strong argument please? "It seems wrong to me" is a very weak argument.
> (let's pretend filesystem path normalization doesn't exist)
Let's not pretend, because it does exist. There is also the "module importing itself" issue, and hard links, and I'm sure that there are other clever ways to get two module objects out of a single module file. Deep copying doesn't work, but modules are very simple objects and you can copy them by hand:

    import spam
    eggs = type(spam)("eggs")
    vars(eggs).update(vars(spam))  # copy the namespace by hand
    eggs.__name__ = "eggs"         # update() overwrote the name with "spam"
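Another route to two module objects from one file, sketched here under stated assumptions, is to load the same source file under two different names with `importlib.util`. The file name `shared.py`, the module names `first` and `second`, and the helper `load_as` are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_as(name, path):
    """Execute the source file at `path` as a fresh module called `name`."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "shared.py"  # hypothetical module file
    source.write_text("items = []\n")

    first = load_as("first", source)
    second = load_as("second", source)

    same_object = first is second
    same_file = first.__file__ == second.__file__
    shared_state = first.items is second.items

print(same_object, same_file, shared_state)
```

Neither module is registered in `sys.modules` here; the sketch only shows that nothing ties a source file to a single module object.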
> and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.
Not to me. The current behaviour is exactly what I would expect.
> However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.
Right, so "it feels wrong" is not a sufficient reason to make that breaking change. I think that you would need to demonstrate that:

(1) people are inadvertently importing "__init__", not realising the consequences;
(2) leading to bugs in their code;
(3) that this happens *more often* than people intentionally and knowingly importing "__init__";
(4) and that there is a work-around for those intentionally importing "__init__".

-- 
Steve