Mailman 3 __init__ in module names - Python-ideas

__init__ in module names

Gregory Szorc

Dec. 8, 2020

8:47 p.m.

PyOxidizer's pure Rust implementation of a meta path importer (https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_oxidized_finde...) has been surprisingly effective at finding corner cases and behavior quirks in Python's importing mechanisms.

It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild. (See https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code for some examples.)

In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially. If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").

There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names (let's pretend filesystem path normalization doesn't exist) and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.

However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.

Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.

(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)

What do others think we should do?

Gregory
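The duplicate-module behaviour described in this message can be reproduced with a throwaway package. What follows is only a sketch: the package name `demo_pkg_42564` and the temporary directory are invented for illustration, and it assumes an interpreter whose PathFinder still behaves as described above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical package: demo_pkg_42564/__init__.py defining one class.
    pkg_dir = Path(tmp) / "demo_pkg_42564"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("class A:\n    pass\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        # Import the package normally, then again through its "__init__" name.
        package = importlib.import_module("demo_pkg_42564")
        init_mod = importlib.import_module("demo_pkg_42564.__init__")
    finally:
        sys.path.remove(tmp)

# Both module objects were loaded from the same file ...
same_file = package.__file__ == init_mod.__file__
# ... yet they are separate objects with separate copies of the class.
distinct_modules = package is not init_mod
distinct_classes = package.A is not init_mod.A

print(same_file, distinct_modules, distinct_classes)
print(sorted(name for name in sys.modules if name.startswith("demo_pkg_42564")))
```

Because `__init__.py` is executed once per module object, any module-level state (registries, singletons, classes) exists twice after these two imports.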


Filipe Laíns

December 2020

9:06 p.m.

On Tue, 2020-12-08 at 11:47 -0800, Gregory Szorc wrote:


I support this change. It would also be fairly trivial to make the hard error descriptive and user-friendly, as Python 3 does for Python 2 style print statements, so it's not as if code would just stop working in an obscure way.

Cheers,
Filipe Laíns

Steven D'Aprano

11:39 p.m.

On Tue, Dec 08, 2020 at 08:06:09PM +0000, Filipe Laíns wrote:

...

I support this change.

Can you explain why you support this breaking change? I am especially interested in cases where people accidentally, or inadvertently, imported "package.__init__" and then used it without realising that it is distinct from `import package` alone. -- Steve

M.-A. Lemburg

9:14 p.m.

On 08.12.2020 20:47, Gregory Szorc wrote:

...

Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`). 2. Some future release (I'm unsure which) turns it into a hard error.

-1 on this proposal. We don't want to needlessly break code just because they use a feature of the existing implementation, which has been around for decades. Moreover, if you use namespace packages, a module __init__.py does not have to exist in the directory, so importing pkg.__init__ is a way to test for such a case.

...

(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)

-0 on this, since it may break code. If done, Python should issue a warning to flag the issue.

Third solution: leave things as they are and document it.

+1 on this one, since it's been like this for ages (going way back to the Python 1.x days).

Cheers,
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 08 2020)

...

...
...
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/

::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 https://www.egenix.com/company/contact/ https://www.malemburg.com/

Serhiy Storchaka

2:18 p.m.

08.12.20 22:14, M.-A. Lemburg wrote:

...

This feature has some bad side effects. Also, it does not look like this feature was added intentionally, otherwise we would handle these side effects.

...

Why do you need to test for such a case? And are there other ways to do it, without such side effects?

I have never seen code like

    try:
        import mypackage.__init__
    except ImportError:
        pass  # do something
    else:
        pass  # do something else

(But on the other hand, I had not seen imports from __init__ either before the OP opened the issue.)
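One possible answer to the question above is to inspect the module spec instead of importing `pkg.__init__`. The sketch below is an illustration, not something proposed in the thread: the helper name `is_namespace_package` and the two demo package names are hypothetical, and it assumes Python 3.7 or later, where the spec of a namespace package has no origin.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path


def is_namespace_package(name: str) -> bool:
    """Hypothetical helper: True if `name` resolves to a namespace package."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    # Every package has search locations; a namespace package additionally
    # has no origin, because there is no __init__.py for it to point at.
    return spec.submodule_search_locations is not None and spec.origin is None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "regular_demo_pkg").mkdir()
    (root / "regular_demo_pkg" / "__init__.py").write_text("")
    (root / "namespace_demo_pkg").mkdir()  # deliberately no __init__.py

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        regular_is_namespace = is_namespace_package("regular_demo_pkg")
        namespace_is_namespace = is_namespace_package("namespace_demo_pkg")
    finally:
        sys.path.remove(tmp)

print(regular_is_namespace, namespace_is_namespace)
```

For a top-level name, `importlib.util.find_spec` does not execute the package, so no second module object is created. For a dotted name it imports the parent packages first.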

...

Actually, making package.__init__ an alias of package would mitigate the issue. But I am not sure we need such feature.

...

That was my first reaction: just say "Don't do this". But many people do not read documentation and do not use linters, so it makes sense to add a warning that forces them to read the explanation in the documentation (or at least ask a question on a forum).

M.-A. Lemburg

2:39 p.m.

On 10.12.2020 14:18, Serhiy Storchaka wrote:

...

08.12.20 22:14, M.-A. Lemburg wrote:

...
On 08.12.2020 20:47, Gregory Szorc wrote:

...
Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`). 2. Some future release (I'm unsure which) turns it into a hard error.

-1 on this proposal. We don't want to needlessly break code just because they use a feature of the existing implementation, which has been around for decades.

This feature has some bad side effects. Also, it does not look like this feature was added intentionally, otherwise we would handle these side effects.

I know that it's not intentional, but people are obviously using it and this code would break.

...

...
Moreover, if you use namespace packages, a module __init__.py does not have to exist in the directory, so importing pkg.__init__ is a way to test for such a case.

Why do you need to test such a case? And are there other ways, without such side effects?

I have never seen code like

    try:
        import mypackage.__init__
    except ImportError:
        pass  # do something
    else:
        pass  # do something else

(But on the other hand, I had not seen imports from __init__ either before the OP opened the issue.)

:-) Just wanted to point out that the situation is a bit different now that we have namespace packages, compared to the days when packages were added to Python. Note that the issue of importing the same module more than once also shows up when you have your PYTHONPATH incorrectly set up, e.g. pointing inside a package as well as to the top-level.

...

...
...
(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)

-0 on this, since it may break code. If done, Python should issue a warning to flag the issue.

Actually, making package.__init__ an alias of package would mitigate the issue. But I am not sure we need such feature.

I think we could do something to mitigate the negative effects (running the __init__.py code twice) by having the second module object use the same dict as the package module, but there's an issue: the "__init__.py" will actually show up in the package name and code could be using this.

Is this worth the trouble? I doubt it. It's probably better to add a warning to make users aware of the possible issue and have them fix it.

...

...
Third solution: leave things as they are and document it.

+1 on this one, since it's been like this for ages (going way back to the Python 1.x days).

That was my first reaction: just say "Don't do this". But many people do not read documentation and do not use linters, so it makes sense to add a warning that forces them to read the explanation in the documentation (or at least ask a question on a forum).

Agreed.

-- Marc-Andre Lemburg

Steven D'Aprano

11:30 p.m.

On Tue, Dec 08, 2020 at 11:47:22AM -0800, Gregory Szorc wrote:

...

It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild.

Can we be clear whether you are talking about "__init__" **in** module names (a substring, like "my__init__module.py") or "__init__" **as** a module name (not a substring, "__init__.py" exactly)? My guess is that you are only talking about the second case, can you confirm please?

...

In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially.

I would hope and expect that it doesn't. If somebody explicitly asks to do something, Python should do what they ask, and not something different.

Analogy: if I explicitly call `someobject.__init__(*args)` then I would expect Python to call that method, and not to translate that into a call to `type(someobject).__new__(*args)` because "__init__ is special". The interpreter should do as it's told and not try to guess what I meant.

...

If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").

Right. But given that the caller has *explicitly* asked for "foo.__init__" to be imported, presumably that is exactly the behaviour they want.

Are there cases where people inadvertently import "foo.__init__" and are then surprised to get a different module from "foo" alone?

Personally, I think this is a case for education. If you are explicitly touching *any* dunder name, it is up to you to know what you are doing.

...

There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names

Can you make that strong argument please? "It seems wrong to me" is a very weak argument.

...

(let's pretend filesystem path normalization doesn't exist)

Let's not pretend, because it does exist.

There is also the "module importing itself" issue, and hard links, and I'm sure that there are other clever ways to get two module objects out of a single module file. Deep copying doesn't work, but modules are very simple objects and you can copy them by hand:

    import spam
    # create an empty module named "eggs", then copy spam's namespace into it
    eggs = type(spam)("eggs")
    vars(eggs).update(vars(spam))

...

and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.

Not to me. The current behaviour is exactly what I would expect.

...

However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.

Right, so "it feels wrong" is not a sufficient reason to make that breaking change.

I think that you would need to demonstrate that:

(1) people are inadvertently importing "__init__", not realising the consequences;

(2) leading to bugs in their code;

(3) that this happens *more often* than people intentionally and knowingly importing "__init__";

(4) and that there is a work-around for those intentionally importing "__init__".

-- Steve

Gregory Szorc

12:07 a.m.

On Tue, Dec 8, 2020 at 2:44 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

We're talking about "__init__" being the exact name of a module component. `"__init__" in fullname.split(".")`, as I wrote in my initial email.

"__init__" as a substring - as weird as that may be - should be allowed. This is because it is only the exact "__init__" filename that is treated specially by the filename resolver.

...

Then this is an argument against silent normalization of the module name. I buy that argument.


I can't speak for the people practicing this pattern because I'm not one of them. However, I'm willing to bet a lot of them are either cargo culting the practice or thinking "oh, this __init__.py file exists, '__init__' must be the module name." The importer/code works and they run with it.

I'm also willing to wager that people engaged in this practice (who apparently don't fully understand how the importer works otherwise they wouldn't be using "__init__" in module names) don't realize that this practice results in multiple module objects. I'm willing to wager that a subset of these people have seen weird bugs or undesired behavior due to the existence of multiple module objects (e.g. 2 instances of a supposed module singleton).

I wish I could find stronger evidence here, but I don't have anything concrete, just a GitHub search showing code in the wild, likely authored by people who aren't Python experts.

The strongest argument I can make is that it is highly unlikely that someone intended to use "__init__" in the module name to incur the creation of a variant of a package module and rather instead thought it was how to import the package module itself. Since the duplicate module object can lead to subtle bugs where symbols don't refer to the exact same PyObject, this is a footgun and the responsible thing to do is to eliminate that footgun.

If people do need to create duplicate module objects backed by the same file, those power users have the various APIs in importlib to do this.
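As an illustration of the importlib route mentioned in the last paragraph, the sketch below deliberately builds two module objects from one source file. The helper `load_private_copy`, the file name `singleton_demo.py`, and its `registry` variable are hypothetical names chosen for this example only.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_private_copy(name: str, path: str):
    """Hypothetical helper: execute the file at `path` as a new module `name`.

    The resulting module is deliberately not registered in sys.modules.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "singleton_demo.py"
    source.write_text("registry = []\n")
    first = load_private_copy("singleton_demo", str(source))
    second = load_private_copy("singleton_demo_copy", str(source))

# Each copy owns its module-level state, so the "singleton" exists twice.
first.registry.append("only in first")
print(first.registry, second.registry)
```

Here the duplication is explicit in the code, which is the distinction the message draws against getting it as a side effect of writing `foo.__init__`.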

Steven D'Aprano

1:17 a.m.

On Tue, Dec 08, 2020 at 03:07:48PM -0800, Gregory Szorc wrote:

...

We're talking about "__init__" being the exact name of a module component. `"__init__" in fullname.split(".")`, as I wrote in my initial email.

I'm sorry, you've just added more confusion to me, rather than less :-(

What's a module component? Do you mean a directory name? If I have a package:

    mypackage/subpackage/__init__/stuff/things/spam.py

never mind that this is an unusual naming convention, but according to your test given above, that means that importing

    mypackage.subpackage.__init__.stuff.things.spam

should be treated according to your proposal.

I *thought* you meant only modules called literally "__init__.py" (or equivalent .pyc etc), and only if they are part of a package, but your test above suggests that this is not the case.

For example, if I have a non-package module called "__init__.py" with no package structure, what happens? Right now I can import that easily, and because there is no package structure, it's just a module with an unusual name.

So that's two cases that your "in fullname.split" test above seems to mishandle:

- bare modules (not part of a package) called "__init__";
- (sub)packages called "__init__", corresponding to a directory called "__init__".

There may be other cases. I think your proposal needs to be more specific about *exactly* what cases you are handling. I think it is *only* the case that you have a package and the __init__.py file inside the package is imported directly, but your test for that case is too broad. (Modulo file extensions of course.)
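To make the scope of the quoted test concrete, here is a small sketch that applies it to the names discussed in this exchange. The function name `has_init_component` is made up for this example; only the expression inside it comes from the proposal.

```python
def has_init_component(fullname: str) -> bool:
    """The test from the proposal: is any dotted component exactly '__init__'?"""
    return "__init__" in fullname.split(".")


examples = [
    "foo",                                              # ordinary package
    "foo.__init__",                                     # the case the proposal targets
    "my__init__module",                                 # '__init__' only as a substring
    "__init__",                                         # bare module called __init__.py
    "mypackage.subpackage.__init__.stuff.things.spam",  # directory called __init__
]

flagged = {name: has_init_component(name) for name in examples}
for name, hit in flagged.items():
    print(f"{name}: {hit}")
```

The substring case is not flagged, while the bare `__init__` module and the `__init__` directory are, which are the two cases listed above as being caught by the test as written.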

...

"__init__" as a substring - as weird as that may be - should be allowed. This is because it is only the exact "__init__" filename that is treated specially by the filename resolver.

Okay, that's what I thought you meant, but your insistence on this test:

    "__init__" in fullname.split(".")

[Me]

...

...
I think that you would need to demonstrate that:

(1) people are inadvertently importing "__init__", not realising the consequences;

...
(2) leading to bugs in their code;

(3) that this happens *more often* than people intentionally and knowingly importing "__init__";

(4) and that there is a work-around for those intentionally importing "__init__".

[Gregory]

...

I can't speak for the people practicing this pattern because I'm not one of them. However, I'm willing to bet a lot of them are either cargo culting the practice or thinking "oh, this __init__.py file exists, '__init__' must be the module name." The importer/code works and they run with it.

So do you have any examples of *actual* bugs caused by this feature, or is this a hypothetical problem?

...

I'm also willing to wager that people engaged in this practice (who apparently don't fully understand how the importer works otherwise they wouldn't be using "__init__" in module names) don't realize that this practice results in multiple module objects. I'm willing to wager that a subset of these people have seen weird bugs or undesired behavior due to the existence of multiple module objects (e.g. 2 instances of a supposed module singleton).

I wish I could find stronger evidence here, but I don't have anything concrete, just a GitHub search showing code in the wild, likely authored by people who aren't Python experts.

Or maybe they are experts who are doing exactly what they need to do, e.g. see MAL's comment about using the existence of an __init__ module to distinguish a namespace package from a regular package.

As the person proposing a backwards-compatibility breaking change, the onus is on you (or somebody supporting your proposal) to demonstrate that breaking people's code, even with just a warning, is *less bad* than the status quo.

"I feel, I expect, I would wager" etc is not evidence for this being an actual genuine problem that needs fixing. I would need to see more than just your gut feeling to support changing this behaviour. As MAL says, this behaviour apparently goes back to Python 1.5 days and the number of bug reports caused by it is approximately zero.

I agree with you that importing __init__ directly *feels* weird, it's a code smell, but then I feel the same about people who call dunders directly, and even referring to __doc__ directly feels weird. But that doesn't mean it is broken.

-- Steve

Steven D'Aprano

11:57 a.m.

On Wed, Dec 09, 2020 at 11:17:19AM +1100, Steven D'Aprano wrote:

...

Oops, I got distracted and didn't complete that thought. I think it is redundant -- I covered the issues with the fullname.split test in other parts of my email. Sorry for any confusion. -- Steve

Serhiy Storchaka

12:56 p.m.

08.12.20 21:47, Gregory Szorc wrote:


Thank you for the good explanation of the problem.

Initially I thought that this problem was not worth our attention. It just does not happen in normal code. If a newbie writes code like that and gets a bug because of it, he will learn from his mistake and will not write it next time. Warning about such code should be a task for linters.

But beginners and non-professionals do not use linters. And given the confusion your message caused among commenters in this thread, I have changed my mind and am inclined to agree with you. Yes, it may be worth adding a runtime test to the import machinery.

There are similar precedents for warnings about obviously wrong code:

* `a is 0` currently works on CPython, and has always worked, but this code is clearly semantically incorrect. Now you will get a SyntaxWarning.
* `if a.__lt__(b):` may work most of the time, but it can work incorrectly when the types are non-comparable and the result is NotImplemented. Now you will get a DeprecationWarning.
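A runtime test of the kind suggested here could be prototyped outside the interpreter as a meta path finder that only warns. This is a rough sketch under stated assumptions, not the change discussed for CPython itself: the class name `InitComponentWarner` and the warning text are invented, and the check is the broad `fullname.split(".")` test from the original proposal.

```python
import importlib.abc
import warnings


class InitComponentWarner(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only warns; it never locates a module itself."""

    def find_spec(self, fullname, path=None, target=None):
        if "__init__" in fullname.split("."):
            warnings.warn(
                f"'__init__' used as a module name component in {fullname!r}",
                DeprecationWarning,
                stacklevel=2,
            )
        # Returning None defers to the remaining finders on sys.meta_path.
        return None


finder = InitComponentWarner()

# Exercise the finder directly rather than installing it globally.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    finder.find_spec("foo")
    finder.find_spec("foo.__init__")

messages = [str(w.message) for w in caught]
print(messages)
```

To try it against real imports, one would place an instance at the front of the finder list with `sys.meta_path.insert(0, finder)`. The `stacklevel` would then need adjusting so the warning points at the importing code rather than at importlib internals.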

Steven D'Aprano

9:39 p.m.

On Wed, Dec 09, 2020 at 01:56:01PM +0200, Serhiy Storchaka wrote:

...

Thank you for good explanation of the problem.

I'm sorry Serhiy, I disagree that this has been a "good explanation of the problem".

Gregory has not identified any actual bugs caused by this. The only problem he has identified is that doing this will lead to two separate module objects from the same file, but as MAL points out, people can do this intentionally. Gregory hasn't identified any cases where people are doing this accidentally and having bugs in their code because of that. He just assumes that they are.

Gregory has still not been 100% clear that he is only talking about package __init__.py files. I am pretty sure that is what he means, but the only precise statement he has made is the code

    '__init__' in fullname.split('.')

but that will affect non-package files:

    __init__.py  # not a package, not a special name

and also packages with unusual but legal names:

    package/subpackage/__init__/things/stuff.py

...

The only person publicly confused by this thread has been me, it is okay to refer to me by name, you won't embarrass me :-)

I am not confused by the alleged problem. The alleged problem is obvious: if you import a package __init__.py module directly, the file gets executed twice, you get two entries in sys.modules and two distinct module objects.

I am confused by **Gregory's explanation** of the problem.

So far, as far as I can tell from this thread, the only concrete information we have is that people use this feature and there is at least one use-case for it. That's not enough to justify breaking people's code!

I am against changing this behaviour before anyone has identified *actual bugs* in code caused by it, and before anyone has addressed the use-case MAL gave for the current behaviour.

-- Steve

Ethan Furman

9:55 p.m.

On 12/9/20 12:39 PM, Steven D'Aprano wrote:

...

+1 -- ~Ethan~

Serhiy Storchaka

2:46 p.m.

09.12.20 22:39, Steven D'Aprano wrote:

...

Maybe it is just me, because I read the original issue. But Gregory's message looks to me well organized and answering questions that were asked before and possible new questions. Here is an example. File "foo/__init__.py" contains "class A: pass".

...

And this happens not only with classes. Modules foo and foo.__init__ have similar content, but their values are not the same. Some values can be identical, some are identical on some Python implementations and non-identical on others, some are equal but non-identical, some are not even equal.

...

Since __init__ is a special name and a directory __init__ conflicts with the file __init__.py, I do not think this is a good idea. I am not even sure that it works.

I do not think this is necessary, but just in case, it may be better to forbid intermediate __init__ components as well. But it depends on the implementation: whichever looks more natural.

Steven D'Aprano

3:40 p.m.

On Thu, Dec 10, 2020 at 03:46:37PM +0200, Serhiy Storchaka wrote:

...

Maybe it is just me, because I read the original issue. But Gregory's message looks to me well organized and answering questions that were asked before and possible new questions.

Here is an example. File "foo/__init__.py" contains "class A: pass".

    >>> from foo.__init__ import A
    >>> import foo
    >>> isinstance(A(), foo.A)
    False

Yes yes, I get that, and I got that from Gregory's first post. I understand the consequences. They are the same consequences as from:

...

>>> import sys
>>> import fractions
>>> del sys.modules['fractions']
>>> import fractions as frac
>>> isinstance(fractions.Fraction(), frac.Fraction)
False

Should we make a warning for this too?

That's not a rhetorical question. Your argument applies here too: some people don't read the docs, and they might not realise that they have two distinct module objects from the same .py file, so we should force them to read the docs with a warning or an exception.

There are other ways you can get this same effect. Should we have warnings for them all?

import spam
ImportWarning: spam.py is a hard link to eggs.py and we want to force you to read the docs to learn why this is a bad idea.

I don't think we should scold users with warnings and errors, breaking their code, because it *might* be bad, or because the code looks weird. Weird code is not illegal.

If somebody can prove that this is a common source of real program bugs, not just a potential source of bugs, then we should consider breaking backwards compatibility. But until there are proven real-world bugs coming from this feature, we should not touch it.

Referring to a subpackage with an unusual name: package/__init__/__init__.py

...

Since __init__ is a special name and a directory __init__ conflicts with the file __init__.py, I do not think this is a good idea.

There is no conflict between a directory called "__init__" and a file called "__init__.py". Both can exist in modern file systems.

$ ls -d __init__*
__init__  __init__.py

I agree that it looks ugly and is weird, but we should not prohibit code just because it looks weird.

...

I am not even sure that it works.

Seems to work fine when I tried it:

$ cat package/__init__.py
print("importing toplevel package __init__.py")
$ cat package/__init__/__init__.py
print("importing subpackage __init__/__init__.py")

And importing them:

>>> import package.__init__
importing toplevel package __init__.py
importing subpackage __init__/__init__.py

If it didn't work, I would call that a bug. "__init__.py" module files are special; *directories* called "__init__" are not special, they're just an ordinary name. So if we change any behaviour, it should only apply to "package/__init__.py" files, not arbitrary path components "__init__".

-- Steve

Serhiy Storchaka

10:10 p.m.

10.12.20 16:40, Steven D'Aprano wrote:

...

It could be nice if there is a simple and efficient way to do this. This happens sometimes (mostly in tests), and more reported details would be helpful.

...

Do you know how to determine the name of the target of a hard link?

...

Agree, but in many cases the code is written in a weird way because the author did not know about the right way. We do not scold users, we inform them and help them fix potential errors.

...

Now remove package/__init__/__init__.py, add package/__init__/module.py and try to import it.

Steven D'Aprano

3 a.m.

On Thu, Dec 10, 2020 at 11:10:05PM +0200, Serhiy Storchaka wrote:

...

I know how to do it in the Linux shell: get the inode number of the file, then search the file system for files with that inode number. Or you can use the `-samefile` option to `find`.

In the case of Python modules, you wouldn't need to search the entire file system, but only those parts in the PYTHONPATH.

Of course this would be expensive, but it would also be unnecessary. This is not the sort of thing the interpreter should treat as a bug or even as a warning. It is too rare, too costly to check, and if users are doing it, they probably want the current behaviour.

Python is a Consenting Adults language, and I hope it remains as such. There are certain things which consenting adults should be allowed to do without it raising an error, and one of those is creating two independent module objects from a hard linked file.
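For illustration only, the inode comparison described above can be sketched in Python with `os.stat`. The helper name `group_same_file` and the file names are made up; the sketch assumes a filesystem that supports hard links and reports stable inode numbers:

```python
import os
import tempfile
from collections import defaultdict


def group_same_file(paths):
    """Return groups of paths that share a device and inode number."""
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)  # follows symlinks, so symlinked files group together too
        groups[(st.st_dev, st.st_ino)].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


with tempfile.TemporaryDirectory() as tmp:
    spam = os.path.join(tmp, "spam.py")
    eggs = os.path.join(tmp, "eggs.py")
    ham = os.path.join(tmp, "ham.py")
    for path in (spam, ham):
        with open(path, "w") as f:
            f.write("x = 1\n")
    os.link(spam, eggs)  # eggs.py is now a hard link to spam.py

    linked = group_same_file([spam, eggs, ham])
    expected = [sorted([spam, eggs])]

print(linked == expected)
```

An importer would have to run a check like this against every candidate file on the search path, which is the cost referred to above.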

...

Anything could be a potential error. We should require something a little stronger than just "potential error" before breaking backwards compatibility. Unless there is strong evidence of real errors, we normally leave enforcement of style issues to linters, coding standards and user education. For example, we are deprecating `bool(NotImplemented)` because we have strong evidence of lots of buggy code from operator dunders not treating NotImplemented correctly. That's all I am asking for here: if there is strong evidence of actual, real world bugs caused by this, then we should consider a change in behaviour. Without that evidence, then "backwards compatibility" and "consenting adults" should take precedence. Otherwise, we are breaking code that works well enough to satisfy the code's owner, even if it isn't perfect, even if it doesn't meet best practice. [...]

...

Now *that* looks like a bug to me. Here is my package setup:

package /
+-- __init__.py
+-- directory /
|   +-- module.py
|   +-- __init__ /
|       +-- module.py
+-- __init__ /
    +-- module.py

Importing package.__init__.module fails, but importing package.directory.__init__.module succeeds even though neither init directory contains an __init__.py file.

```
[steve ~]$ python3 -c "import package.directory.module"
importing toplevel package __init__.py
importing package.directory.module
[steve ~]$ python3 -c "import package.directory.__init__.module"
importing toplevel package __init__.py
importing package.directory.__init__.module
[steve ~]$ python3 -c "import package.__init__.module"
importing toplevel package __init__.py
importing toplevel package __init__.py
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'package.__init__.module'; 'package.__init__' is not a package
```

Can somebody convince me that this is working correctly? Otherwise it looks like a bug to me.

-- Steve

Gregory Szorc

6:44 p.m.

On Thu, Dec 10, 2020 at 5:47 AM Serhiy Storchaka <storchaka@gmail.com> wrote:

...

09.12.20 22:39, Steven D'Aprano wrote:

...
On Wed, Dec 09, 2020 at 01:56:01PM +0200, Serhiy Storchaka wrote:

...
Thank you for good explanation of the problem.

I'm sorry Serhiy, I disagree that this has been a "good explanation of the problem".

Gregory has not identified any actual bugs caused by this. The only problem he has identified is that doing this will lead to two separate module objects from the same file, but as MAL points out, people can do this intentionally. Gregory hasn't identified any cases where people are doing this accidentally and having bugs in their code because of that. He just assumes that they are.

Maybe it is just me, because I read the original issue. But Gregory's message looks to me well organized and answering questions that were asked before and possible new questions.

Here is an example. File "foo/__init__.py" contains "class A: pass".

>>> from foo.__init__ import A
>>> import foo
>>> isinstance(A(), foo.A)
False

And this happens not only with classes. The modules foo and foo.__init__ have similar content, but their values are not the same. Some values can be identical, some are identical on some Python implementations and non-identical on others, some are equal but non-identical, some are not even equal.

...
Gregory has still not been 100% clear that he is only talking about package __init__.py files. I am pretty sure that is what he means, but the only precise statement he has made is the code

'__init__' in fullname.split('.')

but that will affect non-package files:

__init__.py # not a package, not a special name

and also packages with unusual but legal names:

package/subpackage/__init__/things/stuff.py

Since __init__ is a special name and a directory __init__ conflicts with the file __init__.py, I do not think this is a good idea. I am not even sure that it works. I do not think this is necessary, but just in case it may be better to forbid intermediate __init__ components as well. But it depends on the implementation, and on what will look more natural.

I'd also like to note that the various importers in the standard library are inconsistent in their handling of "__init__" as the trailing component (`fullname.split(".")`) of a module name. Specifically, the builtin, frozen, and zip importers will only match exact name matches. And since the canonical module name is "foo" instead of "foo.__init__", requests for "foo.__init__" will work with PathFinder but none of the other meta path finders in the standard library.

I would argue that module names should be treated identically, regardless of the importer used. But this isn't the case and this is why I feel like the behavior of PathFinder is a bug that shipped.

I'll also note that this behavior/bug affects the ability to distribute Python applications seamlessly. With the current behavior of allowing ".__init__" as the module name suffix, any Python code relying on this behavior will be difficult to package if using an application distribution tool that doesn't use PathFinder. This includes PyOxidizer, py2exe, PyInstaller, and various other tools which rely on the zip importer or custom importers.

So one can make the argument that this one-off behavior of PathFinder undermines the ability to easily distribute Python applications and that in turn undermines the value of Python in the larger ecosystem. My opinion is the harm inflicted by dropping support for "__init__" in module names will be more than compensated by long-term benefits of enabling turnkey Python application distribution. But that's my personal take and I have no solid evidence to justify that claim. The evidence that PathFinder is inconsistent with other meta path finders in the standard library is irrefutable, however.

Guido van Rossum

7:57 p.m.

All I have to add is that I am appalled that people actually write `from foo import __init__`, and I am sorry that we left this hole open when implementing packages. I don't know what's the best way forward now that the cat is out of the bag, but deprecation seems a reasonable thing to do.

As for how people can check whether a package is a namespace package, there are many other ways to check for that without attempting to import `__init__` from it.

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Steven D'Aprano

10:18 p.m.

On Thu, Dec 10, 2020 at 10:57:32AM -0800, Guido van Rossum wrote:

...

All I have to add is that I am appalled that people actually write `from foo import __init__`

I too would be appalled if that was what people are doing, but it isn't.

Looking at the code samples in the wild:

https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code

I see no examples of either of the (anti-)patterns given by Gregory in his opening post:

    # No examples of either of these.
    from package import __init__
    import package.__init__

I have read through the first ten, and last five, pages of the search results and what people are doing is typically:

    from __init__ import name

or sometimes with a wildcard import. There are also a few cases using a dot:

    from .__init__ import name

and a few cases of attempted bilingual 2 and 3 code:

    try:
        from .__init__ import name  # Python 3
    except:
        from __init__ import name

So what seems to be happening is that people have a package sub-module, say "package/run.py", and in Python 2 code they wanted to import from the main package from within run.py.

I don't know if that is less appalling than what Gregory has told us, but it is different, and to my mind at least it makes it more understandable and not as weird looking.

This is nothing like Gregory's characterisation of cargo cult (his term) programmers writing `import package.__init__` to import package.

[...]

...

Are these many other ways a secret? *wink* Because if somebody with the experience and knowledge of MAL doesn't know them, let alone people like me, maybe you should give us a hint what they are. -- Steve

Guido van Rossum

10:48 p.m.

On Thu, Dec 10, 2020 at 1:22 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

On Thu, Dec 10, 2020 at 10:57:32AM -0800, Guido van Rossum wrote:

...
All I have to add is that I am appalled that people actually write `from foo import __init__`

I too would be appalled if that was what people are doing, but it isn't.

Looking at the code samples in the wild:

https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code

I see no examples of either of the (anti-)patterns given by Gregory in his opening post:

    # No examples of either of these.
    from package import __init__
    import package.__init__

I have read through the first ten, and last five, pages of the search results and what people are doing is typically:

from __init__ import name

or sometimes with a wildcard import. There are also a few cases using a dot:

from .__init__ import name

and a few cases of attempted bilingual 2 and 3 code:

    try:
        from .__init__ import name  # Python 3
    except:
        from __init__ import name

So what seems to be happening is that people have a package sub-module, say "package/run.py", and in Python 2 code they wanted to import from the main package from within run.py.

Interesting, since Python 2 also supports relative imports. That last example makes the intention clear(er) -- as you write they want it to work both when "run" is imported as a toplevel module and when it is located inside a package, i.e. "package.run". If all the other code is in __init__.py that trick will indeed work, but it points to a general lack of understanding of how packages and import path resolution work. :-( This would be unrelated to Python 3 (unless their app somehow also evolved so that their Python 2 version uses a flat namespace while their Python 3 version uses packages).

...

I don't know if that is less appalling than what Gregory has told us, but it is different, and to my mind at least it makes it more understandable and not as weird looking.

Well, if I really meant to do something like that I'd write

    try:
        from . import name
    except ImportError:
        from __init__ import name

but I would never write

    from .__init__ import name

...

This is nothing like Gregory's characterisation of cargo cult (his term) programmers writing `import package.__init__` to import package.

Well, depending on where you are in the stack, there may be no way to distinguish that from other forms involving a dot.

...

...
As for how people can check whether a package is a namespace package, [...] there are many other ways to check for that without attempting to import `__init__` from it.

Are these many other ways a secret? *wink* Because if somebody with the experience and knowledge of MAL doesn't know them, let alone people like me, maybe you should give us a hint what they are.

In the below example, foo is a namespace package and bar a classic package. Behold the differences:

```
Python 3.8.6 (tags/v3.8.6:db45529, Sep 23 2020, 15:52:53) [MSC v.1927 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import foo, bar
>>> foo
<module 'foo' (namespace)>
>>> bar
<module 'bar' from 'C:\\Users\\gvanrossum\\cpython\\bar\\__init__.py'>
>>> dir(foo)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> dir(bar)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> foo.__file__
>>> bar.__file__
'C:\\Users\\gvanrossum\\cpython\\bar\\__init__.py'
>>> foo.__path__
_NamespacePath(['C:\\Users\\gvanrossum\\cpython\\foo'])
>>> bar.__path__
['C:\\Users\\gvanrossum\\cpython\\bar']
```

(Note that `foo.__file__` prints nothing because the value is None.)
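The difference in `__file__` shown above suggests one way to test for a namespace package without importing `__init__`. A rough sketch follows; the helper name and the two demo package names are hypothetical, and the check is only a heuristic (other packages without a backing file, such as frozen ones, may also match):

```python
import importlib
import os
import sys
import tempfile


def is_namespace_package(module):
    """Heuristic: a package (it has __path__) with no __file__ behind it."""
    return hasattr(module, "__path__") and getattr(module, "__file__", None) is None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical demo packages: a bare directory and a classic package.
    os.mkdir(os.path.join(tmp, "ns_demo_pkg"))
    os.mkdir(os.path.join(tmp, "classic_demo_pkg"))
    with open(os.path.join(tmp, "classic_demo_pkg", "__init__.py"), "w") as f:
        f.write("")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        ns_pkg = importlib.import_module("ns_demo_pkg")
        classic_pkg = importlib.import_module("classic_demo_pkg")
    finally:
        sys.path.remove(tmp)

print(is_namespace_package(ns_pkg))       # bare directory, no __init__.py
print(is_namespace_package(classic_pkg))  # directory with __init__.py
print(is_namespace_package(os))           # plain module, not a package at all
```

Using `getattr` with a default covers both the case where `__file__` is None and the case where the attribute is absent.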


Paul Moore

8:17 p.m.

On Thu, 10 Dec 2020 at 17:48, Gregory Szorc <gregory.szorc@gmail.com> wrote:

...

So one can make the argument that this one-off behavior of PathFinder undermines the ability to easily distribute Python applications and that in turn undermines the value of Python in the larger ecosystem. My opinion is the harm inflicted by dropping support for "__init__" in module names will be more than compensated by long-term benefits of enabling turnkey Python application distribution. But that's my personal take and I have no solid evidence to justify that claim. The evidence that PathFinder is inconsistent with other meta path finders in the standard library is irrefutable, however.

+1. Improving the options for distributing Python applications (and in particular, having ways of distributing applications where the fact that they are written in Python is an "implementation detail", and end users have no need to know about it) is something that is sorely needed. Sure, we should follow our deprecation processes, but we don't need to be paralyzed by them. Paul

Filipe Laíns

December 2020

9:06 p.m.

On Tue, 2020-12-08 at 11:47 -0800, Gregory Szorc wrote:

...

PyOxidizer's pure Rust implementation of a meta path importer (https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_oxidized_finde... ) has been surprisingly effective at finding corner cases and behavior quirks in Python's importing mechanisms.

It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild. (See https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code for some examples.)

In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially. If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").

There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names (let's pretend filesystem path normalization doesn't exist) and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.

However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.

Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.

(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)
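To make step 1 concrete, here is a rough sketch of how such a warning could be prototyped outside the interpreter with an extra meta path finder. This is not how CPython would implement it; the class name `InitComponentWarner`, the warning text, and the package name `warn_demo_pkg` are all made up for the illustration:

```python
import importlib
import importlib.abc
import os
import sys
import tempfile
import warnings


class InitComponentWarner(importlib.abc.MetaPathFinder):
    """Hypothetical finder that only warns; it never claims a module itself."""

    def find_spec(self, fullname, path=None, target=None):
        if "__init__" in fullname.split("."):
            warnings.warn(
                f"'__init__' used as a module name component in {fullname!r}",
                DeprecationWarning,
                stacklevel=2,
            )
        return None  # defer to the finders that follow on sys.meta_path


sys.meta_path.insert(0, InitComponentWarner())

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "warn_demo_pkg"))  # hypothetical package name
    with open(os.path.join(tmp, "warn_demo_pkg", "__init__.py"), "w") as f:
        f.write("")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            importlib.import_module("warn_demo_pkg")           # plain package import
            importlib.import_module("warn_demo_pkg.__init__")  # name with __init__ component
    finally:
        sys.path.remove(tmp)

# Keep only the warnings raised by the sketch itself.
messages = [
    str(w.message)
    for w in caught
    if w.category is DeprecationWarning and "module name component" in str(w.message)
]
print(messages)
```

Because the finder returns None, the import still succeeds through the regular finders; only the warning is new, which matches the deprecation phase of the proposal rather than the hard error.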

What do others think we should do?

Gregory _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/UEHUJO... Code of Conduct: http://python.org/psf/codeofconduct/

Steven D'Aprano

11:39 p.m.

On Tue, Dec 08, 2020 at 08:06:09PM +0000, Filipe Laíns wrote:

...

I support this change.

M.-A. Lemburg

9:14 p.m.

On 08.12.2020 20:47, Gregory Szorc wrote:

...

Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.

...

(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)

...

...
...
Python Projects, Coaching and Support ... https://www.egenix.com/ Python Product Development ... https://consulting.egenix.com/

Serhiy Storchaka

2:18 p.m.

08.12.20 22:14, M.-A. Lemburg wrote:

...

This feature has some bad side effects. Also, it does not look like this feature was added intentionally, otherwise we would handle these side effects.

...

Actually, making package.__init__ an alias of package would mitigate the issue. But I am not sure we need such feature.

...

M.-A. Lemburg

2:39 p.m.

On 10.12.2020 14:18, Serhiy Storchaka wrote:

...

08.12.20 22:14, M.-A. Lemburg wrote:

...
On 08.12.2020 20:47, Gregory Szorc wrote:

...
Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.

-1 on this proposal. We don't want to needlessly break code just because they use a feature of the existing implementation, which has been around for decades.

This feature has some bad side effects. Also, it does not look like this feature was added intentionally, otherwise we would handle these side effects.

I know that it's not intentional, but people are obviously using it and this code would break.

...

...
Moreover, if you use namespace packages, a module __init__.py does not have to exist in the directory, so importing pkg.__init__ is a way to test for such a case.

Why do you need to test such a case? And are there other ways, without such side effects?

I have never seen code like

    try:
        import mypackage.__init__
    except ImportError:
        pass  # do something
    else:
        pass  # do something else

(But on the other hand, I had not seen imports from __init__ either before the OP opened the issue.)

...

...
...
(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)

-0 on this, since it may break code. If done, Python should issue a warning to flag the issue.

Actually, making package.__init__ an alias of package would mitigate the issue. But I am not sure we need such feature.

...

...
Third solution: leave things as they are and document it.

+1 on this one, since it's been like this for ages (going way back to the Python 1.x days).

It was my first reaction. Just say "Don't do this". But many people do not read documentation and do not use linters, so it makes sense to add a warning which can force them to read the explanation in the documentation (or at least ask a question on a forum).

Agreed. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Dec 10 2020)


Steven D'Aprano

11:30 p.m.

On Tue, Dec 08, 2020 at 11:47:22AM -0800, Gregory Szorc wrote:

...

It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild.

...

In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially.

...

If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").

...

There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names

Can you make that strong argument please? "It seems wrong to me" is a very weak argument.

...

(let's pretend filesystem path normalization doesn't exist)

...

and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.

Not to me. The current behaviour is exactly what I would expect.

...

However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.

Gregory Szorc

December 2020

12:07 a.m.

On Tue, Dec 8, 2020 at 2:44 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

Then this is an argument against silent normalization of the module name. I buy that argument.

...

...
If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").

Right. But given that the caller has *explicitly* asked for "foo.__init__" to be imported, presumably that is exactly the behaviour they want.

Are there cases where people inadvertently import "foo.__init__" and are then surprised to get a different module from "foo" alone?

Personally, I think this is a case for education. If you are explicitly touching *any* dunder name, it is up to you to know what you are doing.

...
There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names

Can you make that strong argument please? "It seems wrong to me" is a very weak argument.

...
(let's pretend filesystem path normalization doesn't exist)

Let's not pretend, because it does exist.

There is also the "module importing itself" issue, and hard links, and I'm sure that there are other clever ways to get two module objects out of a single module file. Deep copying doesn't work, but modules are very simple objects and you can copy them by hand:

import spam
eggs = type(spam)("eggs")
vars(eggs).update(vars(spam))
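For reference, a hand-made copy along those lines can be sketched as follows. Note that the second positional parameter of `types.ModuleType` is the docstring, so this sketch copies the namespace with `update` instead; `copy_module` is a made-up helper and `fractions` merely stands in for `spam`:

```python
import fractions
import types


def copy_module(module, new_name):
    """Shallow copy: a new module object whose namespace shares the same objects."""
    clone = types.ModuleType(new_name)
    vars(clone).update(vars(module))
    clone.__name__ = new_name  # update() copied the original __name__ over it
    return clone


eggs = copy_module(fractions, "eggs")
print(eggs is fractions)                    # is it the same module object?
print(eggs.Fraction is fractions.Fraction)  # are the classes shared?
```

Unlike a second import of the same file, a shallow copy like this shares its classes and functions with the original, so `isinstance` checks across the two modules keep working.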

...
and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.

Not to me. The current behaviour is exactly what I would expect.

...
However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.

Right, so "it feels wrong" is not a sufficient reason to make that breaking change.

I think that you would need to demonstrate that:

(1) people are inadvertently importing "__init__", not realising the consequences;

...

Steven D'Aprano

1:17 a.m.

On Tue, Dec 08, 2020 at 03:07:48PM -0800, Gregory Szorc wrote:

...

We're talking about "__init__" being the exact name of a module component. `"__init__" in fullname.split(".")`, as I wrote in my initial email.

...

"__init__" as a substring - as weird as that may be - should be allowed. This is because it is only the exact "__init__" filename that is treated specially by the filename resolver.
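To make the distinction between an exact component and a substring concrete, here is a small sketch. The function init_usage and its labels are invented for illustration; the sample names are taken from this thread or made up.

```python
def init_usage(fullname):
    """Classify how "__init__" appears in a dotted module name."""
    parts = fullname.split(".")
    if len(parts) > 1 and parts[-1] == "__init__":
        return "trailing component"
    if "__init__" in parts:
        return "other exact component"
    if "__init__" in fullname:
        return "substring only"
    return "absent"


samples = [
    "foo.__init__",                              # trailing component
    "__init__",                                  # other exact component
    "package.subpackage.__init__.things.stuff",  # other exact component
    "foo.my__init__helper",                      # substring only
    "foo.bar",                                   # absent
]

for name in samples:
    print(f"{name!r}: {init_usage(name)}")
```

The test `"__init__" in fullname.split(".")` is true for the first three samples and false for the last two.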

Okay, that's what I thought you meant, but your insistence on this test: "__init__" in fullname.split(".") [Me]

...

...
I think that you would need to demonstrate that:

(1) people are inadvertently importing "__init__", not realising the consequences;

...
(2) leading to bugs in their code;

(3) that this happens *more often* than people intentionally and knowingly importing "__init__";

(4) and that there is a work-around for those intentionally importing "__init__".

[Gregory]

...

I can't speak for the people practicing this pattern because I'm not one of them. However, I'm willing to bet a lot of them are either cargo culting the practice or thinking "oh, this __init__.py file exists, '__init__' must be the module name." The importer/code works and they run with it.

So do you have any examples of *actual* bugs caused by this feature, or is this a hypothetical problem?

...

I'm also willing to wager that people engaged in this practice (who apparently don't fully understand how the importer works otherwise they wouldn't be using "__init__" in module names) don't realize that this practice results in multiple module objects. I'm willing to wager that a subset of these people have seen weird bugs or undesired behavior due to the existence of multiple module objects (e.g. 2 instances of a supposed module singleton).

I wish I could find stronger evidence here, but I don't have anything concrete, just a GitHub search showing code in the wild, likely authored by people who aren't Python experts.

Steven D'Aprano

11:57 a.m.

On Wed, Dec 09, 2020 at 11:17:19AM +1100, Steven D'Aprano wrote:

...

Oops, I got distracted and didn't complete that thought. I think it is redundant -- I covered the issues with the fullname.split test in other parts of my email. Sorry for any confusion. -- Steve

Serhiy Storchaka

12:56 p.m.

08.12.20 21:47, Gregory Szorc пише:

...

PyOxidizer's pure Rust implementation of a meta path importer (https://pyoxidizer.readthedocs.io/en/stable/oxidized_importer_oxidized_finde...) has been surprisingly effective at finding corner cases and behavior quirks in Python's importing mechanisms.

It was recently brought to my attention via https://github.com/indygreg/PyOxidizer/issues/317 that "__init__" in module names is something that exists in Python code in the wild. (See https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code for some examples.)

In that GitHub issue and https://bugs.python.org/issue42564, I discovered that what's happening is the stdlib PathFinder meta path importer is "dumb" and doesn't treat "__init__" in module names specially. If someone uses syntax like "import foo.__init__" or "from .__init__ import foo", PathFinder operates on "__init__" like any other string value and proceeds to probe the filesystem for the relevant {.py, .pyc, .so, etc} files. The "__init__" files do exist in probed locations and PathFinder summarily constructs a new module object, albeit with "__init__" in its name. The end result is you have 2 module objects and sys.modules entries referring to the same file, keyed to different names (e.g. "foo" and "foo.__init__").

There is a strong argument to be made that "__init__" in module names should be treated specially. It seems wrong to me that you are allowed to address the same module/file through different names (let's pretend filesystem path normalization doesn't exist) and that the filesystem encoding of Python module files/names is addressable through the importer names. This feels like a bug that inadvertently shipped.

However, code in the wild is clearly relying on "__init__" in module names being allowed. And changing the behavior is backwards incompatible and could break this code.

Anyway, I was encouraged by Brett Cannon to email this list to assess the appetite for introducing a backwards incompatible change to this behavior. So here's my strawman/hardline proposal:

1. 3.10 introduces a DeprecationWarning for "__init__" appearing as any module part component (`"__init__" in fullname.split(".")`).
2. Some future release (I'm unsure which) turns it into a hard error.

(A less aggressive proposal would be to normalize "__init__" in module names to something more reasonable - maybe stripping trailing ".__init__" from module names. But I'll start by proposing the stricter solution.)
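As a rough sketch of what the two variants could look like in pure Python: the names warn_on_init_component and strip_trailing_init are hypothetical, and nothing here is how importlib is actually implemented or would be changed.

```python
import warnings


def warn_on_init_component(fullname):
    """Strict variant: warn when "__init__" is an exact name component."""
    if "__init__" in fullname.split("."):
        warnings.warn(
            f"'__init__' as a module name component is deprecated: {fullname!r}",
            DeprecationWarning,
            stacklevel=2,
        )


def strip_trailing_init(fullname):
    """Lenient variant: map "foo.__init__" to "foo" and leave other names alone."""
    suffix = ".__init__"
    if fullname.endswith(suffix):
        return fullname[: -len(suffix)]
    return fullname


print(strip_trailing_init("foo.__init__"))  # foo
print(strip_trailing_init("foo.bar"))       # foo.bar
```

The strict check fires for any exact "__init__" component, including intermediate ones, while the lenient helper only touches a trailing ".__init__".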

What do others think we should do?

Steven D'Aprano

9:39 p.m.

On Wed, Dec 09, 2020 at 01:56:01PM +0200, Serhiy Storchaka wrote:

...

Thank you for the good explanation of the problem.

...

Ethan Furman

9:55 p.m.

On 12/9/20 12:39 PM, Steven D'Aprano wrote:

...

+1 -- ~Ethan~

Serhiy Storchaka

December 2020

2:46 p.m.

09.12.20 22:39, Steven D'Aprano пише:

...

Steven D'Aprano

3:40 p.m.

On Thu, Dec 10, 2020 at 03:46:37PM +0200, Serhiy Storchaka wrote:

...

Maybe it is just me, because I read the original issue. But Gregory's message looks well organized to me, and it answers questions that were asked before as well as possible new ones.

Here is an example. File "foo/__init__.py" contains "class A: pass".

>>> from foo.__init__ import A
>>> import foo
>>> isinstance(A(), foo.A)
False

Yes yes, I get that, and I got that from Gregory's first post. I understand the consequences. They are the same consequences as from:

...

>>> import sys
>>> import fractions
>>> del sys.modules['fractions']
>>> import fractions as frac
>>> isinstance(fractions.Fraction(), frac.Fraction)
False

...

Since __init__ is a special name and a directory named __init__ conflicts with the file __init__.py, I do not think this is a good idea.

...

I am not even sure that it works.

Serhiy Storchaka

10:10 p.m.

10.12.20 16:40, Steven D'Aprano пише:

...

It would be nice if there were a simple and efficient way to do this. This happens sometimes (mostly in tests), and more detailed reports would be helpful.

...

Do you know how to determine the name of the target of a hard link?

...

Agreed, but in many cases the code is written in a weird way because the author did not know about the right way. We do not scold users; we inform them and help them fix potential errors.

...

Now remove package/__init__/__init__.py, add package/__init__/module.py and try to import it.

Steven D'Aprano

3 a.m.

On Thu, Dec 10, 2020 at 11:10:05PM +0200, Serhiy Storchaka wrote:

...

Gregory Szorc

6:44 p.m.

On Thu, Dec 10, 2020 at 5:47 AM Serhiy Storchaka <storchaka@gmail.com> wrote:

...

09.12.20 22:39, Steven D'Aprano пише:

...
On Wed, Dec 09, 2020 at 01:56:01PM +0200, Serhiy Storchaka wrote:

...
Thank you for the good explanation of the problem.

I'm sorry Serhiy, I disagree that this has been a "good explanation of the problem".

Gregory has not identified any actual bugs caused by this. The only problem he has identified is that doing this will lead to two separate module objects from the same file, but as MAL points out, people can do this intentionally. Gregory hasn't identified any cases where people are doing this accidentally and having bugs in their code because of that. He just assumes that they are.

Maybe it is just me, because I read the original issue. But Gregory's message looks well organized to me, and it answers questions that were asked before as well as possible new ones.

Here is an example. File "foo/__init__.py" contains "class A: pass".

>>> from foo.__init__ import A
>>> import foo
>>> isinstance(A(), foo.A)
False

And this happens not only with classes. Modules foo and foo.__init__ have similar content, but their values are not the same. Some values can be identical, some are identical on some Python implementations and non-identical on others, some are equal but non-identical, some are not even equal.
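A self-contained way to reproduce this is sketched below. It assumes a throwaway package called demo_pkg (a made-up name) written to a temporary directory, and compares a class and a module-level list across the two module objects.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create demo_pkg/__init__.py in a temporary directory (hypothetical package).
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("class A:\n    pass\n\nregistry = []\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

pkg_mod = importlib.import_module("demo_pkg")
init_mod = importlib.import_module("demo_pkg.__init__")

# Same file on disk, but the source was executed twice.
same_file = pkg_mod.__file__ == init_mod.__file__
same_module = pkg_mod is init_mod
same_class = pkg_mod.A is init_mod.A

# Mutating state through one name is invisible through the other.
init_mod.registry.append("added via demo_pkg.__init__")

print(same_file, same_module, same_class)  # True False False
print(pkg_mod.registry)                    # []
print(init_mod.registry)                   # ['added via demo_pkg.__init__']
```

Both "demo_pkg" and "demo_pkg.__init__" end up as separate keys in sys.modules, which is the duplicated-singleton situation described above.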

...
Gregory has still not been 100% clear that he is only talking about package __init__.py files. I am pretty sure that is what he means, but the only precise statement he has made is the code

'__init__' in fullname.split('.')

but that will affect non-package files:

__init__.py # not a package, not a special name

and also packages with unusual but legal names:

package/subpackage/__init__/things/stuff.py

Since __init__ is a special name and a directory named __init__ conflicts with the file __init__.py, I do not think this is a good idea. I am not even sure that it works. I do not think it is necessary, but just in case, it may be better to forbid intermediate __init__ components as well. That depends on the implementation and on what looks more natural.

Guido van Rossum

7:57 p.m.

...

On Thu, Dec 10, 2020 at 5:47 AM Serhiy Storchaka <storchaka@gmail.com> wrote:

...
09.12.20 22:39, Steven D'Aprano пише:

...
On Wed, Dec 09, 2020 at 01:56:01PM +0200, Serhiy Storchaka wrote:

...
Thank you for the good explanation of the problem.

I'm sorry Serhiy, I disagree that this has been a "good explanation of the problem".

Gregory has not identified any actual bugs caused by this. The only problem he has identified is that doing this will lead to two separate module objects from the same file, but as MAL points out, people can do this intentionally. Gregory hasn't identified any cases where people are doing this accidentally and having bugs in their code because of that. He just assumes that they are.

Maybe it is just me, because I read the original issue. But Gregory's message looks well organized to me, and it answers questions that were asked before as well as possible new ones.

Here is an example. File "foo/__init__.py" contains "class A: pass".

>>> from foo.__init__ import A
>>> import foo
>>> isinstance(A(), foo.A)
False

And this happens not only with classes. Modules foo and foo.__init__ have similar content, but their values are not the same. Some values can be identical, some are identical on some Python implementations and non-identical on others, some are equal but non-identical, some are not even equal.

...
Gregory has still not been 100% clear that he is only talking about package __init__.py files. I am pretty sure that is what he means, but the only precise statement he has made is the code

'__init__' in fullname.split('.')

but that will affect non-package files:

__init__.py # not a package, not a special name

and also packages with unusual but legal names:

package/subpackage/__init__/things/stuff.py

Since __init__ is a special name and a directory named __init__ conflicts with the file __init__.py, I do not think this is a good idea. I am not even sure that it works. I do not think it is necessary, but just in case, it may be better to forbid intermediate __init__ components as well. That depends on the implementation and on what looks more natural.

I'd also like to note that the various importers in the standard library are inconsistent in their handling of "__init__" as the trailing component (`fullname.split(".")[-1]`) of a module name. Specifically, the builtin, frozen, and zip importers only match exact names. And since the canonical module name is "foo" instead of "foo.__init__", requests for "foo.__init__" will work with PathFinder but with none of the other meta path finders in the standard library.

I would argue that module names should be treated identically, regardless of the importer used. But this isn't the case and this is why I feel like the behavior of PathFinder is a bug that shipped.
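The exact-name matching can be observed for two of these finders with a short sketch. It uses "sys" and "_frozen_importlib" only because they are builtin and frozen modules in CPython; neither is a package, so this demonstrates name matching rather than a true package/__init__ pair. The zip importer is left out of this sketch.

```python
from importlib.machinery import BuiltinImporter, FrozenImporter

# Both finders look the name up verbatim in a table of known modules.
builtin_plain = BuiltinImporter.find_spec("sys")
builtin_init = BuiltinImporter.find_spec("sys.__init__")

frozen_plain = FrozenImporter.find_spec("_frozen_importlib")
frozen_init = FrozenImporter.find_spec("_frozen_importlib.__init__")

print(builtin_plain is not None, builtin_init is None)
print(frozen_plain is not None, frozen_init is None)
```

A return value of None means the finder does not recognise the name, so a request with a ".__init__" suffix is not resolved by either of them.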

I'll also note that this behavior/bug affects the ability to distribute Python applications seamlessly. With the current behavior of allowing ".__init__" as the module name suffix, any Python code relying on this behavior will be difficult to package if using an application distribution tool that doesn't use PathFinder. This includes PyOxidizer, py2exe, PyInstaller, and various other tools which rely on the zip importer or custom importers.

So one can make the argument that this one-off behavior of PathFinder undermines the ability to easily distribute Python applications and that in turn undermines the value of Python in the larger ecosystem. My opinion is the harm inflicted by dropping support for "__init__" in module names will be more than compensated by long-term benefits of enabling turnkey Python application distribution. But that's my personal take and I have no solid evidence to justify that claim. The evidence that PathFinder is inconsistent with other meta path finders in the standard library is irrefutable, however.

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/CIQP3Y...
Code of Conduct: http://python.org/psf/codeofconduct/

-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

Steven D'Aprano

December 2020

10:18 p.m.

On Thu, Dec 10, 2020 at 10:57:32AM -0800, Guido van Rossum wrote:

...

All I have to add is that I am appalled that people actually write `from foo import __init__`

...

Guido van Rossum

10:48 p.m.

On Thu, Dec 10, 2020 at 1:22 PM Steven D'Aprano <steve@pearwood.info> wrote:

...

On Thu, Dec 10, 2020 at 10:57:32AM -0800, Guido van Rossum wrote:

...
All I have to add is that I am appalled that people actually write `from foo import __init__`

I too would be appalled if that was what people are doing, but it isn't.

Looking at the code samples in the wild:

https://github.com/search?l=Python&q=%22from+.__init__+import%22&type=Code

I see no examples of either of the (anti-)patterns given by Gregory in his opening post:

# No examples of either of these.
from package import __init__
import package.__init__

I have read through the first ten, and last five, pages of the search results and what people are doing is typically:

from __init__ import name

or sometimes with a wildcard import. There are also a few cases using a dot:

from .__init__ import name

and a few cases of attempted bilingual 2 and 3 code:

try:
    from .__init__ import name  # Python 3
except:
    from __init__ import name

So what seems to be happening is that people have a package sub-module, say "package/run.py", and in Python 2 code they wanted to import from the main package from within run.py.

...

I don't know if that is less appalling than what Gregory has told us, but it is different, and to my mind at least it makes it more understandable and not as weird looking.

Well, if I really meant to do something like that I'd write

try:
    from . import name
except ImportError:
    from __init__ import name

but I would never write

from .__init__ import name

...

This is nothing like Gregory's characterisation of cargo cult (his term) programmers writing `import package.__init__` to import package.

Well, depending on where you are in the stack, there may be no way to distinguish that from other forms involving a dot.

...

...
As for how people can check whether a package is a namespace package, [...] there are many other ways to check for that without attempting to import `__init__` from it.

Are these many other ways a secret? *wink* Because if somebody with the experience and knowledge of MAL doesn't know them, let alone people like me, maybe you should give us a hint what they are.

...

>>> import foo, bar
>>> foo
<module 'foo' (namespace)>
>>> bar
<module 'bar' from 'C:\\Users\\gvanrossum\\cpython\\bar\\__init__.py'>
>>> dir(foo)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> dir(bar)
['__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__']
>>> foo.__file__
>>> bar.__file__
'C:\\Users\\gvanrossum\\cpython\\bar\\__init__.py'
>>> foo.__path__
_NamespacePath(['C:\\Users\\gvanrossum\\cpython\\foo'])
>>> bar.__path__
['C:\\Users\\gvanrossum\\cpython\\bar']

(Note that `foo.__file__` prints nothing because the value is None.)
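Based on the session above, one possible check can be sketched as follows. The helper is_namespace_package and the package names ns_demo_pkg and regular_demo_pkg are made up for this example, and the heuristic only relies on the two attributes shown in the session (__path__ present, __file__ missing or None), so it may not cover every importer.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def is_namespace_package(module):
    """Heuristic: a package (has __path__) with no backing __init__ file."""
    return hasattr(module, "__path__") and getattr(module, "__file__", None) is None


# Build one namespace package (a directory without __init__.py) and one
# regular package in a temporary location; both names are hypothetical.
root = Path(tempfile.mkdtemp())
(root / "ns_demo_pkg").mkdir()
(root / "regular_demo_pkg").mkdir()
(root / "regular_demo_pkg" / "__init__.py").write_text("")

sys.path.insert(0, str(root))
importlib.invalidate_caches()

ns_mod = importlib.import_module("ns_demo_pkg")
regular_mod = importlib.import_module("regular_demo_pkg")

print(is_namespace_package(ns_mod))
print(is_namespace_package(regular_mod))
```

Neither check requires importing `__init__` from the package.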


Paul Moore

8:17 p.m.

On Thu, 10 Dec 2020 at 17:48, Gregory Szorc <gregory.szorc@gmail.com> wrote:

...

So one can make the argument that this one-off behavior of PathFinder undermines the ability to easily distribute Python applications and that in turn undermines the value of Python in the larger ecosystem. My opinion is the harm inflicted by dropping support for "__init__" in module names will be more than compensated by long-term benefits of enabling turnkey Python application distribution. But that's my personal take and I have no solid evidence to justify that claim. The evidence that PathFinder is inconsistent with other meta path finders in the standard library is irrefutable, however.


21 comments

8 participants

participants (8)

Ethan Furman
Filipe Laíns
Gregory Szorc
Guido van Rossum
M.-A. Lemburg
Paul Moore
Serhiy Storchaka
Steven D'Aprano