Suggestion for a behaviour change in import mechanics

Hello everyone, I hope this e-mail reaches someone. Since it's the first time I am using something like a mailing list, I apologise in advance for any inconvenience caused ;-) I am writing because of a small idea / feature request for the Python import mechanism, which caused me some (in my opinion unnecessary) trouble yesterday. I noticed that when using an import statement, the following seems to be true:

Current state:
* Python will search for the first TOP-LEVEL hit when resolving an import statement and look inside it for the remainder of the import. If it cannot find the symbols there, it will fail. (Tested on Python 3.8)

Proposed change:
* If the import fails at some point after finding the first-level match, the path is evaluated further until it eventually is able to resolve the statement completely --> fail later.

My use case scenario:
* I have a bunch of different projects built using Python.
* I want to use parts of them within a new project.
* I place them within a sub-folder (for example "libs") inside the new project, using git submodule or just copying / linking them there, whatever.
* I append the libs to the path (sys.path.append).
* Python WILL find the packages and basically import everything right.
* Problem:
  o If the main package contains a top-level folder that is named the same as one within the other modules (for example a "ui" submodule), Python will search within one and only one of these "ui" modules, within exactly one project.
  o Name clashes can only be avoided by renaming.

I know that this is probably not the suggested and best way to reuse existing code. But it is the most straightforward way and keeps the fastest development cycle, which I think is a reason Python grew so fast. At the time of writing I cannot think of any place where this change would break or change any already working code, and I don't see a reason why the import should completely fail under these circumstances, showing different behaviour for top-level failures and nth-level failures.

What do you think about this hopefully very small change of the import behaviour?

Thanks for your time reading, and best wishes,
Richard

Hi Richard, and welcome! My comments are below, interspersed with your comments, which are prefixed with ">".

On Mon, Oct 28, 2019 at 12:44:32PM +0100, Richard Vogel wrote:
[...]
> Current state:
>
> * Python will search for the first TOP-LEVEL hit when resolving an import statement and search inside there for the remainder part of the import. If it cannot find the symbols it will fail. (Tested on Python 3.8)
>
> Proposed Change:
>
> * If the import fails at some point after finding the first level match: The path is evaluated further until it eventually may be able to resolve the statement completely --> fail later
I'm not exactly sure what your situation is. Perhaps you could be a bit more specific? But I'm going to try to take a guess:

- your import search path has two or more directories, let's call them "a" and "b";

- in "a" you have a module "foo.py":

    # a/foo.py
    spam = 1

- in "b" you have another module also called "foo.py":

    # b/foo.py
    spam = 1
    eggs = 2

If you say "from foo import eggs" the import system finds a/foo.py first, and fails since there is no "eggs" name in that module. You would prefer the import system to continue searching, find b/foo.py, and import eggs=2 from that module.

Is my understanding correct?

-- Steven
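To make the guess above concrete, here is a small self-contained sketch of that scenario (it creates the two "foo.py" files in a temporary directory; all names are illustrative, not taken from Richard's project):

    import os, sys, tempfile

    root = tempfile.mkdtemp()
    for dirname, body in (("a", "spam = 1\n"), ("b", "spam = 1\neggs = 2\n")):
        os.makedirs(os.path.join(root, dirname))
        with open(os.path.join(root, dirname, "foo.py"), "w") as f:
            f.write(body)

    # "a" comes first on the search path, so a/foo.py shadows b/foo.py.
    sys.path[:0] = [os.path.join(root, "a"), os.path.join(root, "b")]

    try:
        from foo import eggs   # the first foo.py found has no "eggs"
    except ImportError as exc:
        print("current behaviour:", exc)

Today the last import raises; under the proposal it would keep searching and pick up eggs from b/foo.py instead.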

Yes :)

That looks similar to my problem, which I fixed for the moment by renaming one folder. Basically, I would prefer it if Python would not treat nth-level import failures differently from first-level failures. It might seem like a small thing, but in my opinion it would:

- keep the behaviour straightforward
- correct the error messages, since from the perspective of the user the import actually DOES exist
- not break anything currently working
- make me happy ;)

On Tue, Oct 29, 2019 at 01:26:36PM -0000, mereep@gmx.net wrote:
> Yes :)
>
> That looks similar to my problem, which I fixed for the moment by renaming one folder. Basically, I would prefer it if Python would not treat nth-level import failures differently from first-level failures.
I think you mean that you would prefer if Python *would* treat nth-level import failures differently. If the first level fails, the import search stops immediately and raises; you want the import search to continue if the second level fails.
> It might seem like a small thing, but in my opinion it would:
> - keep the behaviour straightforward
> - correct the error messages, since from the perspective of the user the import actually DOES exist
> - not break anything currently working
> - make me happy ;)
I disagree with all but the last.

Think about the behaviour of ``from module import name`` in pure Python. Currently, it is straightforward to explain:

    try to import module, or fail if it doesn't exist;
    name = module.name, or fail if the name doesn't exist.

With your behaviour, it becomes something complicated:

    try to import module, or fail if it doesn't exist;
    remember where we are in the module import search path;
    try to set name = module.name;
    if that succeeds, return;
    else go back to line 1, but picking up the search from where we left off.

Re your point 2, I'm a user, and from my perspective, if module.name doesn't exist in the first module found, then it doesn't exist until I move or rename one of the clashing modules. That's how shadowing works, and I like it that way because it makes it easy to reason about what my code will do.

But the critical point is #3. Any change in behaviour could break working code, and this could too:

    # Support many different versions of Python, or old versions
    # of the module, or drop-in replacement of the module.
    try:
        from module import name
    except ImportError:
        from fallback_module import spam as name

Same for the similar idiom:

    try:
        from module import name
    except ImportError:
        def name(arg):
            # re-implement basic version as a fallback
            ...

I use those two idioms frequently. The last thing I want is for the behaviour of import to change, which would potentially change the behaviour above from what's tested and works to something which might pick up *the wrong module* from what I'm expecting it to pick up.

You are asking us to change the behaviour of import in the case of shadowing, to support a use-case which is (in my, and probably most people's, opinion) very poor practice. That has some pretty surprising consequences:

    from module import spam
    from module import eggs

You would expect that spam and eggs both come from the same module, right? Surprise! No they don't; with your proposal they might come from different modules. That's going to play havoc with debugging.

    import module
    from module import name
    assert name is module.name

You would expect that assertion to be true, right? (Importing doesn't copy objects.) Surprise! The assertion can raise AttributeError: module.name might not exist even if ``from module import name`` succeeds.

I expect that with a bit more thought I could come up with some more scenarios where the behaviour of Python programs could change in very surprising ways.

So I will argue against this proposed change.

-- Steven

I basically agree with many of your points. I have to admit that I am likely biased, given my point of view, since I was hindered by the current behaviour in exactly that scenario. As you correctly stated, with
>     import module
>     from module import name
>     assert name is module.name
and
>     from module import spam
>     from module import eggs
things can become confusing, especially if you are reading the code as a third person who doesn't know that someone might have added stuff to the path, resulting in this conflict. Actually I am surprised that someone thinks it through that far and comes up with those examples out of nowhere. That already makes me a bit happy ;) However, I don't see where
>     # Support many different versions of Python, or old versions
>     # of the module, or drop-in replacement of the module.
>     try:
>         from module import name
>     except ImportError:
>         from fallback_module import spam as name
might change an actual thing.

Case 1: Let's say "module" does not exist -> both fail.

Case 2: Exactly one "module" exists but it does not contain "name" -> both fail.

Case 3: Multiple "module"s exist, at least one having "name" inside.
- The current variant succeeds if and only if the first touched "module" contains "name", otherwise it fails.
- The proposed variant will fail if and only if NO "module" at all contains "name".

--> Why would the first variant be any more correct than the second? A later module could have the "name" which you actually wanted, but you don't load it and falsely do a fallback.

Still, I see that in sum this is at best a trade-off, whose pros and cons one could argue about. So let's take a step back and maybe agree that the "shadowing" behaviour may also come in unexpectedly. The thing I did (which, thanks to this discussion, I have now changed completely to a setup.py plus pip install -e solution that basically solves it) other people will do as well. If you're not too familiar with development installs or Python packaging at all, this is a thing you might not even know of, or might see as an unneeded obstacle. Hence, you will do some supposedly straightforward thing like making your libs available on the path. On top of that, you might not even know that some dependency of a dependency's dependency might shadow your imports at all, or tampers with the path on its own.

To conclude: without changing the behaviour, Python might at least recognize such a shadowing case and emit a warning if something shadows something else (like "unreachable module"), since it obviously could find that out, for example with your proposed variant of the import algorithm. Someone could then actually google it, find the reasoning, and would stumble upon ways to fix his/her presumably bad practice.

Thanks anyway for taking the time to think through the implications and clarify your points!
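For anyone reading along who has not used the development-install approach mentioned above, a minimal sketch of what it can look like (the package name "myutils" and the libs/ layout are assumptions made up for this example, not taken from Richard's project):

    # libs/myutils/setup.py
    from setuptools import setup, find_packages

    setup(
        name="myutils",
        version="0.1",
        packages=find_packages(),
    )

Running `pip install -e libs/myutils` once (ideally inside a virtual environment) makes the package importable from anywhere without touching sys.path, while edits to the source are picked up immediately.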

On Wed, Oct 30, 2019 at 12:54:13AM +0100, Richard Vogel wrote:
> However, I don't see where
>
>>     # Support many different versions of Python, or old versions
>>     # of the module, or drop-in replacement of the module.
>>     try:
>>         from module import name
>>     except ImportError:
>>         from fallback_module import spam as name
>
> might change an actual thing.
>
> Case 1: Let's say "module" does not exist -> both fail
When you say "both", you mean under the current behaviour, versus the proposed behaviour? And by "fail", you mean import from the fallback module? If so, then you are correct, there is no change between the status quo and the proposed behaviour for this scenario.
> Case 2: Exactly one "module" exists but it does not contain "name" -> both fail
Again, no change in the status quo.
> Case 3: Multiple "module"s exist, at least one having "name" inside
This is the scenario with a possible change in behaviour. To make it clear, let's suppose we have two modules called "spam.py", which I will tag as spam#1 and spam#2 (the tags #1 and #2 are not part of the file name, just listed to aid understanding).

- spam#1 comes first in the path, and does not include "name";
- spam#2 comes second in the path, and does include "name".

The status quo is that importing "name" from spam#1 fails and so the fallback version is loaded.

But under the proposed change in behaviour, importing "name" from spam#1 fails, but instead of loading the fallback version, "name" from spam#2 is loaded instead.

That's a change in behaviour. It could be an unwelcome one: just because both files are called "spam" doesn't mean that they are functionally equivalent. spam.name (from file #2) may be buggy, broken, obsolete, experimental, or do something completely unexpected.

At the very least, it means that as the developer, there are two cases I am aware of and have tested:

- load "name" from spam#1
- load fallback

but there's a third, non-obvious scenario I may have forgotten or be unaware of:

- load "name" from spam#2

and consequently have never tested.

The point is not that the proposed behaviour is *necessarily* bad in this scenario, but that it is a change in behaviour.

-- Steven
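A self-contained sketch of the spam#1 / spam#2 / fallback scenario above, showing what the try/except idiom does today (all files are generated on the fly purely for illustration):

    import os, sys, tempfile

    root = tempfile.mkdtemp()
    files = {
        "first/spam.py": "",                                     # spam#1: no "name"
        "second/spam.py": "def name(): return 'spam#2'\n",       # spam#2: has "name"
        "fb/fallback_module.py": "def name(): return 'fallback'\n",
    }
    for rel, body in files.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path))
        with open(path, "w") as f:
            f.write(body)

    sys.path[:0] = [os.path.join(root, d) for d in ("first", "second", "fb")]

    try:
        from spam import name
    except ImportError:
        from fallback_module import name

    print(name())  # "fallback" under the status quo; the proposal would give "spam#2"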

> The point is not that the proposed behaviour is *necessarily* bad in this scenario, but that it is a change in behaviour.

I guess it would be a long debate to find out the pros and cons of that. Either variant might in some cases be more or less desirable. What still holds anyway is that module name shadowing hurts many people in unexpected ways. I have read multiple times about people having these issues, since they go unnoticed.
Python *could* recognize these shadowing cases and emit a warning, like: "Hey developer, your module foo actually exists multiple times in the import resolve paths (here, here and here). According to the resolve order (see: http://xyz) module (here1) will shadow the others." People could then at least understand what is happening and wouldn't stumble over the "it's here, why does Python tell me it's not here?!" effect. It will not break anything and would only introduce some extra work on the first import.
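A rough sketch of the kind of check being suggested here; it is purely illustrative, only looks at plain .py files and package directories sitting directly on sys.path, and is not how the import machinery actually works internally:

    import os
    import sys
    import warnings
    from collections import defaultdict

    def warn_about_shadowed_modules():
        found = defaultdict(list)   # top-level name -> every place it exists
        for entry in sys.path:
            if not os.path.isdir(entry):
                continue
            for item in os.listdir(entry):
                full = os.path.join(entry, item)
                name, ext = os.path.splitext(item)
                is_module = ext == ".py"
                is_package = os.path.isdir(full) and os.path.exists(
                    os.path.join(full, "__init__.py"))
                if is_module or is_package:
                    found[name if is_module else item].append(full)
        for name, places in found.items():
            if len(places) > 1:
                warnings.warn("module %r found in several places: %s shadows %s"
                              % (name, places[0], places[1:]))

    warn_about_shadowed_modules()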

On Tue, 29 Oct 2019 at 22:42, Steven D'Aprano <steve@pearwood.info> wrote:
> I expect that with a bit more thought I could come up with some more scenarios where the behaviour of Python programs could change in very surprising ways.
If you add a module with the same name as a stdlib module to sys.path, current semantics are that the stdlib wins. The proposed semantics would allow the added module to *add* functions (in effect). Consider a malicious module that adds names that match common typos for stdlib functions. Such a module could cause a typo in user code to trigger an exploit, rather than simply failing. While unlikely to happen, this has the potential to be a new security vulnerability. Paul

On Wed, Oct 30, 2019 at 08:12:12AM +0000, Paul Moore wrote:
> On Tue, 29 Oct 2019 at 22:42, Steven D'Aprano <steve@pearwood.info> wrote:
>> I expect that with a bit more thought I could come up with some more scenarios where the behaviour of Python programs could change in very surprising ways.
>
> If you add a module with the same name as a stdlib module to sys.path, current semantics are that the stdlib wins.
I don't think so... shadowing of the stdlib by accident is a common problem. https://www.reddit.com/r/Python/comments/hy2gr/beginner_trouble_using_urllib... https://stackoverflow.com/questions/25476044/error-while-trying-to-import-so...
> The proposed semantics would allow the added module to *add* functions (in effect). Consider a malicious module that adds names that match common typos for stdlib functions. Such a module could cause a typo in user code to trigger an exploit, rather than simply failing. While unlikely to happen, this has the potential to be a new security vulnerability.
If an attacker can write files in sys.path, they've already won :-)

-- Steven

On Wed, 30 Oct 2019 at 08:32, Steven D'Aprano <steve@pearwood.info> wrote:
> On Wed, Oct 30, 2019 at 08:12:12AM +0000, Paul Moore wrote:
>> If you add a module with the same name as a stdlib module to sys.path, current semantics are that the stdlib wins.
>
> I don't think so... shadowing of the stdlib by accident is a common problem.
That's the script directory, which is a (slightly) different issue - the script directory is placed ahead of the stdlib on sys.path, but other install directories come later. Like you say, shadowing via things in the script directory is a relatively well-known issue, and that wasn't the point I was trying to demonstrate here.
>> The proposed semantics would allow the added module to *add* functions (in effect). Consider a malicious module that adds names that match common typos for stdlib functions. Such a module could cause a typo in user code to trigger an exploit, rather than simply failing. While unlikely to happen, this has the potential to be a new security vulnerability.
>
> If an attacker can write files in sys.path, they've already won :-)
Conceded. Although the normal attack vector is to get someone to import your malicious package. With this change, there's a new attack vector, getting someone to reference an undefined name from a trusted package. As I said, though, it's unlikely, and just a *potential* issue. I think the other points made (in particular the ones in your original mail that I replied to) make the point sufficiently that this change is not a good idea, regardless of the validity of the security risk. Paul

On 2019-10-30 02:50, Paul Moore wrote:
>> If an attacker can write files in sys.path, they've already won :-)
>
> Conceded. Although the normal attack vector is to get someone to import your malicious package. With this change, there's a new attack vector, getting someone to reference an undefined name from a trusted package. As I said, though, it's unlikely, and just a *potential* issue.
There's nothing new about that either, though. Any imported module can already monkeypatch a stdlib module to add such typo-names and map them to malicious functions.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
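What is described here needs nothing unusual; any module that gets imported can do something along these lines (a harmless stand-in is shown, and the typo'd name "lisdir" is just an invented example):

    import os

    def _fake_listdir(path="."):
        # a real attack would do something malicious here instead
        print("typo'd call intercepted")
        return os.listdir(path)

    os.lisdir = _fake_listdir   # bind a name matching a common typo of os.listdir

Any later code that mistypes os.listdir as os.lisdir now silently calls the injected function rather than failing with an AttributeError.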

On Wed, 30 Oct 2019 at 18:45, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
> There's nothing new about that either, though. Any imported module can already monkeypatch a stdlib module to add such typo-names and map them to malicious functions.
lol, and that's why I'd never make a good security auditor :-) Thanks for pointing this out. Paul

On Oct 30, 2019, at 16:17, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
> There's nothing new about that either, though. Any imported module can already monkeypatch a stdlib module to add such typo-names and map them to malicious functions.
Well, for that attack to work you have to get the user to import your module (or otherwise write some code); for Paul’s attack on the proposed feature you only have to get them to save a file somewhere on sys.path. However, the easiest way to do that is probably to get them to save the file in the script directory—and if you can do that, you can already shadow any stdlib or other module completely.

On 30.10.19 00:41, Steven D'Aprano wrote:
> Think about the behaviour of ``from module import name`` in pure Python. Currently, it is straightforward to explain:
>
>     try to import module, or fail if it doesn't exist;
>     name = module.name, or fail if the name doesn't exist.
Things are more complicated. If module.name does not exist, we try sys.modules["module.name"] (and it is even more complicated in the details).
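A self-contained illustration of that extra step: during a circular import the submodule can already be registered in sys.modules while the attribute on the package is not yet set, and ``from pkg import b`` then falls back to the sys.modules entry. The snippet below generates a throwaway package just for the example and assumes CPython 3.7 or later, where this fallback exists:

    import os, sys, tempfile

    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "pkg")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "a.py"), "w") as f:
        f.write("from pkg import b\n")         # runs while pkg.b is still mid-import
    with open(os.path.join(pkg, "b.py"), "w") as f:
        f.write("import pkg.a\nvalue = 1\n")   # circular: b -> a -> b

    sys.path.insert(0, root)
    import pkg.b           # succeeds thanks to the sys.modules fallback
    print(pkg.b.value)     # 1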

On Oct 28, 2019, at 04:44, Richard Vogel <mereep@gmx.net> wrote:
> Current state: Python will search for the first TOP-LEVEL hit when resolving an import statement and search inside there for the remainder part of the import. If it cannot find the symbols it will fail. (Tested on Python 3.8)
>
> Proposed Change: If the import fails at some point after finding the first level match: The path is evaluated further until it eventually may be able to resolve the statement completely --> fail later
You might be looking for namespace packages (https://packaging.python.org/guides/packaging-namespace-packages/).

You create a myutils namespace package at spamlib/libs/myutils, that includes a module named spam.py (it also works for subpackages, extension modules, etc., but let's keep it simple), and put spamlib/libs on your sys.path. You create a second myutils namespace package at eggslib/lib/myutils, that includes a module named eggs.py, and put eggslib/lib on your sys.path.

Now, `from myutils import spam` looks in the first `myutils` directory, finds `spam.py` and imports it. And `from myutils import eggs` looks in the first `myutils` directory and sees that there's no `eggs.py`; but, because it's a namespace package, it looks for other package directories named `myutils` to add to the namespace, and it finds your second directory and imports the `eggs.py` there.

I'm not sure this is what you want, and I'm not sure it's a good idea for your code, but if it _is_ what you want, all you have to do to make this work is not create a myutils/__init__.py in either directory. (If you need an __init__.py but also need namespace behavior, you can do that with pkg_resources, but you probably don't want to, especially since your whole goal seems to be avoiding installing packages.)
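A runnable sketch of that layout (the directories are built in a temporary location purely so the snippet is self-contained; the names follow the example above):

    import os, sys, tempfile

    root = tempfile.mkdtemp()
    for base, module in (("spamlib/libs", "spam.py"), ("eggslib/lib", "eggs.py")):
        pkg_dir = os.path.join(root, base, "myutils")   # note: no __init__.py anywhere
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, module), "w") as f:
            f.write("where = %r\n" % module)
        sys.path.append(os.path.join(root, base))

    from myutils import spam    # found under spamlib/libs/myutils
    from myutils import eggs    # found under eggslib/lib/myutils

    import myutils
    print(spam.where, eggs.where)
    print(list(myutils.__path__))   # both directories contribute to the package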
> My use case scenario:
>
> * I have a bunch of different projects built using Python.
> * I want to use parts of them within a new project.
> * I place them within a sub-folder (for example libs) within the new project using git submodule or just copy / link them there, whatever.
> * I append the libs to path (sys.path.append).
> * Python WILL find the packages and basically import everything right.
> * Problem: if the main package does actually contain a top-level folder that is named the same as one within the other modules (for example a "ui" submodule), Python will search within one and only one of these ui modules within exactly one project. Name clashes can only be avoided by renaming.
>
> I know that this is probably not the suggested and best way to reuse existing code. But it's the most straightforward and keeps the fastest development cycle, which I think is a reason Python grew so fast.
I don't think it does keep the fastest development cycle. People who haven't learned how to use virtual environments, requirements.txt, and a trivial setup.py usually think this is a whole lot of work that would get in their way. But it isn't. There's a small learning curve, but once you get there, it actually means only a tiny bit of work up front that saves you a lot more work in the long run. Especially because working in-place means you keep running into new problems that have already been solved, but haven't been solved for your use case (adding a third-party dependency to a submodule without having to go edit every one of your top-level projects, needing a build step for one of your submodules, trying to package the whole thing up for distribution as a PyPI package or a company-internal package or a .exe or .app binary…).

But I don't think this is directly relevant to your problem or your suggestion; e.g., if you actually are looking for namespace packages, they're just as occasionally-necessary-and-incredibly-handy-when-that-happens for installed packages as for in-place submodules.

Thanks for the hints! This namespace module concept indeed looks like the kind of behaviour I would like to achieve. I will take a look at that. On a first scroll through the docs, it doesn't seem to map to the namespace concepts known from C++ or PHP, where you explicitly define them within the file, right? I will take a look anyway, since the result seems to be the same. Anyway, just failing later would not hurt the Python language I think ;)

On Oct 29, 2019, at 06:33, mereep@gmx.net wrote:
> On a first scroll through the docs, that doesn't seem to map to the namespace concepts known from C++ or PHP, where you explicitly define them within the file, right?
The meaning of "namespace" is a place where you can bind values to names and look them up by name, usually with dot syntax or something similar. Every module is a namespace containing its globals. A function activation frame is a namespace containing its locals (although it's a hidden one you can't access like a module). Every object is a namespace containing its attributes. And so on.

In C++ (until the upcoming C++20 version), there are no modules (if you #include a header file, you're just including the header's text in the middle of your file), so they added the namespace statement to create explicit module-like namespaces. I don't know as much about PHP, but most of its design is intended to feel familiar to C and C++ developers even if it doesn't quite make sense, so I'm guessing that's why it has a similar feature.

Anyway, in Python, a "namespace package" is a package that's _just_ a namespace: it contains modules and other packages that can be looked up by name, but doesn't have any top-level code that gets run when you import it. This is similar to the SimpleNamespace type; it's not a class whose instances are namespaces, because every class's instances are namespaces; it's a class whose instances are _just_ namespaces, with no other behavior on top of that. So, a namespace package is just one without an __init__.py file or equivalent.

It's a bit weird that Python conflates the notion of namespace packages (packages that are just namespaces, with no other behavior) with open or composite packages (packages that can be added to by other packages); I believe it works that way for historical reasons that go back to the way you faked it in 2.x with a third-party library.

I noticed some different (strange) behaviour when handling projects like that. Imagine the following folder structure:

    project_folder/
        a_submodule/
            a.py
            b.py
            __init__.py
        main.py

- content of a.py:

    class Foo:
        pass

- content of b.py:

    from a import Foo
    foo = Foo()

- content of main.py:

    from os.path import join as pjoin
    import os
    import sys

    # put a_submodule into path
    curr_dir = os.path.abspath(os.path.dirname(__file__))
    sys.path.append(pjoin(curr_dir, 'a_submodule'))

    # import them using two different ways (1: directly, 2: resolving through path)
    from a_submodule.b import foo as instance_1
    from b import foo as instance_2

    assert instance_1 is instance_2, "Instances are not the same"  # will raise(!)

Is it intended that a different import mechanism results in completely ignoring that the files are the same? Python does not seem to understand that it already imported a thing when the import statement differs. I noticed this in my code when suddenly Enums were different while being "the same", depending on which code part they were executed from. That brings me to a point where I cannot handle the project in that manner at all :/

On Oct 29, 2019, at 07:54, mereep@gmx.net wrote:
> I noticed some different (strange) behaviour when handling projects like that.
Please quote some text from the message you're replying to, so we know what you're referencing. It's very hard to carry on discussions on a mailing list if we have to guess which of the four (or, in a long thread, forty) posts you might be talking about. If your mail client makes it too much of a pain to quote inline like I'm doing, you should at least be able to "top-post", to include part of the original message below your reply.
> Imagine the following folder structure:
>
>     project_folder/
>         a_submodule/
>             a.py
>             b.py
>             __init__.py
>         main.py
> - content of a.py:
>
>     class Foo:
>         pass
>
> - content of b.py:
>
>     from a import Foo
>     foo = Foo()
>
> - content of main.py:
>
>     from os.path import join as pjoin
>     import os
>     import sys
>
>     # put a_submodule into path
>     curr_dir = os.path.abspath(os.path.dirname(__file__))
>     sys.path.append(pjoin(curr_dir, 'a_submodule'))
This is your problem. Never put a package on your sys.path, only put the directory containing the package on it.

What happens if you break this rule is exactly what you're seeing. The way Python ensures that doing `import spam` twice results in the same spam module object (so your globals don't all get duplicated) is by storing modules by qualified name in sys.modules. So if the same file has two different qualified names, it's two separate modules.

This also means that if you do this:

    import spam
    spam1 = spam
    del sys.modules['spam']
    del spam
    import spam
    assert spam is spam1

… the assert will fail.

If you actually need this behavior for some reason, you usually want to call importlib directly and avoid using the sys.modules cache. But usually you don't want this behavior.

See the docs on the import system (https://docs.python.org/3/reference/import.html) for full details on how this works. The docs for the importlib module are also helpful.
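A condensed, self-contained version of what goes wrong when the package directory itself is on sys.path: the same file gets two different qualified names and therefore becomes two distinct module objects, cached under two keys (the file contents are reduced to the bare minimum here):

    import os, sys, tempfile

    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "a_submodule")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "b.py"), "w") as f:
        f.write("foo = object()\n")

    sys.path[:0] = [root, pkg]   # both the project dir *and* the package dir

    from a_submodule.b import foo as foo1   # loaded as "a_submodule.b"
    from b import foo as foo2               # loaded again, as plain "b"

    print(foo1 is foo2)                                          # False
    print("a_submodule.b" in sys.modules, "b" in sys.modules)    # True True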

> Please quote some text from the message you're replying to, so we know what you're referencing. It's very hard to carry on discussions on a mailing list if we have to guess which of the four (or, in a long thread, forty) posts you might be talking about. If your mail client makes it too much of a pain to quote inline like I'm doing, you should at least be able to "top-post", to include part of the original message below your reply.

I actually used the website's reply function to generate that message. I replied to my own thread since this was more of a side effect that I encountered, which resulted in the kind of behaviour you described.
> What happens if you break this rule is exactly what you're seeing. The way Python ensures that doing `import spam` twice results in the same spam module object (so your globals don't all get duplicated) is by storing modules by qualified name in sys.modules. So if the same file has two different qualified names, it's two separate modules.

I got that. That explains the behaviour I got, where Enum entries suddenly were unequal to the "same" Enum entries, causing a crash. Which actually was better than having this behaviour happen unseen and having the supposedly same thing multiple times. That would have resulted in two different EventQueues. I cannot imagine all the time I would have spent until I realized I suddenly had two of them ;)
Is there a reasoning behind that behaviour of Python? I am watching from the outside and I know it's always easy to say "this is strange" while there might be legitimate reasoning behind it. The thing is: from the point of view of the user, the fact that importing the very same thing two times results in different realizations of it is as counter-intuitive as it gets. It's a bit like driving home by a different street than usual and ending up in a house that is my house but is not my house ;) To simplify: why isn't a thing equal when it's physically the same thing, meaning the checksum is the same, or it's the same absolute path, or ...?

I hope this message ends up where it should. I have no idea, I will just send it and see what happens ;) And thanks for all your patience and all the time reading and answering. It helps me out. I will go with setup.py's in development mode for that thing. It brings me almost to the same result, and removes the local git submodules I currently have.

On Oct 29, 2019, at 11:45, Richard Vogel <mereep@gmx.net> wrote:
>> What happens if you break this rule is exactly what you're seeing. The way Python ensures that doing `import spam` twice results in the same spam module object (so your globals don't all get duplicated) is by storing modules by qualified name in sys.modules. So if the same file has two different qualified names, it's two separate modules.
>
> I got that. That explains the behaviour I got, where Enum entries suddenly were unequal to the "same" Enum entries, causing a crash. Which actually was better than having this behaviour happen unseen and having the supposedly same thing multiple times. That would have resulted in two different EventQueues. I cannot imagine all the time I would have spent until I realized I suddenly had two of them ;)
>
> Is there a reasoning behind that behaviour of Python?
I suspect this goes so far back into the mists of time that there's no mailing list discussion or anything. But I can take a guess.

First, the real issue here is that it's confusing to have the same module exist under two different names. Normally, the module spam.eggs and the module eggs shouldn't be the same thing. Especially given that, unlike most objects in Python, modules know their qualified names, and actually _need_ to know them for things like pickle and multiprocessing to work. It would be misleading if you looked at the qualified name of spam.eggs and got back something other than "spam.eggs", and not just to human readers, but to code.

While there are rare occasions when you might want spam.eggs and eggs to be the same thing, there are also rare occasions when you might want the same source to import as two separate objects. And both are less common than doing it mistakenly. Since these are both rare, the One Obvious Way To Do each one ought to be something obviously unusual (manipulating sys.modules, or manually using importlib) that signals your unusual intention to your readers.

Although I suspect the actual reasoning is just that, because these are both rare use cases, nobody bothered to design the behavior around either of them; instead, they just went with the simplest implementation that handles the non-rare cases as intended, and then just documented what it does in the rare cases as the behavior. And then, in the years since then (especially at 3.0 and when the new import system was implemented a few versions later) nobody had a compelling reason to change the rules.

The biggest consequence is the case where script.py is a runnable script but also a module, and thanks to a circular import somewhere it ends up getting imported indirectly by itself, so you have modules named "__main__" and "script" built from the same source. That one actually comes up, because you don't really have to break any rules for it to happen (circular imports are legal, and work fine in some cases, even if they're confusing in general and don't work in other cases and should usually be avoided), but it's effectively the same problem you're running into. People have actually made proposals to fix that in some way (whether to make it a detectable error, or to special-case things so sys.modules['script'] = sys.modules['__main__'] from the start, or something else), but I don't think anyone's come up with a proposal that everyone else liked. If you want to know more about how people think about this whole wider issue, maybe search for the proposals on that narrower one.
> Why isn't a thing equal when it's physically the same thing, meaning the checksum is the same or it's the same absolute path or ...?
Some things in Python act like "values", where there's a notion of equality based on equal contents: int, str, tuple, namedtuple and dataclass types, etc. But most other things act like "objects", where no object is equal to anything but itself. Sometimes it's about implicit (especially mutable) state: two different file objects are never equal, even if they represent the same disk file with the same position. Sometimes it's about needing to be able to create distinct things: two Enum members from different classes are never equal even if they have the same name and value, and two Enum classes are never equal even if they have exactly the same members. (Imagine if you had code that did different things with ForegroundColor and BackgroundColor objects, and then they magically became the same type when you added bright background colors.)

Would you really want two different modules to be the same just because they happened to have the same contents, or happened to get those contents by executing the same source? You definitely wouldn't want them to be identical (imagine if a.py and b.py were empty, you did `import a; import b; a.spam=2; b.spam=3`, and a.spam was now 3.) And I don't think you'd want two non-identical modules to be equal. But of course their contents wouldn't be equal, because even a module built from an empty .py file has some default attributes, including __name__, which will be different between a and b.

(As a side note, modules don't always have an absolute path. The most common way to get a module is as a .py file in a directory on sys.path, but there are other ways: cached .pyc files delivered without the .py file, extension modules, modules inside zip archives, even modules that use arbitrary custom finders to pull them off a web server or out of a database, or simulate hierarchy on top of a flat filesystem, or whatever. Some of these have a confusing path, some have no path at all. They do always have a loader spec, which could theoretically serve the same purpose. But modules don't remember the loader spec used to load them. Plus, loader specs, unlike names, are a low-level detail that most Python developers probably never learn.)
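A tiny illustration of the values-versus-objects distinction, using Enum (the class names echo the aside above):

    from enum import Enum

    class ForegroundColor(Enum):
        RED = 1

    class BackgroundColor(Enum):
        RED = 1

    print(ForegroundColor.RED == BackgroundColor.RED)   # False: members of different classes
    print((1, 2) == (1, 2))                              # True: tuples compare by value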

> Although I suspect the actual reasoning is just that, because these are both rare use cases, nobody bothered to design the behavior around either of them; instead, they just went with the simplest implementation that handles the non-rare cases as intended, and then just documented what it does in the rare cases as the behavior. And then, in the years since then (especially at 3.0 and when the new import system was implemented a few versions later) nobody had a compelling reason to change the rules.

That's probably what it mostly is: a mixture of "there is no straightforward always-correct way" and history :)
> Would you really want two different modules to be the same just because they happened to have the same contents, or happened to get those contents by executing the same source? You definitely wouldn't want them to be identical (imagine if a.py and b.py were empty, you did `import a; import b; a.spam=2; b.spam=3`, and a.spam was now 3.) And I don't think you'd want two non-identical modules to be equal.
I don't think so, that's a point. So it's not so straightforward to define what is identical.
> (As a side note, modules don't always have an absolute path. The most common way to get a module is as a .py file in a directory on sys.path, but there are other ways: cached .pyc files delivered without the .py file, extension modules, modules inside zip archives, even modules that use arbitrary custom finders to pull them off a web server or out of a database, or simulate hierarchy on top of a flat filesystem, or whatever. Some of these have a confusing path, some have no path at all. They do always have a loader spec, which could theoretically serve the same purpose. But modules don't remember the loader spec used to load them. Plus, loader specs, unlike names, are a low-level detail that most Python developers probably never learn.)

That's a lot of information and a lot of corner cases, which I partly can't even understand. I guess one needs to understand the concept of a loader spec for that.
Nevertheless, could Python cache that spec, and also cache a checksum of what comes out after resolving the loader? Given that, Python could recognize the case where you actually import something similar with a different loader spec (for example the case where something on the path is imported fully qualified and non fully qualified). Python could then emit a warning that this happened (also in your circular case, for example) and still just do what it does now.

Then an import keyword extension like *import twin ~~~* would explicitly do what it does now anyway (giving you the same thing in a different realization, so basically just removing the warning in that case), and *import union ~~~* would return the same realization and remove the warning. This would force users into knowing what they do and give them the chance to see that such a thing happened and to start thinking about what they actually wanted to do in the first place.

Would that be a thing that - at least in terms of feasibility - could be done?

On Oct 29, 2019, at 15:05, Richard Vogel <mereep@gmx.net> wrote:
> Nevertheless, could Python cache that spec, and also cache a checksum of what comes out after resolving the loader?
Sure. In addition to sys.modules mapping names to modules you could have a similar dict mapping specs to modules. It might require some minor changes to specs, and I’m only about 80% sure specs have exactly the info needed in the first place, but otherwise it seems like it would work. I don’t think you need the checksum for anything, but let’s come back to that; if you do, it’s obviously easy to add another dict, or change the first dict to map specs to (checksum, module) pairs, or whatever. If you’re not changing any existing behavior to rely on this, only adding new behavior, that sounds feasible to me. But if you are changing existing behavior in a way that relies on this, that would break backward compatibility. Most obviously, any existing code that modifies sys.modules assumes that it doesn’t have to modify anything else, so whatever it was trying to do will probably no longer work. And other things currently guaranteed by the import system and relied on by code would also probably break. So you’d need a really compelling reason to force people to change all that code over the next 3 versions.
> So given that, Python could recognize the case where you actually import something similar with a different loader spec (for example the case where something on the path is imported fully qualified and non fully qualified).
The point of the spec is that (I think) you'd get the same spec for importing spam.eggs and also importing eggs when they're both the same file (or zip entry, or whatever). And Python assumes that it doesn't have to deal with module source changing in the middle of a run, except when you go explicitly behind the importer's back (e.g., with importlib.reload). So I think just the spec itself tells you whether two modules are "the same" in the relevant sense. So you don't need a checksum, or anything else, there.

Meanwhile, if two different modules happen to have identical checksums, say because a.py and b.py are both empty files, they're still different things. So I don't think you _want_ a checksum either.

Also, checksum of what? It's obvious what to checksum for a module defined by a .py file (but you still have to work out how that gets cached with the .pyc files), and probably not hard for a .so file, but what's the checksum of a package directory? Or an extension module linked into the interpreter executable? At the very least you'd have to come up with a checksum function for each loader type (and require third-party loaders to do the same, breaking all existing ones).
> Given that, Python could emit a warning that this happened (also in your circular case for example) and just still do what it does now.
This makes sense. If a spec is already in the new spec-to-module dict, but the module’s qualified name doesn’t match the name we’re trying to import it as, warn. I think this could be done in a single place in importlib and just work. The backward compatibility problem wouldn’t affect as many things, and a warning isn’t as bad as an error or different behavior—but still, I’m not sure you could convince people that the benefit is worth that cost. It’s worth looking at why similar warnings have been rejected for the circular __main__ issue in the past (which I don’t remember).
> Then an import keyword extension like *import twin ~~~* would explicitly do what it does now anyway (giving you the same thing in a different realization, so basically just removing the warning in that case), and *import union ~~~* would return the same realization and remove the warning. This would force users into knowing what they do and give them the chance to see that such a thing happened and to start thinking about what they actually wanted to do in the first place.
Well, new keywords need to jump a pretty high hurdle. Either this would break all code that used twin or union as ordinary identifiers (including the builtin set class), or they would have to be "contextual keywords" that are special only in specific grammar contexts, which have a lot of problems of their own. (For just one example, they make best-effort pseudo-parser tools as used by IDEs and indexers and code coloring scripts a lot more complicated.)

What if, instead of new syntax, this were just a pair of magic modules, where you write `import twin.spam` or `from twin import spam`? I'm not sure that actually covers all reasonable use cases. But if it does, I think this would be doable (although not easy for someone who doesn't already know importlib pretty solidly). And, if so, it should be doable in a way that can be written as a third-party library and packaged on PyPI and probably work with all Python 3.4+. Then, if it gets lots of uptake from PyPI it would be a lot easier to argue for adding it to the stdlib. Plus, there'd be a ready-to-use backport for people who want to use the feature but don't want to require Python 3.10.

If it doesn't cover everything, you could definitely implement spam = twin.import('spam'), but that's obviously not nearly as nice to use.
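A rough sketch, not a worked-out design, of what the helper-function variant at the very end could look like: look up the spec for the requested name and, if a module built from the same file is already loaded under another name, hand that one back instead of executing the source a second time. The function name import_shared is made up for this example:

    import importlib
    import importlib.util
    import sys

    def import_shared(name):
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ImportError("no module named %r" % name)
        if spec.origin is not None:
            for mod in list(sys.modules.values()):
                if getattr(mod, "__file__", None) == spec.origin:
                    return mod   # same file already imported under a different name
        return importlib.import_module(name)

    # hypothetical usage, with the earlier a_submodule layout on sys.path:
    #     b = import_shared("b")   # hands back the already-loaded a_submodule.b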
participants (8)
- Andrew Barnert
- Brendan Barnwell
- Ethan Furman
- mereep@gmx.net
- Paul Moore
- Richard Vogel
- Serhiy Storchaka
- Steven D'Aprano