Disallow importing the same module under multiple names

The Python "double import" problem (discussed here <http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_t...>) leads to subtle bugs in code whose behaviour changes depending on what directory the code was run from. It's hard to think of reasons why importing the same filepath multiple times, resulting in two copies of a module, should be allowed; it almost always indicates a mistake in the code.

One possible solution I thought of is to return the existing module upon import under a different name. Then you would still have two names for the same module, but at least it would be the same module object, not two copies of it loaded separately from the same file. However, then a module's __name__ attribute would not match at least one of the names it was imported as.

Instead, maybe a user should just get a big fat error if they try to import the same file twice under different names. For frozen modules and other fancy imports, users should be able to do what they like, but for ordinary file-based imports it's easy enough to detect whether a module has been imported under two different names (by inspecting the filepath of the module being loaded and checking whether that filepath has previously been loaded under a different name) and throw an exception.

I've recently made an addition to a project of mine <https://bitbucket.org/labscript_suite/labscript_utils/src/default/double_imp...> to turn this type of detection on all the time so that we don't miss these subtle bugs (we encountered some whilst porting to Python 3, since we started using absolute imports). I wonder if there's any reason something like this shouldn't be built into Python's default import system.
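A minimal sketch of what such detection could look like (the class and attribute names here are made up for illustration; Chris's actual implementation is in the linked labscript_utils module): a meta path finder that remembers which file each module name resolved to, and raises if the same file later shows up under a second name.

```python
import os
import sys
from importlib.machinery import PathFinder


class DoubleImportDenier(PathFinder):
    """Hypothetical sketch: refuse to import one file under two names."""

    # Maps the real filesystem path of each imported module to the
    # fully qualified name it was first imported under.
    _names_by_path = {}

    @classmethod
    def find_spec(cls, fullname, path=None, target=None):
        spec = super().find_spec(fullname, path, target)
        if spec is not None and spec.origin is not None:
            realpath = os.path.realpath(spec.origin)
            first_name = cls._names_by_path.setdefault(realpath, fullname)
            if first_name != fullname:
                raise RuntimeError(
                    "double import: %s was already imported as %r, "
                    "now being imported as %r"
                    % (realpath, first_name, fullname)
                )
        return spec


# Install it ahead of the default path-based finder:
sys.meta_path.insert(0, DoubleImportDenier)
```

With this installed, importing a submodule both as `pkg.sub` and as bare `sub` (after the package directory lands on sys.path) raises instead of silently creating a second copy.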

On Wed, Mar 14, 2018 at 04:20:20PM +1100, Chris Billington wrote:
Instead, maybe a user should just get a big fat error if they try to import the same file twice under different names.
Absolutely not. Suppose I import a library, Spam, which does "import numpy". Now I try to "import numpy as np", and I get an error. Besides, there is no reason why I shouldn't be able to do

    import Spam
    import Spam as spam
    import Spam as Ham

*even in the same script* if I so choose. It's not an error or a mistake to have multiple names for a single object.
I wonder if there's any reason something like this shouldn't be built into Python's default import system.
Wrong question. The question should not be "Why shouldn't we do this?" but "why should we do this?". -- Steve

On Wed, Mar 14, 2018 at 4:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
That's not the same thing. Both of those statements are importing the same file under the same name, "numpy"; one of them then assigns that to a different local name. But in sys.modules, they're the exact same thing. The double import problem comes when the same file gets imported under two different names *in sys.modules*. Everything else isn't a problem, because you get the same module object - if you "import numpy; import numpy as np; assert np is numpy", you're not seeing a double import problem. ChrisA
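The harmless case looks like this concretely (using the stdlib json module in place of numpy so the snippet has no third-party dependency):

```python
import sys
import json
import json as j

# "import json as j" does not create a second module: it looks up the
# single entry under the key "json" in sys.modules and binds it to a
# new local name.
assert j is json
assert sys.modules["json"] is json
```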

Exactly. The example in the "if __name__ == '__main__'" block of the module I linked imports numpy as np, then adds the numpy package directory to sys.path and imports linalg. This is detected as numpy.linalg being imported twice: once as numpy.linalg, and once as linalg. The resulting error is below.

As for "why should we do this": well, it helps prevent bugs that are pretty hard to notice and debug, so that's a plus. I can't think of any other pluses, so it's down to thinking of minuses in order to see if it's a good idea on net.

    Traceback (most recent call last):
      File "double_import_denier.py", line 153, in _raise_error
        exec('raise RuntimeError(msg) from None')
      File "<string>", line 1, in <module>
    RuntimeError: Double import! The same file has been imported under
    two different names, resulting in two copies of the module. This is
    almost certainly a mistake. If you are running a script from within
    a package and want to import another submodule of that package,
    import it by its full path: 'import module.submodule' instead of
    just 'import submodule.'

    Path imported: /home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/linalg

    Traceback (first time imported, as numpy.linalg):
    ------------
      File "double_import_denier.py", line 195, in <module>
        test1()
      File "double_import_denier.py", line 185, in test1
        import numpy as np
      File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module>
        from . import add_newdocs
      File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module>
        from numpy.lib import add_newdoc
      File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/lib/__init__.py", line 19, in <module>
        from .polynomial import *
      File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/lib/polynomial.py", line 20, in <module>
        from numpy.linalg import eigvals, lstsq, inv
    ------------

    Traceback (second time imported, as linalg):
    ------------
      File "double_import_denier.py", line 195, in <module>
        test1()
      File "double_import_denier.py", line 188, in test1
        import linalg
    ------------

On Wed, Mar 14, 2018 at 5:06 PM, Chris Angelico <rosuav@gmail.com> wrote:

On Wed, Mar 14, 2018 at 05:06:02PM +1100, Chris Angelico wrote:
Hence, two different names.
But in sys.modules, they're the exact same thing.
Of course they are. But it wasn't clear to me that the alternative was what Chris was referring to. Either I read carelessly, or he never mentioned anything about duplicate entries in sys.modules.
The double import problem comes when the same file gets imported under two different names *in sys.modules*.
It actually requires more than that to cause an actual problem. For starters, merely having two keys in sys.modules isn't a problem if they both refer to the same module object:

    sys.modules['maths'] = sys.modules['math']

is harmless.

Even if you have imported two distinct copies of the same logical module as separate module objects -- which is hardly something you can do by accident, apart from one special case -- it won't necessarily be a problem. For instance, if the module consists of nothing but pure functions with no state, then the worst you have is a redundant copy and some wasted memory.

It can even be useful. E.g. I have a module that uses global variables (I know, I know, "global variables considered harmful"...) and sometimes it is useful to import it twice as two independent copies. That's better than literally duplicating the .py file, and faster than re-writing the module and changing the scripts that rely on it. On Linux, I can make a hard-link of spam.py as ham.py. But if my file system doesn't support hard-linking, or I can't do that for some other reason, the next best thing is to intentionally subvert the import system and/or sys.modules in order to get two distinct copies:

    import spam
    sys.modules['ham'] = sys.modules['spam']
    del sys.modules['spam']
    import spam, ham

will do it. (I don't know if there are any easier ways.) There's nothing wrong with doing this intentionally. Consenting adults and all that.

So the question is, how can you do this *by accident*? The only way I know of to get a module *accidentally* imported twice is when a module imports itself in a script:

    # spam.py
    if __name__ == '__main__':
        import spam

does not do what you expect. Now your script is loaded as two independent copies, once under 'spam' and once under '__main__'. The simple fix is: unless you know what you are doing, don't have your runnable scripts import themselves when running.
Apart from intentionally manipulating sys.modules or the import system, or playing file system tricks like hard-linking your files, under what circumstances can this occur by accident? -- Steve

The (perhaps minor) problem with simply giving the same module object two entries in sys.modules is that at least one of them will have a __name__ attribute that differs from the corresponding key in sys.modules. It's not the end of the world (os.path.__name__ is 'posixpath' on Linux, not 'os.path', for example), but it could nonetheless trip some code up.

Multiple copies of a module can trip things up even when they are stateless. The bug that bit us recently was that a function was checking whether its argument was an instance of a particular class defined in the same module as that function. However, the argument had been instantiated using the other copy of the module, and hence, as far as Python was concerned, the two classes were different classes and isinstance() returned False. It was very confusing.

The problem occurs by accident primarily when you run scripts as __main__ from within a package directory, because the current working directory is in sys.path. I know, I know, you're not supposed to do this. But this double import problem is exactly *why* you're not supposed to do this, and despite the advice people do it all the time: for example, they might store scripts within a package directory to be run by calling code using subprocess.Popen, or manually by a developer to generate resources; or they might be in the process of turning their pile of scripts into a package, and encounter the problem during the transition whilst they are still running from within the directory. Or they might have had their program os.chdir() into the package directory in order to be able to use relative paths for resources the program needs to load. On Wed, Mar 14, 2018 at 10:18 PM, Steven D'Aprano <steve@pearwood.info> wrote:
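That isinstance() failure mode can be reproduced directly by loading one file under two different module names (a contrived sketch using importlib; the file and module names are made up):

```python
import importlib.util
import os
import tempfile

# Write a throwaway module defining a class.
src_dir = tempfile.mkdtemp()
path = os.path.join(src_dir, "widgets.py")
with open(path, "w") as f:
    f.write("class Widget:\n    pass\n")


def load_as(name):
    # Load the same file under whatever module name we're given.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


copy_a = load_as("widgets")        # as if imported as "widgets"
copy_b = load_as("pkg.widgets")    # as if imported as "pkg.widgets"

w = copy_a.Widget()
# Same source file, but two distinct class objects, so the
# isinstance() check against the "other" copy fails:
assert copy_a.Widget is not copy_b.Widget
assert isinstance(w, copy_a.Widget)
assert not isinstance(w, copy_b.Widget)
```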

On 2018-03-14 04:18, Steven D'Aprano wrote:
It can occur if a given directory winds up appearing twice on the import path. For instance, if /foo is on the path and /foo/bar is a package directory with /foo/bar/baz as a subpackage directory, then you can do "from bar import baz" and "import baz" and wind up with two different module objects referring to the same module. This usually happens when code starts adding paths to sys.path. This is in some sense "manipulating the import system", but it's something that a fair number of libraries do in various contexts, in order to be able to do things like import plugins without requiring the user to make those plugins available on the default import path.

For what it's worth, I have been bitten by the problem a few times, although it's not very common. I think it's worth considering the proposal, but I'm not sure any change is justified given that the issue is fairly obscure. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
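This scenario is easy to reproduce; a sketch with made-up package names, built in a temporary directory:

```python
import os
import sys
import tempfile

# Build a stand-in for /foo containing package bar with subpackage baz.
foo = tempfile.mkdtemp()
bar_dir = os.path.join(foo, "bar")
baz_dir = os.path.join(bar_dir, "baz")
os.makedirs(baz_dir)
open(os.path.join(bar_dir, "__init__.py"), "w").close()
open(os.path.join(baz_dir, "__init__.py"), "w").close()

# Both /foo and /foo/bar end up on the path.
sys.path[:0] = [foo, bar_dir]

from bar import baz as baz_via_bar
import baz

# The same directory on disk is now loaded as two independent modules:
assert os.path.samefile(os.path.dirname(baz.__file__),
                        os.path.dirname(baz_via_bar.__file__))
assert baz is not baz_via_bar
assert "bar.baz" in sys.modules and "baz" in sys.modules
```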

On Thu, Mar 15, 2018 at 5:25 AM, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
Or by running something that's part of a package, thus making the current directory automatically available.

    spam/
        __init__.py
        spam.py
        ham.py

    python3 spam/spam.py

If it now says "from spam import ham", is it going to get ham.py, or is it going to try to pull up ham from spam.py? ChrisA

On Wed, Mar 14, 2018 at 12:09:55PM -0700, Guido van Rossum wrote:
Yeah, one should never add a directory to sys.path that has an __init__.py file.
Should import raise a warning in that case? I wouldn't want an outright error. I've cd'ed into a package directory in the shell, then run a python module from that directory too many times to want an exception, but I wouldn't mind a warning. -- Steve

On Tue, Mar 20, 2018 at 8:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Not to mention that plenty of programs are designed to run in whatever working directory they find themselves in, and that working directory may contain __init__.py files. For example, I wonder how mercurial gets around the fact that its own imports might be shadowed by whatever's in the current working directory. The mercurial project uses itself for version control, so it is presumably running with its working directory somewhere in its own source tree all the time. I wonder if mercurial removes the current working directory from sys.path to avoid any problems. A lot of programming tools no doubt often find themselves in working directories that are Python packages.

A warning would be pretty good! Especially if you could flip a switch to turn it into an error. Not if there is merely an __init__.py in view, but if you actually do an import twice, since a lot of code (with fully qualified imports, no submodules with names shadowing stdlib modules, etc.) would never hit a problem with running from the package directory.

It seems like running from within a package directory is bad news mostly *because* of the double import problem, and would be somewhat less of a bad idea if you could be confident you didn't have any accidental double imports (still something of a bad idea, though, because you can't know that your submodule isn't shadowing some other third-party module indirectly imported by your code, but that's about the only remaining issue with it I can think of). -Chris
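The "flip a switch" part already exists in the warnings machinery: if a hypothetical check reported double imports as an ImportWarning, users could escalate it to an error with a per-process warnings filter. A sketch (note_double_import is a made-up hook name, not a real API):

```python
import warnings


def note_double_import(path, first_name, new_name):
    """Hypothetical hook: report a double import as an ImportWarning
    instead of raising outright."""
    warnings.warn(
        "%s already imported as %r (now also %r)"
        % (path, first_name, new_name),
        ImportWarning,
        stacklevel=2,
    )


# ImportWarning is ignored under Python's default warning filters, so
# a warning-based check would be quiet by default. "Flipping the
# switch" is one line: escalate ImportWarning to an error.
warnings.simplefilter("error", ImportWarning)

try:
    note_double_import("/pkg/sub.py", "pkg.sub", "sub")
    escalated = False
except ImportWarning:
    escalated = True
assert escalated
```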

On Wed, Mar 21, 2018 at 10:51 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I don't think that's true:

    $ cd /tmp
    $ echo 'import bar' > foo.py
    $ echo 'print("this is bar")' > bar.py
    $ python foo.py
    this is bar

(/tmp is not in the python path) -Chris

On 21 March 2018 at 10:01, Chris Billington <chrisjbillington@gmail.com> wrote:
Scripts add the directory of the script, but the "-m" switch adds the current directory in order to locate modules and packages it can find there (although it's possible we'll attempt to figure out a way to change that in the future and require folks to explicitly opt-in to cwd relative main module imports: https://bugs.python.org/issue33053). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 20 March 2018 at 07:23, Chris Billington <chrisjbillington@gmail.com> wrote:
It seems like running from within a package directory is bad news mostly *because* of the double import problem, [...]
I think the main issue regarding loading a module from within a package directory is about the paths, non-absolute non-relative import names, and [mainly] a desire to keep module+script behavior in a single file, not really about double importing.

If you have a package "a" with a subpackage "b" that has a module "c", its file "a/b/c.py" would be:

- Externally/internally imported as "from a.b import c"
- Internally imported as "from .b import c", for modules at the "a" package level
- Internally imported as "from . import c", for modules at the "b" subpackage level
- Internally imported as "from ..b import c", for modules at a subpackage level sibling to "b"

IMHO, both internally and externally to the package, you should never do "from b import c". The 3 issues I see:

1. [sys.path trick] Doing "from b import c" from within "a" wouldn't load the "a/b/c.py" file unless you do some nonstandard sys.path trick to search for modules/packages in both the root directory (to import "a") and the "a" package-level directory (to import "b"). This scenario makes both "a.b" and "b" imports available: distinct import names for the same file.

2. [non-absolute non-relative name] Using "from b import c" would be the way to [externally] load this module with the "a" package-level directory as the current working directory, if the root directory isn't in sys.path. But in such a scenario, "from a.b import c" simply doesn't work. That's the misleading point: you can import some package internals with alternative names, while the package itself can't be imported. The current working directory or a patched sys.path is the "culprit" that enforces a distinct/specific naming for the imports, which are otherwise invalid. It's not relative (there's no leading dot), but I'd not say it's absolute (as there's no leading "a." where "a" is a package); it's perhaps a relative-importable-from-external-scripts naming style (or absolute-from-internals), though it has the "absolute import" syntax.

3. [module+script in a single internal file] One might wish to load a module as a script, using an `if __name__ == "__main__":` block to segregate the twofold behavior: one as a module, another as a script. That's fine when dealing with a file that doesn't belong to a package. But inside a package structure, loading a single internal module as a script breaks the paths (and the import names). To avoid that, I [almost] always use relative imports, so the file can't be loaded as a script, and I write the script in another file, either in the root directory (the level that has the package directory) or as a "package_name/__main__.py" file, importing the stuff it needs from any module using the same names one would use elsewhere. That is, the solution is to split the file in two (an internal module file, and an external script file).

Nevertheless, if the same file can be imported from two valid paths/addresses, should they be a single module? I mean, should "file" and "module" be 1-to-1 concepts, or should "address/name" and "module" be 1-to-1 concepts (or neither)? How about symbolic links? I'm not sure, but linking "absolute file name" to "module" sounds like endorsing the relative-importable-from-external-scripts naming style, and IMHO that's not the main issue. As packages with modules that internally import themselves have to choose between relative and absolute import names, I don't see [2] as a real issue. -- Danilo J. S. Bellini --------------- "*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap)

On 14 March 2018 at 15:20, Chris Billington <chrisjbillington@gmail.com> wrote:
I wonder if there's any reason something like this shouldn't be built into Python's default import system.
There are two main challenges with enforcing such a check, one affecting end users in general, one affecting standard library maintainers in particular:

* The user-facing problem is a backwards compatibility one: while double imports usually aren't what people wanted, they're also typically fairly harmless. As a result, elevating them from "sometimes a source of obscure bugs" to "categorically prohibited" risks breaking currently working code. While a conventional deprecation cycle should be possible, it isn't clear whether or not the problem occurs frequently enough to warrant that effort.

* The maintainer-level problem is that we actually do this on purpose in the standard library's test suite, in order to test both the pure Python and C accelerated variants of various modules. That could be handled by being careful about exactly where the reverse lookup cache from filenames back to module names is checked and updated, but it does make the problem a bit trickier than just "maintain a reverse lookup table from filesystem paths to module names and complain if an import gets a hit in that table".

I'm definitely sympathetic to the idea, though. If we did head in this direction, then we'd also need to accept & implement PEP 499 [1] (which proposes aliasing __main__ as __main__.__spec__.name in sys.modules when executed with "-m") to avoid causing problems. Cheers, Nick.

[1] https://www.python.org/dev/peps/pep-0499/

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
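The stdlib test-suite pattern Nick refers to can be illustrated with heapq, whose C accelerator _heapq can be blocked by placing None in sys.modules, roughly what the import_fresh_module helper in the CPython test suite does. A sketch:

```python
import importlib
import sys

import heapq as accelerated  # normally backed by the C _heapq module

# Re-import heapq with the accelerator blocked: None in sys.modules
# makes "from _heapq import *" raise ImportError, which heapq.py
# catches, leaving its pure Python definitions in place.
sys.modules.pop("heapq")
sys.modules["_heapq"] = None
try:
    pure = importlib.import_module("heapq")
finally:
    del sys.modules["_heapq"]
    sys.modules["heapq"] = accelerated  # restore the original

# Two distinct module objects for the same logical module, on purpose:
assert pure is not accelerated
assert pure.nlargest(2, [1, 5, 3]) == accelerated.nlargest(2, [1, 5, 3])
```

A naive "same file, two names" check would flag this deliberate double import, which is why the reverse-lookup cache would need care about where it is consulted.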

On Thu, Mar 15, 2018 at 3:26 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
I call all such code "working"... It is a bad design. Double imports are not harmless for any module that initializes global state, or that defines exceptions or types and relies on catching those exceptions or on isinstance checks, when things created in one part of the program wind up used in another part that refers to a different, multiply-imported copy of the module. This is even more painful for extension modules. Debugging these issues is hard.

A cache by os.path.abspath would be a good thing. I'd personally prefer it to be an ImportError, with a message explaining the problem and including a pointer to the original successful import when the import under another sys.modules name is attempted. It'd lead to better code health. But others likely disagree and prefer to silently return the existing module.
I don't doubt that doing this would require a lot of code cleanup. :)

On 20 March 2018 at 16:25, Gregory P. Smith <greg@krypto.org> wrote:
I was recently reminded of a "fun" edge case for PEP 499: "python -m site", which reruns a module that gets implicitly imported at startup (so "site" is already in sys.modules by the time __main__ runs). That said, the way that currently works (re-running sitecustomize and usercustomize) isn't particularly wonderful, so proposing changing it would be reasonable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Wed, Mar 14, 2018 at 04:20:20PM +1100, Chris Billington wrote:
Instead, maybe a user should just get a big fat error if they try to import the same file twice under different names.
Absolutely not. Suppose I import a library, Spam, which does "import numpy". Now I try to "import numpy as np", and I get an error. Besides, there is no reason why I shouldn't be able to import Spam import Spam as spam import Spam as Ham *even in the same script* if I so choose. It's not an error or a mistake to have multiple names for a single object.
I wonder if there's any reason something like this shouldn't be built into Python's default import system.
Wrong question. The question should not be "Why shouldn't we do this?" but "why should we do this?". -- Steve

On Wed, Mar 14, 2018 at 4:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
That's not the same thing. Both of those statements are importing the same file under the same name, "numpy"; one of them then assigns that to a different local name. But in sys.modules, they're the exact same thing. The double import problem comes when the same file gets imported under two different names *in sys.modules*. Everything else isn't a problem, because you get the same module object - if you "import numpy; import numpy as np; assert np is numpy", you're not seeing a double import problem. ChrisA

Exactly. The example in the "if __name__ == '__main__'" block of my module I linked imports numpy as np, and then adds the numpy package directory to sys.path and imports linalg. This is detected as numpy.linalg being imported twice, once as numpy.linalg, and once as linalg. The output error is below. As for "why should we do this", well, it helps prevent bugs that are pretty hard to notice and debug, so that's a plus. I can't think of any other pluses, so it's down to thinking of minuses in order to see if it's a good idea on-net. Traceback (most recent call last): File "double_import_denier.py", line 153, in _raise_error exec('raise RuntimeError(msg) from None') File "<string>", line 1, in <module> RuntimeError: Double import! The same file has been imported under two different names, resulting in two copies of the module. This is almost certainly a mistake. If you are running a script from within a package and want to import another submodule of that package, import it by its full path: 'import module.submodule' instead of just 'import submodule.' Path imported: /home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/ linalg Traceback (first time imported, as numpy.linalg): ------------ File "double_import_denier.py", line 195, in <module> test1() File "double_import_denier.py", line 185, in test1 import numpy as np File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/__init__.py", line 142, in <module> from . 
import add_newdocs File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/add_newdocs.py", line 13, in <module> from numpy.lib import add_newdoc File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/lib/__init__.py", line 19, in <module> from .polynomial import * File "/home/bilbo/anaconda3/lib/python3.6/site-packages/numpy/lib/polynomial.py", line 20, in <module> from numpy.linalg import eigvals, lstsq, inv ------------ Traceback (second time imported, as linalg): ------------ File "double_import_denier.py", line 195, in <module> test1() File "double_import_denier.py", line 188, in test1 import linalg ------------ On Wed, Mar 14, 2018 at 5:06 PM, Chris Angelico <rosuav@gmail.com> wrote:

On Wed, Mar 14, 2018 at 05:06:02PM +1100, Chris Angelico wrote:
Hence, two different names.
But in sys.modules, they're the exact same thing.
Of course they are. But it wasn't clear to me that the alternative was what Chris was referring to. Either I read carelessly, or he never mentioned anything about duplicate entries in sys.modules.
The double import problem comes when the same file gets imported under two different names *in sys.modules*.
It actually requires more than that to cause an actual problem. For starters, merely having two keys in sys.modules isn't a problem if they both refer to the same module object: sys.modules['maths'] = sys.modules['math'] is harmless. Even if you have imported two distinct copies of the same logical module as separate module objects -- which is hardly something you can do by accident, apart from one special case -- it won't necessarily be a problem. For instance, if the module consists of nothing but pure functions with no state, then the worst you have is a redundant copy and some wasted memory. It can even be useful, e.g. I have a module that uses global variables (I know, I know, "global variables considered harmful"...) and sometimes it is useful to import it twice as two independent copies. That's better than literally duplicating the .py file, and faster than re-writing the module and changing the scripts that rely on it. On Linux, I can make a hard-link of spam.py as ham.py. But if my file system doesn't support hard-linking, or I can't do that for some other reason, the next best thing is to intentionally subvert the import system and/or sys.modules in order to get two distinct copies. import spam sys.modules['ham'] = sys.modules['spam'] del sys.modules['spam'] import spam, ham will do it. (I don't know if there are any easier ways.) There's nothing wrong with doing this intentionally. Consenting adults and all that. So the question is, how can you do this *by accident*? The only way I know of to get a module *accidentally* imported twice is when a module imports itself in a script: # spam.py if __name__ == '__main__': import spam does not do what you expect. Now your script is loaded as two independent copies, once under 'spam' and once under '__main__'. The simple fix is: Unless you know what you are doing, don't have your runnable scripts import themselves when running. 
Apart from intentionally manipulating sys.modules or the import system, or playing file system tricks like hard-linking your files, under what circumstances can this occur by accident? -- Steve

The (perhaps minor) problem with simply having the same module object have two entries in sys.modules is that at least one of them will have a different __name__ attribute than the corresponding key in sys.modules. It's not the end of the world (os.path.__name__ is 'posixpath' on linux, not 'os.path', for example), but it could nonetheless trip some code up. Multiple copies of a module even when they are stateless can trip things up. The bug that bit us recently was that a function was checking if its arguments was an instance of a particular class defined in the same module as that function. However the argument had been instantiated using the other copy of the module, and hence as far as Python was concerned, the two classes were different classes and isinstance() returned False. It was very confusing. The problem occurs by accident primarily when you run scripts as __main__ from within a package directory, because the current working directory is in sys.path. I know I know, you're not supposed to do this. But this double import problem is exactly *why* you're not supposed to do this, and despite the advice people do it all the time: for example they might store scripts within a package directory to be run by calling code using subprocess.Popen, or manually by a developer to generate resources, or they might be in the process of turning their pile of scripts into a package, and encounter the problem during the transition whilst they are still running from within the directory. Or they might have had their program os.chdir() into the package directory in order to be able to use relative paths for resources the program needs to load. On Wed, Mar 14, 2018 at 10:18 PM, Steven D'Aprano <steve@pearwood.info> wrote:

On 2018-03-14 04:18, Steven D'Aprano wrote:
It can occur if a given directory winds up appearing twice on the import path. For instance, if /foo is on the path and /foo/bar is a package directory with /foo/bar/baz as a subpackage directory, then you can do "from bar import baz" and "import baz" and wind up with two different module objects referring to the same module. This usually happens when code starts adding paths to sys.path. This is in some sense "manipulating the import system" but it's something that a fair number of libraries do in various contexts, in order to be able to do things like import plugins without requiring the user to make those plugins available on the default import path. For what it's worth, I have been bitten by the problem a few times, although it's not very common. I think it's worth considering the proposal, but not sure if any change is justified given that the issue is fairly obscure. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Thu, Mar 15, 2018 at 5:25 AM, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
Or by running something that's part of a package, thus making the current directory automatically available. spam/ __init__.py spam.py ham.py python3 spam/spam.py If it now says "from spam import ham", is it going to get ham.py, or is it going to try to pull up ham from spam.py? ChrisA

On Wed, Mar 14, 2018 at 12:09:55PM -0700, Guido van Rossum wrote:
Yeah, one should never add a module to sys.path that has a __init__.py file.
Should import raise a warning in that case? I wouldn't want an outright error. I've cd'ed into a package directory in the shell, then run a python module from that directory too many times to want an exception, but I wouldn't mind a warning. -- Steve

On Tue, Mar 20, 2018 at 8:06 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Not to mention that plenty of programs are designed to run in whatever working directory they find themselves in, and that working directory may contain __init__.py files. For example, I wonder how mercurial gets around the fact that its own imports might be shadowed by whatever's in the current working directory. The mercurial project uses itself for version control, so it is presumably running with its working directory somewhere in its own source tree all the time. I wonder if mercurial removes the current working directory from sys.path to avoid any problems. A lot of programming tools no doubt often find themselves in working directories that are python packages. A warning would be pretty good! Especially if you could flip a switch to turn it into an error. Not if there is merely an __init__.py in view, but if you actually do an import twice, since a lot of code (with fully qualified imports, no submodules with names shadowing stdlib modules, etc) would never hit a problem with running from the package directory. It seems like running from within a package directory is bad news mostly *because* of the double import problem, and would be somewhat less of a bad idea if you could be confident you didn't have any accidental double imports (still something of a bad idea though because you can't know that your submodule isn't shadowing some other 3rd party module indirectly imported by your code, but that's about the only remaining issue with it I can think of). -Chris

I don't think that's true: On Wed, Mar 21, 2018 at 10:51 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I don't think that's true: $ cd /tmp $ echo 'import bar' > foo.py $ echo 'print("this is bar")' > bar.py $ python foo.py this is bar (/tmp is not in the python path) -Chris

On 21 March 2018 at 10:01, Chris Billington <chrisjbillington@gmail.com> wrote:
Scripts add the directory of the script, but the "-m" switch adds the current directory in order to locate modules and packages it can find there (although it's possible we'll attempt to figure out a way to change that in the future and require folks to explicitly opt-in to cwd relative main module imports: https://bugs.python.org/issue33053). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
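The difference Nick describes can be checked directly. This is a small sketch (the directory and file names are invented): a script run by path gets the script's own directory as sys.path[0], while "python -m" puts the current working directory on sys.path instead.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical layout: root/sub/showpath.py, run from root.
root = tempfile.mkdtemp()
sub = os.path.join(root, "sub")
os.mkdir(sub)
open(os.path.join(sub, "__init__.py"), "w").close()
with open(os.path.join(sub, "showpath.py"), "w") as f:
    f.write("import os, sys\nprint(os.path.abspath(sys.path[0]))\n")

# Run the module as a script by path: sys.path[0] is the script's directory.
by_path = subprocess.run(
    [sys.executable, os.path.join(sub, "showpath.py")],
    cwd=root, capture_output=True, text=True).stdout.strip()

# Run it with -m: sys.path[0] is the current working directory.
by_m = subprocess.run(
    [sys.executable, "-m", "sub.showpath"],
    cwd=root, capture_output=True, text=True).stdout.strip()

print(by_path)  # .../sub  (the script's own directory)
print(by_m)     # ...      (the current working directory)
```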

On 20 March 2018 at 07:23, Chris Billington <chrisjbillington@gmail.com> wrote:
It seems like running from within a package directory is bad news mostly *because* of the double import problem, [...]
I think the main issue regarding loading a module from within a package directory is about the paths, non-absolute non-relative import names, and [mainly] a desire to keep a module+script behavior in a single file, not really about double importing.

If you have a package "a" with a subpackage "b" that has a module "c", its file "a/b/c.py" would be:

- Externally/internally imported as "from a.b import c"
- Internally imported as "from .b import c", for modules at the "a" package level
- Internally imported as "from . import c", for modules at the "b" subpackage level
- Internally imported as "from ..b import c", for modules in a subpackage sibling to "b"

IMHO, both internally and externally to the package, you should never do "from b import c". The 3 issues I see:

1. [sys.path trick] Doing "from b import c" from within "a" wouldn't load the "a/b/c.py" file unless you do some nonstandard sys.path trick to search for modules/packages in both the root directory (to import "a") and the "a" package-level directory (to import "b"). This scenario makes both "a.b" and "b" imports available: distinct import names for the same file.

2. [non-absolute non-relative name] Using "from b import c" would be the way to [externally] load this module with the "a" package-level directory as the current working directory, if the root directory isn't in sys.path. But, in such a scenario, "from a.b import c" simply doesn't work. That's the misleading point: you can import some package internals with alternative names, while the package itself can't be imported. The current working directory or a patched sys.path is the "culprit" that enforces a distinct/specific naming for the imports, which are otherwise invalid. It's not relative (there's no leading dot), but I'd not say it's absolute (as there's no leading "a." where "a" is a package); it's perhaps a relative-importable-from-external-scripts naming style (or absolute-from-internals), though it has the "absolute import" syntax.
3. [module+script in a single internal file] One might wish to load a module as a script, using an `if __name__ == "__main__":` block to segregate the twofold behavior: one as a module, another as a script. That's fine when dealing with a file that doesn't belong to a package. But, inside a package structure, loading a single internal module as a script breaks the paths (and the import names). To avoid that, I [almost] always use relative imports, so the file can't be loaded as a script, and I write the script in another file, either in the root directory (the level that contains the package directory) or as a "package_name/__main__.py" file, importing the stuff it needs from any module using the same names one would use elsewhere. That is, the solution is to split the file in two (an internal module file and an external script file).

Nevertheless, if the same file can be imported from two valid paths/addresses, should they be a single module? I mean, should "file" and "module" be 1-to-1 concepts, or should "address/name" and "module" be 1-to-1 concepts (or neither)? How about symbolic links? I'm not sure, but linking "absolute file name" to "module" sounds like endorsing the relative-importable-from-external-scripts naming style, and IMHO that's not the main issue. As packages with modules that internally import themselves have to choose between relative and absolute import names, I don't see [2] as a real issue.

-- Danilo J. S. Bellini
"*It is not our business to set up prohibitions, but to arrive at conventions.*" (R. Carnap)
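The "split the file in two" advice above can be sketched as follows (the names "pkg" and "core" are illustrative, not from the thread): internal modules use relative imports only, and the runnable entry point lives in pkg/__main__.py, so the package is run with "python -m pkg" rather than by loading an internal module as a script.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical layout:
#   root/pkg/__init__.py
#   root/pkg/core.py      (internal module, relative imports only)
#   root/pkg/__main__.py  (the external script half of the split)
root = tempfile.mkdtemp()
pkg = os.path.join(root, "pkg")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "core.py"), "w") as f:
    f.write("def greet():\n    return 'hello'\n")
with open(os.path.join(pkg, "__main__.py"), "w") as f:
    f.write("from .core import greet\nprint(greet())\n")

# "python -m pkg" runs pkg/__main__.py with the package context intact,
# so the relative import resolves and no second import name is created.
result = subprocess.run([sys.executable, "-m", "pkg"],
                        cwd=root, capture_output=True, text=True)
print(result.stdout.strip())  # hello
```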

On 14 March 2018 at 15:20, Chris Billington <chrisjbillington@gmail.com> wrote:
I wonder if there's any reason something like this shouldn't be built into Python's default import system.
There are two main challenges with enforcing such a check, one affecting end users in general, one affecting standard library maintainers in particular:

* the user-facing problem is a backwards compatibility one: while double imports usually aren't what people wanted, they're also typically fairly harmless. As a result, elevating them from "sometimes a source of obscure bugs" to "categorically prohibited" risks breaking currently working code. While a conventional deprecation cycle should be possible, it isn't clear whether or not the problem occurs frequently enough to warrant that effort.

* the maintainer-level problem is that we actually do this on purpose in the standard library's test suite in order to test both pure Python and C accelerated variants of various modules. That could be handled by being careful about exactly where the reverse lookup cache from filenames back to module names is checked and updated, but it does make the problem a bit trickier than just "maintain a reverse lookup table from filesystem paths to module names and complain if an import gets a hit in that table".

I'm definitely sympathetic to the idea, though. If we did head in this direction, then we'd also need to accept & implement PEP 499 [1] (which proposes aliasing __main__ as __main__.__spec__.name in sys.modules when executed with "-m") to avoid causing problems.

Cheers, Nick.

[1] https://www.python.org/dev/peps/pep-0499/

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
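A rough sketch of the reverse-lookup idea is easy to write in userspace (the function name here is invented, and a real implementation would hook the import system, e.g. via a meta path finder, rather than scanning after the fact; it would also need the carve-outs Nick mentions for the stdlib test suite and for frozen/builtin modules):

```python
import os
import sys
import warnings

def warn_on_double_imports():
    """Warn if two sys.modules entries are backed by the same file.

    Minimal sketch of "maintain a reverse lookup table from filesystem
    paths to module names and complain if an import gets a hit".
    """
    first_name_for = {}
    for name, mod in list(sys.modules.items()):
        path = getattr(mod, "__file__", None)
        if not path:
            continue  # frozen/builtin modules: users can do what they like
        path = os.path.realpath(path)
        # Remember the first name seen for this file; warn on any other.
        other = first_name_for.setdefault(path, name)
        if other != name:
            warnings.warn(
                "%r imported as both %r and %r" % (path, other, name))
```

Run after startup (or from a test), this surfaces any file currently loaded under two names; raising ImportError instead of warning would give the stricter behaviour discussed below.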

On Thu, Mar 15, 2018 at 3:26 AM Nick Coghlan <ncoghlan@gmail.com> wrote:
I call all such code "working"... It is a bad design. They are not harmless for any module that initializes global state, or defines exceptions or types and relies on catching those exceptions or types via isinstance checks, when things created in one part of the program wind up used in another part that refers to a different multiply-imported copy of the module. This is even more painful for extension modules. Debugging these issues is hard.

A cache by os.path.abspath would be a good thing. I'd personally prefer it to be an ImportError, with a message explaining the problem and including a pointer to the original successful import when the import under another sys.modules name is attempted. It'd lead to better code health. But others likely disagree and prefer to silently return the existing module.
I don't doubt that doing this would require a lot of code cleanup. :)
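The exception-catching failure described above can be reproduced in a few lines (the layout and names "proj", "errors", "AppError" are invented for illustration): an exception class defined in a doubly imported module exists as two unrelated classes, so an except clause naming one copy silently misses exceptions raised from the other.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package "proj" with a module defining an exception type.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "proj")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "errors.py"), "w") as f:
    f.write("class AppError(Exception):\n    pass\n")

# Same file, two import names, two distinct AppError classes.
sys.path[:0] = [root, pkg]
copy1 = importlib.import_module("proj.errors")
copy2 = importlib.import_module("errors")

try:
    raise copy1.AppError("raised from one copy")
except copy2.AppError:
    caught = True   # never reached: the two classes are unrelated
except Exception:
    caught = False

print(caught)  # False
```

The same mismatch bites isinstance checks on any type the module defines, which is why these bugs are so hard to debug: the code looks like it names the right exception.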

On 20 March 2018 at 16:25, Gregory P. Smith <greg@krypto.org> wrote:
I was recently reminded of a "fun" edge case for PEP 499: "python -m site", which reruns a module that gets implicitly imported at startup (so "site" is already in sys.modules by the time __main__ runs). That said, the way that currently works (re-running sitecustomize and usercustomize) isn't particularly wonderful, so proposing changing it would be reasonable. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
participants (9)
- Brendan Barnwell
- Chris Angelico
- Chris Billington
- Danilo J. S. Bellini
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Nick Coghlan
- Steven D'Aprano