PEP-499: "python -m foo" should bind to both "__main__" and "foo" in sys.modules

I was recently bitten by the fact that the command:

    python -m foo

pulls in the module and attaches it as sys.modules['__main__'], but not to sys.modules['foo']. Should the program also:

    import foo

it pulls in the same module code, but binds a completely independent instance of it to sys.modules['foo'].

This is counterintuitive; it is a natural expectation that "python -m foo" imports "foo" in the normal fashion. If the program modifies items in "foo", those modifications are not reflected in "__main__", since these are two distinct modules.

I propose that "python -m foo" imports foo as normal, binding it to sys.modules["__main__"] as at present, but that it also binds the module to sys.modules["foo"]. This will remove the disconnect between "python -m foo" and a program's internal "import foo".

For people who are concerned that the module's .__name__ is "__main__", note that the module's resolved "official" name is present in .__spec__.name as described in PEP 451.

There are two recent discussion threads on this in python-list at:

    https://mail.python.org/pipermail/python-list/2015-August/694905.html

and in python-ideas at:

    https://mail.python.org/pipermail/python-ideas/2015-August/034947.html

Please give them a read and give this PEP your thoughts.

The raw text of the PEP is below.
It feels uncontroversial to me, but then it would :-)

It is visible on the web here:

    https://www.python.org/dev/peps/pep-0499/

and I've made a public repository to track the text as it evolves here:

    https://bitbucket.org/cameron_simpson/pep-0499/

Cheers,
Cameron Simpson <cs@zip.com.au>

PEP: 499
Title: ``python -m foo`` should bind ``sys.modules['foo']`` in addition to ``sys.modules['__main__']``
Version: $Revision$
Last-Modified: $Date$
Author: Cameron Simpson <cs@zip.com.au>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 07-Aug-2015
Python-Version: 3.6

Abstract
========

When a module is used as a main program on the Python command line, such as by:

    python -m module.name ...

it is easy to accidentally end up with two independent instances of the module if that module is again imported within the program. This PEP proposes a way to fix this problem.

When a module is invoked via Python's -m option the module is bound to ``sys.modules['__main__']`` and its ``.__name__`` attribute is set to ``'__main__'``. This enables the standard "main program" boilerplate code at the bottom of many modules, such as::

    if __name__ == '__main__':
        sys.exit(main(sys.argv))

However, when the above command line invocation is used it is a natural inference to presume that the module is actually imported under its official name ``module.name``, and therefore that if the program again imports that name then it will obtain the same module instance.

The actuality is that the module was imported only as ``'__main__'``. Another import will obtain a distinct module instance, which can lead to confusing bugs.

Proposal
========

It is suggested that to fix this situation all that is needed is a simple change to the way the ``-m`` option is implemented: in addition to binding the module object to ``sys.modules['__main__']``, it is also bound to ``sys.modules['module.name']``.
Nick Coghlan has suggested that this is as simple as modifying the ``runpy`` module's ``_run_module_as_main`` function as follows::

    main_globals = sys.modules["__main__"].__dict__

to instead be::

    main_module = sys.modules["__main__"]
    sys.modules[mod_spec.name] = main_module
    main_globals = main_module.__dict__

Considerations and Prerequisites
================================

Pickling Modules
----------------

Nick has mentioned `issue 19702`_ which proposes (quoted from the issue):

- runpy will ensure that when __main__ is executed via the import system, it will also be aliased in sys.modules as __spec__.name
- if __main__.__spec__ is set, pickle will use __spec__.name rather than __name__ to pickle classes, functions and methods defined in __main__
- multiprocessing is updated appropriately to skip creating __mp_main__ in child processes when __main__.__spec__ is set in the parent process

The first point above covers this PEP's specific proposal.

Background
==========

`I tripped over this issue`_ while debugging a main program via a module which tried to monkey patch a named module, that being the main program module. Naturally, the monkey patching was ineffective as it imported the main module by name and thus patched the second module instance, not the running module instance.

However, the problem has been around as long as the ``-m`` command line option and is encountered regularly, if infrequently, by others.

In addition to `issue 19702`_, the discrepancy around ``__main__`` is alluded to in PEP 451 and a similar proposal (predating PEP 451) is described in PEP 395 under `Fixing dual imports of the main module`_.

References
==========

.. _issue 19702: http://bugs.python.org/issue19702

.. _I tripped over this issue: https://mail.python.org/pipermail/python-list/2015-August/694905.html

.. _Fixing dual imports of the main module: https://www.python.org/dev/peps/pep-0395/#fixing-dual-imports-of-the-main-mo...
Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:
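The dual-instance problem and the proposed aliasing can be illustrated without touching the real interpreter state. The following is only a sketch under stated assumptions: the module source, the ``load_instance`` helper, and the plain dictionaries standing in for ``sys.modules`` are all hypothetical, and executing one file twice merely imitates what happens when ``python -m foo`` is followed by ``import foo``.

```python
import importlib.util
import os
import tempfile

# Hypothetical module source: a single piece of module-level state.
FOO_SOURCE = "setting = 'original'\n"


def load_instance(path, name):
    """Execute the file at `path` as a fresh module object called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "foo.py")
    with open(path, "w") as f:
        f.write(FOO_SOURCE)

    # Current behaviour, imitated: the file is executed once for "__main__"
    # and again for "foo", giving two unrelated module objects.
    registry = {"__main__": load_instance(path, "foo")}
    registry["foo"] = load_instance(path, "foo")
    registry["foo"].setting = "patched"
    separate = registry["__main__"] is not registry["foo"]
    seen_by_main_today = registry["__main__"].setting

    # Proposed behaviour, imitated: one module object bound under both names.
    main_module = load_instance(path, "foo")
    aliased = {"__main__": main_module, "foo": main_module}
    aliased["foo"].setting = "patched"
    seen_by_main_proposed = aliased["__main__"].setting

print(separate, seen_by_main_today, seen_by_main_proposed)
```

In the first half the patch lands on the second instance and the "running" module never sees it, which is the monkey patching failure described in the Background section. In the second half both names refer to one object, so the patch is visible through either.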

On Sat, Aug 8, 2015 at 7:49 PM, Cameron Simpson <cs@zip.com.au> wrote:
The raw text of the PEP is below. It feels uncontroversial to me, but then it would:-)
I'm not sure that it'll be uncontroversial, but I agree with it :) The risk that I see (as I mentioned in the previous thread, but reiterating for those who just came in) is that it becomes possible to import something whose __name__ is not what you imported. Currently, you can "import math" and see that math.__name__ is "math", or "import urllib.parse" and, as you'd expect, urllib.parse.__name__ is "urllib.parse". In the few cases where it isn't exactly what you imported, it's the canonical name for it - for instance, os.path.__name__ is posixpath on my system. The change proposed here means that the canonical name for the module you're running as the main file is now "__main__", and not whatever else it would have been. Consequences for pickle/multiprocessing/Windows are mentioned in the PEP. Are there any other places where a module's name is checked? ChrisA

On 08Aug2015 20:30, Chris Angelico <rosuav@gmail.com> wrote:
I think I take the line that as of PEP 451 the canonical name for a module is .__spec__.name. The module's .__name__ normally matches that, but obviously in the case of "python -m" it does not. As you point out, suddenly a module can appear somewhere other than sys.modules['__main__'] where that difference shows.

Let's ask the associated question: who introspects module.__name__ and expects it to be the canonical name? For what purpose?

I'm of the opinion that those cases are few, and that they should in any case be updated to consult .__spec__.name these days (with, I suppose, fallback for older Python versions). I think that is the case even without the change suggested by PEP 499.

Cheers,
Cameron Simpson <cs@zip.com.au>
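A lookup of the kind suggested here, preferring .__spec__.name and falling back to .__name__, might look like the sketch below. The helper name ``canonical_name`` and the spec-less ``legacy`` module are hypothetical and exist only for illustration.

```python
import math
import types


def canonical_name(module):
    """Prefer __spec__.name (PEP 451); fall back to __name__ without a spec."""
    spec = getattr(module, "__spec__", None)
    if spec is not None:
        return spec.name
    return module.__name__


# A hand-built module has __spec__ set to None, so it exercises the fallback.
legacy = types.ModuleType("legacy_mod")

print(canonical_name(math))
print(canonical_name(legacy))
```

Using ``getattr`` with a default covers both a missing attribute (older Pythons) and an attribute that is present but ``None``.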

On Aug 8, 2015, at 16:18, Cameron Simpson <cs@zip.com.au> wrote:
I'd think the first place to look is code that deals directly with module objects and/or sys.modules--graphical debuggers, plugin frameworks, bridges (a la AppScript or PyObjC), etc. Especially since many of them want to retain compatibility with 3.3, if not 3.2, and to share as much code as possible with a 2.x version.

Of course you're probably right that there aren't too many such things, and they're also presumably written by people who know what they're doing and wouldn't have too much trouble adapting them for 3.6+ if needed.

On 09Aug2015 03:05, Joseph Jevnik <joejev@gmail.com> wrote:
Yes. Yes it does. I just did a quick test package named "testmod" via "python -m testmod" and:

- __init__.py has the __name__ "testmod"
- __main__.py has the __name__ "__main__"

in both python 2.7 and python 3.4. Since my test script reports:

    % python3.4 -m testmod
    __init__.py: /Users/cameron/rc/python/testmod/__init__.py testmod
    __main__.py: /Users/cameron/rc/python/testmod/__main__.py __main__
    % python2.7 -m testmod
    ('__init__.py:', '/Users/cameron/rc/python/testmod/__init__.pyc', 'testmod')
    ('__main__.py:', '/Users/cameron/rc/python/testmod/__main__.py', '__main__')

would it be enough to say that this change should only apply if the module is not a package?

I'll do some more fiddling to see exactly what happens in packages when I import pieces of them, too.

Cheers,
Cameron Simpson <cs@zip.com.au>

On 09Aug2015 20:34, Cameron Simpson <cs@zip.com.au> wrote:
I append the code for my testmod below, being an __init__.py and a __main__.py. A run shows:

    % python3.4 -m testmod
    __init__.py: /Users/cameron/rc/python/testmod/__init__.py testmod testmod
    __main__.py: /Users/cameron/rc/python/testmod/__main__.py __main__ testmod.__main__
    __main__ <module 'testmod.__main__' from '/Users/cameron/rc/python/testmod/__main__.py'>
    testmod <module 'testmod' from '/Users/cameron/rc/python/testmod/__init__.py'>

(4 lines, should your mailer fold the output.)

It seems to me that Python already does the "right thing" for packages, and it is only non-package modules which need the change proposed by the PEP.

Comments please? Code below.

Cheers,
Cameron Simpson <cs@zip.com.au>

testmod/__init__.py:

    #!/usr/bin/python
    print('__init__.py:', __file__, __name__, __spec__.name)

testmod/__main__.py:

    #!/usr/bin/python
    import pprint
    import sys
    print('__main__.py:', __file__, __name__, __spec__.name)
    for modname, mod in sorted(sys.modules.items()):
        rmod = repr(mod)
        if 'testmod' in modname or 'testmod' in rmod:
            print(modname, rmod)

On 09Aug2015 19:33, Joseph Jevnik <joejev@gmail.com> wrote:
Good point. Please see if this update states your issue fairly and addresses it: https://bitbucket.org/cameron_simpson/pep-0499/commits/3efcd9b54e238a1ff7f5c... Cheers, Cameron Simpson <cs@zip.com.au>

On 08Aug2015 22:12, Andrew Barnert <abarnert@yahoo.com> wrote:
One might hope. So I've started with the stdlib in two passes: looking for .__name__ associated with "mod", and looking for __main__ not in the standard boilerplate (__name__ == '__main__'). Obviously all this code is unfamiliar to me so anyone with deeper understanding who wants to look is most welcome.

Pass 1 with this command:

    find . -type f -name \*.py | xargs fgrep .__name__ /dev/null | grep mod

to look for module related code using .__name__. Of course a lot of it is reporting, but there are some interesting relevant bits.

doctest: This refers to module.__name__ quite a lot. The _normalize_module() function uses __name__ instead of __spec__.name. _from_module() tests if an object is defined in a particular module based on __name__; I'm (naively) surprised that this can't use "is", but it looks like an object's __module__ attribute is a string, which I imagine avoids circular references. _get_test() uses __name__ instead of __spec__.name, though only as a fallback if there is no __file__. SkipDocTestCase.shortDescription() uses __name__.

importlib: mostly seems fine according to my shallow understanding?

inspect: getmodule() seems correct (uses __name__ but seems correctish) - this does seem to be a "grope around in the available places looking for a match" function, and feels unreliable anyway.

modulefinder: this does look like it could use __spec__.name more widely, or as an adjunct to __name__. scan_code() looks like another "grope around" function trying to infer structure from the pieces sitting about :-)

pdb: Pdb.do_whatis definitely reports using .__name__. Not necessarily incorrect.

pkgutil: get_loader() uses .__name__, probably ought to be __spec__.name

pydoc: also probably should upgrade to .__spec__.name

unittest: TestLoader.discover seems to rely on __name__ instead of __spec__.name while constructing a pathname; definitely seems like it needs updating for PEP 451. It also looks up __name__ in sys.builtin_module_names to reject constructing a pathname.

Pass 2 with this command:

    find . -type f -name \*.py | xargs fgrep __main__ | grep -v 'if *__name__ *== *["'\'']__main__'

looking for __main__ but discarding the boilerplate.

I'm actually striking out here. Since this PEP doesn't change __name__ == '__main__' I've not found anything here that looks like it would stop working. Even runpy, cursory though my look at it is, is going forward: setting __name__ to '__main__' instead of working backwards.

Further thoughts?

Cheers,
Cameron Simpson <cs@zip.com.au>
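For anyone who would rather repeat pass 2 without the shell pipeline, the same filter can be sketched in Python. This is an approximation, not a replacement for the commands above: the function name ``non_boilerplate_main_lines`` and the ``SAMPLE`` text are hypothetical, and the regular expression only mirrors the grep -v pattern rather than reproducing it exactly.

```python
import re

# Same idea as the grep -v pattern: the usual "if __name__ == '__main__'" guard.
BOILERPLATE = re.compile(r"""if\s*__name__\s*==\s*["']__main__""")


def non_boilerplate_main_lines(source):
    """Return (line number, text) for lines mentioning __main__ outside the guard."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "__main__" in line and not BOILERPLATE.search(line):
            hits.append((lineno, line.strip()))
    return hits


# Hypothetical input: one interesting use of __main__ plus the standard guard.
SAMPLE = '''\
import sys
main_globals = sys.modules["__main__"].__dict__
if __name__ == '__main__':
    sys.exit(0)
'''

print(non_boilerplate_main_lines(SAMPLE))
```

Applied to the sample, only the ``sys.modules["__main__"]`` line is reported; the guard line is discarded, as in the shell version.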

I would be okay if this change did not affect execution of a package with the python -m flag. I was only concerned because a __main__ in a package is common and wanted to make sure you had addressed it.

On Sun, Aug 9, 2015 at 6:48 PM, Cameron Simpson <cs@zip.com.au> wrote:

participants (4): Andrew Barnert, Cameron Simpson, Chris Angelico, Joseph Jevnik