proposal: "python -m foo" should bind sys.modules['foo']
Hello all,

This is a writeup of a proposal I floated here: https://mail.python.org/pipermail/python-list/2015-August/694905.html last Sunday. If the response is positive I wish to write a PEP.

Briefly, it is a natural expectation among users that the command:

    python -m module_name ...

used to invoke modules in "main program" mode on the command line imports the module as "module_name". It does not; it imports it as "__main__". A subsequent import of "module_name" within the program makes a new instance of the module, which causes cognitive dissonance and has the side effect that the program now has two instances of the module.

What I propose is that the above command line _should_ bind sys.modules['module_name'] as well as '__main__', which it binds currently. I'm proposing that the python -m option have this effect (Python pseudocode):

    % python -m module.name ...

runs:

    # pseudocode, with values hardwired for clarity
    import sys
    M = new_empty_module(name='__main__', qualname='module.name')
    sys.modules['__main__'] = M
    sys.modules['module.name'] = M
    # load the module code from wherever (not necessarily a file - CPython
    # already must do this phase)
    M.execfile('/path/to/module/name.py')

Specifically, this would make the following two changes to current practice:

1) The module is imported _once_, and bound to both its canonical name and also to __main__.

2) Imported modules acquire a new attribute __qualname__ (analogous to the recent __qualname__ on functions). This is always the canonical name of the module as resolved by the importer. For most modules __name__ will be the same as __qualname__, but for the "main" module __name__ will be '__main__'.

This change has the following advantages. The current standard boilerplate:

    if __name__ == '__main__':
        ... invoke "main program" here ...

continues to work unchanged. Importantly, if the program then issues "import module_name", it is already there and the existing instance is found and used.
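The double import is easy to reproduce; here is a small self-contained sketch (the module name "dupdemo" and the temporary-directory scaffolding are invented purely for illustration):

```python
# Demonstrate the surprise: running a module with "python -m" binds it
# only as __main__, so a later "import dupdemo" executes it a second time.
import os
import subprocess
import sys
import tempfile
import textwrap

with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "dupdemo.py"), "w") as f:
        f.write(textwrap.dedent("""\
            print("executing as", __name__)
            if __name__ == "__main__":
                import dupdemo  # same file, but a second, distinct module instance
                print("two instances:", dupdemo.__name__ != __name__)
        """))
    # Run the module in "main program" mode from its own directory.
    out = subprocess.run(
        [sys.executable, "-m", "dupdemo"],
        cwd=d, capture_output=True, text=True,
    ).stdout

print(out)
```

The module body runs twice: once as '__main__' and once as 'dupdemo', which is exactly the dual-instance situation the proposal removes.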
The thread referenced above outlines my most recent encounter with this and the trouble it caused me. Followup messages include some support for this proposed change, and some criticism. The critiquing article included some workarounds for this multiple-module situation, but they were (1) somewhat dependent on modules coming from a file pathname and (2) cumbersome, requiring every end user affected by the situation to adopt these changes. I'd like to avoid that.

Cheers, Cameron Simpson <cs@zip.com.au>

The reasonable man adapts himself to the world; the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man. - George Bernard Shaw
On Aug 4, 2015 8:30 PM, "Cameron Simpson" <cs@zip.com.au> wrote:
Hello all,
This is a writeup of a proposal I floated here: https://mail.python.org/pipermail/python-list/2015-August/694905.html last Sunday. If the response is positive I wish to write a PEP.
Be sure to read through PEP 495. -eric
On 04Aug2015 23:01, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Aug 4, 2015 8:30 PM, "Cameron Simpson" <cs@zip.com.au> wrote:
This is a writeup of a proposal I floated here: https://mail.python.org/pipermail/python-list/2015-August/694905.html last Sunday. If the response is positive I wish to write a PEP.
Be sure to read through PEP 495.
Hmm. Done: http://legacy.python.org/dev/peps/pep-0495/ Was there something specific I should have been looking for in there?

Cheers, Cameron Simpson <cs@zip.com.au>

Raw data, like raw sewage, needs some processing before it can be spread around. The opposite is true of theories. - James A. Carr <jac@scri.fsu.edu>
On 05Aug2015 02:14, Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Aug 4, 2015 11:46 PM, "Cameron Simpson" <cs@zip.com.au> wrote:
On 04Aug2015 23:01, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Be sure to read through PEP 495.
Sorry, I meant 395.
Ah, ok, many thanks. I've now read this, particularly this section: http://legacy.python.org/dev/peps/pep-0395/#fixing-pickling-without-breaking...

I see that Guido has lent Nick the time machine, as that section outlines a scheme almost word for word what I propose. Though not quite.

I see that this was withdrawn, and after reading the whole PEP and the withdrawal statement at the top I think there are two problems with the PEP. One is that, as stated, several of these issues have since been addressed elsewhere (though not my issue). The other is that it tried to address a whole host of issues which are related more by sharing the import system than by being closely related of themselves, though clearly there are several scenarios that need considering to ensure that one fix doesn't make other things worse.

I still wish to put forth my proposal on its own, probably PEPed, for the following reasons:

(a) at present the multiple import via __main__/"python -m" is still not fixed

(b) the fix here: http://legacy.python.org/dev/peps/pep-0395/#fixing-dual-imports-of-the-main-... seems more oriented around keeping sys.path sane than directly avoiding a dual import

(c) my suggestion reuses the __qualname__ proposal almost as PEP 395 suggested

(d) it can't break anything, because modules do not presently have a __qualname__

(e) it would automatically remove a very surprising edge case that is very easy to trip over, i.e. by doing nothing weird, just plain old imports

Therefore I'd still like commentary on my quite limited and small proposal, with an eye to PEPing it and actually getting it approved.

Cheers, Cameron Simpson <cs@zip.com.au>

Yesterday, I was running a CNC plasma cutter that's controlled by Windows XP. This is a machine that moves around a plasma torch that cuts thick steel plate. A "New Java update is available" window popped up while I was working. Not good. - John Nagle
On 5 August 2015 at 19:01, Cameron Simpson <cs@zip.com.au> wrote:
Ah, ok, many thanks. I've now read this, particularly this section:
http://legacy.python.org/dev/peps/pep-0395/#fixing-pickling-without-breaking...
I see that Guido has lent Nick the time machine, as that section outlines a scheme almost word for word what I propose. Though not quite.
I see that this was withdrawn, and I after reading the whole PEP and the withdrawal statement at the top I think there are two probalems with the PEP. One is that, as stated, several of these issues have since been addressed elsewhere (though not my issue). The other is that it tried to address a whole host of issues which are related more by sharing the import system than necessarily being closely related of themselves, though clearly there are several scenarios that need considering to ensure that one fix doesn't make other things worse.
Right, the withdrawal was of that *specific* PEP, since it hadn't aged well, and covered various things that could be tackled independently.
I still wish to put forth my proposal on its own, probably PEPed, for the following reasons:
(a) at present the multiple import via __main__/"python -m" is still not fixed
(b) that the fix here:
http://legacy.python.org/dev/peps/pep-0395/#fixing-dual-imports-of-the-main-...
seems more oriented around keeping sys.path sane than directly avoiding a dual import
(c) my suggestion reuses the __qualname__ proposal almost as PEP 395 suggested
(d) can't break anything because modules do not presently have a __qualname__
From an *implementation* perspective, you'll want to look at Eric's own PEP 451: https://www.python.org/dev/peps/pep-0451/
While I mentioned it in relation to pickle compatibility in the PEP 395 withdrawal notice, it's also relevant to reducing the risk of falling into the double import trap. In particular, __spec__.name already holds the additional state we need to fix this behaviour (i.e. the original module name); I just haven't found the opportunity to go back and update runpy to take advantage of PEP 451 to address this and other limitations. It would definitely be good to have a PEP addressing that.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 06Aug2015 00:58, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 5 August 2015 at 19:01, Cameron Simpson <cs@zip.com.au> wrote:
Ah, ok, many thanks. I've now read this, particularly this section: http://legacy.python.org/dev/peps/pep-0395/#fixing-pickling-without-breaking... [...] From an *implementation* perspective, you'll want to look at Eric's own PEP 451: https://www.python.org/dev/peps/pep-0451/
Ah. Yes. Thanks. On that basis I withdraw my .__qualname__ suggestion because there exists module.__spec__.name. So it now reduces the proposal to making the -m option do this:

    % python -m module.name ...

runs (framed loosely like https://www.python.org/dev/peps/pep-0451/#how-loading-will-work):

    # pseudocode, with values hardwired for clarity
    import sys
    module = ModuleType('module.name')
    module.__name__ = '__main__'
    sys.modules['__main__'] = module
    sys.modules['module.name'] = module
    ... load module code ...

I suspect "How Loading Will Work" would need to track both module.__name__ and module.__spec__.name to reattach the module to both entries in sys.modules.
In particular, __spec__.name already holds the additional state we need to fix this behaviour (i.e. the original module name), I just haven't found the opportunity to go back and update runpy to take advantage of PEP 451 to address this and other limitations. It would definitely be good to have a PEP addressing that.
I'd like to have a go at addressing just the change I outline above, in the interests of just getting it done. Is that too narrow a change or PEP topic? Are there specific other things I should be considering/addressing that might be affected by my suggestion? Also, where do I find the source for runpy to peruse?

Cheers, Cameron Simpson <cs@zip.com.au>

A program in conformance will not tend to stay in conformance, because even if it doesn't change, the standard will. - Norman Diamond <diamond@jit.dec.com>
On 6 August 2015 at 10:07, Cameron Simpson <cs@zip.com.au> wrote:
I suspect "How Reloading Will Work" would need to track both module.__name__ and module.__spec__.name to reattach the module to both entries in sys.modules.
Conveniently, the fact that reloading rewrites the global namespace of the existing module, rather than creating a new module, means that the dual references won't create any new problems relating to multiple references - we already hit those issues due to the fact that modules refer directly to each other from their module namespaces.
I'd like to have a go at addressing just the change I outline above, in the interests of just getting it done. Is that too narrow a change or PEP topic?
PEPs can be used for quite small things if we want to check for edge cases, and the interaction between __main__ and the rest of the import system is a truly fine source of those :)
Are there specific other things I should be considering/addressing that might be affected by my suggestion?
Using __spec__.name for pickling: http://bugs.python.org/issue19702

Proposed runpy refactoring to reduce the special casing for __main__: http://bugs.python.org/issue19982
Also, where do I find the source for runpy to peruse?
It's a standard library module: https://hg.python.org/cpython/file/default/Lib/runpy.py

"_run_module_as_main" is the module level function that powers the "-m" switch. Actually *implementing* this change should be as simple as changing the line:

    main_globals = sys.modules["__main__"].__dict__

to instead be:

    main_module = sys.modules["__main__"]
    sys.modules[mod_spec.name] = main_module
    main_globals = main_module.__dict__

The PEP is mainly useful to more widely *advertise* the semantic change, since having the module start being accessible under both names has the potential to cause problems.

In particular, I'll upgrade the pickle issue to something that *needs* to be addressed before this change can be made, as there will be programs that are working today because they'll be dual importing the main module, and then pickling objects from the properly imported one, which then unpickle correctly in other processes (even if __main__ is different). Preventing the dual import without also fixing the pickle compatibility issue when pickling __main__ objects would thus have the potential to break currently working code.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
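Pending a runpy change, the effect of those two extra lines can be approximated from inside a running program; a hedged sketch (the helper name alias_main and the module name "my_tool" are invented here, not part of any proposal):

```python
import sys

def alias_main(canonical_name):
    """Bind the already-running __main__ module under its canonical name,
    so a later "import canonical_name" finds the same instance rather
    than re-executing the file."""
    main_module = sys.modules["__main__"]
    # setdefault: leave any already-imported module alone
    return sys.modules.setdefault(canonical_name, main_module)

# e.g. near the top of a module intended to be run with "python -m":
mod = alias_main("my_tool")  # "my_tool" is a hypothetical module name
```

This is essentially the user-level workaround the proposal would make unnecessary.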
Trying to get back up to speed with PEP 499... On 06Aug2015 13:26, Nick Coghlan <ncoghlan@gmail.com> wrote:
On 6 August 2015 at 10:07, Cameron Simpson <cs@zip.com.au> wrote:
I suspect "How Reloading Will Work" would need to track both module.__name__ and module.__spec__.name to reattach the module to both entries in sys.modules.
Conveniently, the fact that reloading rewrites the global namespace of the existing module, rather than creating a new module, means that the dual references won't create any new problems relating to multiple references - we already hit those issues due to the fact that modules refer directly to each other from their module namespaces. [...]
Also, where do I find the source for runpy to peruse?
It's a standard library module: https://hg.python.org/cpython/file/default/Lib/runpy.py
"_run_module_as_main" is the module level function that powers the "-m" switch.
Actually *implementing* this change should be as simple as changing the line:
main_globals = sys.modules["__main__"].__dict__
to instead be:
    main_module = sys.modules["__main__"]
    sys.modules[mod_spec.name] = main_module
    main_globals = main_module.__dict__
I'd just like to check that my thinking is correct here. The above looks very easy, but Joseph Jevnik pointed out that packages already do this correctly (and slightly differently, as __main__ is the main module and __init__ is what is in sys.modules): https://bitbucket.org/cameron_simpson/pep-0499/commits/3efcd9b54e238a1ff7f5c...

I'm about to try this locally:

    [~/s/cpython-pep499(hg)]fleet*> hg diff
    diff --git a/Lib/runpy.py b/Lib/runpy.py
    --- a/Lib/runpy.py
    +++ b/Lib/runpy.py
    @@ -186,7 +186,10 @@ def _run_module_as_main(mod_name, alter_
         except _Error as exc:
             msg = "%s: %s" % (sys.executable, exc)
             sys.exit(msg)
    -    main_globals = sys.modules["__main__"].__dict__
    +    main_module = sys.modules["__main__"]
    +    if not main_module.is_package(mod_spec.name):
    +        sys.modules[mod_spec.name] = main_module
    +    main_globals = main_module.__dict__
         if alter_argv:
             sys.argv[0] = mod_spec.origin
         return _run_code(code, main_globals, None,

Does this seem sound?

Cheers, Cameron Simpson <cs@zip.com.au>
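One caveat with the guard above: module objects don't grow an is_package() method by default, so the check may instead need to consult the module spec, where a package is recognisable by a non-None submodule_search_locations. A small sketch against stdlib modules (the helper name spec_is_package is invented):

```python
from importlib.util import find_spec

def spec_is_package(mod_name):
    """A package's spec carries submodule_search_locations (a list of
    directories to search for submodules); a plain module's spec has it
    set to None."""
    return find_spec(mod_name).submodule_search_locations is not None

print(spec_is_package("json"))       # json is a package
print(spec_is_package("json.tool"))  # json.tool is a plain submodule
```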
Hello, On Wed, 5 Aug 2015 02:14:55 -0600 Eric Snow <ericsnowcurrently@gmail.com> wrote:
On Aug 4, 2015 11:46 PM, "Cameron Simpson" <cs@zip.com.au> wrote:
On 04Aug2015 23:01, Eric Snow <ericsnowcurrently@gmail.com> wrote:
Be sure to read through PEP 495.
Sorry, I meant 395.
I'm sorry for possibly hijacking this thread, but it touches very much an issue I've had on my mind for a while: being able to run modules inside a package as normal scripts. As this thread already has a few people knowledgeable about the peculiarities of package imports, perhaps they can suggest something.

Scenario: there's a directory "pkg" (representing a Python namespace package), and inside it there's bar.py of irrelevant content and foo.py with "from . import bar". What I'd like to do is (while inside the "pkg" directory):

    python3 foo.py

Current behavior:

    SystemError: Parent module '' not loaded, cannot perform relative import

Expected behavior: "from . import bar" in foo.py is successful.
-eric
-- Best regards, Paul mailto:pmiscml@gmail.com
Hello, On Wed, 5 Aug 2015 18:23:20 +0300 Paul Sokolovsky <pmiscml@gmail.com> wrote:
I'm sorry for possibly hijacking this thread, but it touches very much issue I had on my mind for a while: being able to run modules inside package as normal scripts. As this thread already has few people knowledgeable of peculiarities of package imports, perhaps they can suggest something.
Scenario:
There's a directory ("pkg" (representing Python namespace package)), and inside it, there's bar.py of not relevant content and foo.py with "from . import bar". What I'd like to do is (while inside "pkg" directory):
python3 foo.py
Current behavior:
SystemError: Parent module '' not loaded, cannot perform relative import
Expected behavior:
"from . import bar" in foo.py is successful.
Perhaps I was asking something dumb, or everyone just takes for granted that nowadays Python code can't be developed comfortably without an IDE, or 2+ console windows open, or something. But I find it quite a sign of a problem, because if one accepts that one can't just run a script right away, but needs to do extra steps, then well, that enterprisey niche is pretty crowded and there're more choices for how to make it more complicated.

So, I did my homework (beyond just googling, which unfortunately didn't turn up much), and being able to do it with a simple "loader" and a single command line switch like:

    python3 -mruninpkg script.py arg1 arg2 arg3

restored my peace of mind. The script is here: https://github.com/pfalcon/py-runinpkg . Hope someone else will find its existence insightful, or maybe someone will suggest how to make it better (I see the bottleneck in that it's not possible to make mod.__name__ an empty string, and that's what would be needed here to avoid the double-import problem).

I actually did another googling session, and there's definitely a niche for such a solution, as this 10-year-old article shows: http://code.activestate.com/recipes/307772-executing-modules-inside-packages... If only there were a good, widely known inventory of them for different usecases...

-- Best regards, Paul mailto:pmiscml@gmail.com
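The loader idea can be sketched in a few lines with the stdlib runpy module (this is my own approximation of the approach, not the py-runinpkg code; run_in_package is an invented name):

```python
import os
import runpy
import sys

def run_in_package(script_path):
    """Run a script that lives inside a package directory: put the
    package's parent directory on sys.path, then execute the script as
    a submodule so relative imports resolve."""
    script_path = os.path.abspath(script_path)
    pkg_dir = os.path.dirname(script_path)
    parent, pkg_name = os.path.split(pkg_dir)
    sys.path.insert(0, parent)
    stem = os.path.splitext(os.path.basename(script_path))[0]
    # run_name="__main__" preserves the usual script-like behaviour;
    # runpy sets __package__ so "from . import bar" works.
    return runpy.run_module(pkg_name + "." + stem, run_name="__main__")
```

Usage would be run_in_package("pkg/foo.py"); it returns the executed module's globals.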
Paul Sokolovsky <pmiscml@gmail.com> writes:
I find it quite a sign of problem, because if one accepts that one can't just run a script right away, but needs to do extra steps, then well, that enterprisey niche is pretty crowded and there're more choices how to make it more complicated.
Python's BDFL has spoken of running modules with relative import as top-level scripts:

    I'm -1 on this and on any other proposed twiddlings of the __main__
    machinery. The only use case seems to be running scripts that happen
    to be living inside a module's directory, which I've always seen as
    an antipattern. To make me change my mind you'd have to convince me
    that it isn't.

<URL:https://mail.python.org/pipermail/python-3000/2007-April/006793.html>

He doesn't describe (that I can find) what makes him think it's an antipattern, so I'm not clear on how he expects to be convinced it's a valid pattern. Nonetheless, that appears to be the hurdle you'd need to confront.

-- \ "Skepticism is the highest duty and blind faith the one | `\ unpardonable sin." - Thomas Henry Huxley, _Essays on | _o__) Controversial Questions_, 1889 | Ben Finney
On 6 August 2015 at 09:47, Ben Finney <ben+python@benfinney.id.au> wrote:
Paul Sokolovsky <pmiscml@gmail.com> writes:
I find it quite a sign of problem, because if one accepts that one can't just run a script right away, but needs to do extra steps, then well, that enterprisey niche is pretty crowded and there're more choices how to make it more complicated.
Python's BDFL has spoken of running modules with relative import as top-level scripts:
I'm -1 on this and on any other proposed twiddlings of the __main__ machinery. The only use case seems to be running scripts that happen to be living inside a module's directory, which I've always seen as an antipattern. To make me change my mind you'd have to convince me that it isn't.
<URL:https://mail.python.org/pipermail/python-3000/2007-April/006793.html>
He doesn't describe (that I can find) what makes him think it's an antipattern, so I'm not clear on how he expects to be convinced it's a valid pattern.
It's an anti-pattern because doing it fundamentally confuses the import system's internal state: https://www.python.org/dev/peps/pep-0395/#why-are-my-imports-broken Relative imports from the main module just happen to be a situation where the failure is an obvious one rather than subtle state corruption.
Nonetheless, that appears to be the hurdle you'd need to confront.
This came up more recently during the PEP 420 discussions, when the requirement to include __init__.py to explicitly mark package directories was eliminated. This means there's no longer any way for the interpreter to reliably infer from the filesystem layout precisely where in the module hierarchy you intended a module to live. See https://www.python.org/dev/peps/pep-0420/#discussion for references.

However, one of the subproposals from PEP 395 still offers a potential fix: https://www.python.org/dev/peps/pep-0395/#id24

That proposes to allow explicit relative imports at the command line, such that Paul's example could be correctly invoked as:

    python3 -m ..pkg.foo

It would also be possible to provide a syntactic shorthand for submodules of top level packages:

    python3 -m .foo

The key here is that the interpreter is being explicitly told that the current directory is inside a package, as well as how far down in the package hierarchy it lives, and can adjust the way it sets sys.path[0] accordingly before proceeding on to import "pkg.foo" as __main__.

That should be a relatively uncomplicated addition to runpy._run_module_as_main that could be rolled into Cameron's PEP plans. Steps required:

* count leading dots in the supplied mod_name
* remove "leading_dots-1" trailing directory names from sys.path[0]
* strip the leading dots from mod_name before continuing with the rest of the function
* in the special case of only 1 leading dot, remove the final directory segment from sys.path[0] and prepend it to mod_name with a dot separator

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
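Those steps can be sketched as a small helper (the function name and the (module_name, search_dir) return convention are mine; real support would live inside runpy and operate on sys.path[0]):

```python
import os

def resolve_relative_mod_name(mod_name, search_dir):
    """Interpret leading dots in a "-m" argument relative to search_dir
    (normally sys.path[0]). Returns the absolute module name and the
    adjusted search directory."""
    # Step 1: count leading dots
    leading = len(mod_name) - len(mod_name.lstrip("."))
    if leading == 0:
        return mod_name, search_dir
    # Step 3: strip the leading dots
    remainder = mod_name[leading:]
    parts = search_dir.rstrip(os.sep).split(os.sep)
    if leading == 1:
        # Step 4 special case: ".foo" means the current directory *is*
        # the package, so its name becomes the module name prefix
        return parts[-1] + "." + remainder, os.sep.join(parts[:-1])
    # Step 2: remove leading_dots-1 trailing directory names
    return remainder, os.sep.join(parts[:-(leading - 1)])
```

For example, run from inside /proj/pkg, "..pkg.foo" and ".foo" both resolve to module "pkg.foo" with /proj as the search directory.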
Hello, On Thu, 6 Aug 2015 13:57:19 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote: []
"I'm -1 on this and on any other proposed twiddlings of the __main__ machinery. The only use case seems to be running scripts that happen to be living inside a module's directory, which I've always seen as an antipattern. To make me change my mind you'd have to convince me that it isn't."
<URL:https://mail.python.org/pipermail/python-3000/2007-April/006793.html>
He doesn't describe (that I can find) what makes him think it's an antipattern, so I'm not clear on how he expects to be convinced it's a valid pattern.
While Nick's PEP 395 lists enough things in the Python import system that may confuse casual users (and which thus should raise alarm for all Python advocates who think it's a nice, easy-to-use language), I'd like to elaborate on my particular usecase.

So, when you start a new "python library", you probably start it as a single-file module. And it's easy to play with it for both you and your users - just make another file in the same dir as your module, add "import my_module" to it, voila. At some point, you may decide the library is too big for a single file and needs splitting, which of course means converting the module to a package. And of course, when you import "utils", you want to be sure it imports your utils, not something else, which means using relative imports. But then suddenly you can no longer drop your test scripts in the same directory where the code is (like you did before), but need to drop them a level up, which may not always be convenient. And it would be one thing if it required an extra step to run scripts located inside the package dir (the casual-user-in-me's hunch would be that PYTHONPATH needs to be set to ..), but we're talking about not being able to do it at all.
It's an anti-pattern because doing it fundamentally confuses the import system's internal state: https://www.python.org/dev/peps/pep-0395/#why-are-my-imports-broken
Excellent PEP, Nick; it took some time to read through it and its references, but it answered all my questions. After reading it, it's hard to disagree that namespace package support, a simplification and clarification in itself, conflicted with and blocked an automagic way to resolve import confusion for the user. But then indeed a logical solution is to give users the power to explicitly resolve this issue, if the implicit way no longer can work - by letting -m accept relative module paths, as you show below.

It's also a discovery for me that -m's functionality appears to be handled largely on the stdlib side using the runpy module, and that that's the main purpose of that module. I'll look into prototyping relative import support when I have time. And Nick, if you count votes for reviving PEP 395, +1 for that; IMHO, it's much worthier work than e.g. yet another (3rd!) string templating variant (still ad hoc and limited, e.g. not supporting control statements).
Relative imports from the main module just happen to be a situation where the failure is an obvious one rather than subtle state corruption.
Nonetheless, that appears to be the hurdle you'd need to confront.
This came up more recently during the PEP 420 discussions, when the requirement to include __init__.py to explicitly mark package directories was eliminated. This means there's no longer any way for the interpreter to reliably infer from the filesystem layout precisely where in the module hierarchy you intended a module to live. See https://www.python.org/dev/peps/pep-0420/#discussion for references.
However, one of the subproposals from PEP 395 still offers a potential fix: https://www.python.org/dev/peps/pep-0395/#id24
That proposes to allow explicit relative imports at the command line, such that Paul's example could be correctly invoked as:
python3 -m ..pkg.foo
It would also be possible to provide a syntactic shorthand for submodules of top level packages:
python3 -m .foo
The key here is that the interpreter is being explicitly told that the current directory is inside a package, as well as how far down in the package hierarchy it lives, and can adjust the way it sets sys.path[0] accordingly before proceeding on to import "pkg.foo" as __main__.
That should be a relatively uncomplicated addition to runpy._run_module_as_main that could be rolled into Cameron's PEP plans. Steps required:
* count leading dots in the supplied mod_name
* remove "leading_dots-1" trailing directory names from sys.path[0]
* strip the leading dots from mod_name before continuing with the rest of the function
* in the special case of only 1 leading dot, remove the final directory segment from sys.path[0] and prepend it to mod_name with a dot separator
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
-- Best regards, Paul mailto:pmiscml@gmail.com
participants (5)
- Ben Finney
- Cameron Simpson
- Eric Snow
- Nick Coghlan
- Paul Sokolovsky