Re: [Python-ideas] Provide a way to import module without exec body
On Fri, 1 Dec 2017 at 10:11 Neil Schemenauer <neil@python.ca> wrote:
On 2017-12-01, Chris Angelico wrote:
Can you elaborate on where this is useful, please?
Introspection tools, for example, might want to look at the module without executing it. Also, it is a building block to make lazy loading of modules work. As Nick points out, importlib can do this already.
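As a rough illustration of what "importlib can do this already" might look like for an introspection tool, the sketch below finds a module and asks its loader for the compiled body without running it. It assumes a source or bytecode loader, since `get_code()` belongs to the `InspectLoader` family and builtin modules return `None`. It also assumes a top-level name: for a dotted name, `find_spec()` still imports (and executes) the parent packages. `inspect_without_exec` is a hypothetical helper name.

```python
import importlib.util


def inspect_without_exec(name):
    """Hypothetical helper: return the top-level code object of `name` unexecuted."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    # get_code() compiles (or unmarshals) the module body; nothing in it runs.
    return spec.loader.get_code(name)


code = inspect_without_exec("json")
# Names referenced by the module body, e.g. for a static-analysis tool.
print(code.co_names[:5])
```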
Currently, the IMPORT_NAME opcode both loads the code for a module and executes it. The exec happens fairly deep in the guts of importlib. This makes import.c and ceval.c mutually recursive. The locking gets complicated. There are hacks like _call_with_frames_removed() to hide the recursion going on.
Instead, we could have two separate opcodes, one that gets the module but does not exec it (i.e. a function like __import__() that returns a future) and another opcode that actually does the execution. Figuring out all the details is complicated.
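The two-step split can be approximated today with PEP 451's public helpers. This is a minimal sketch, not how IMPORT_NAME works internally. The first phase returns the not-yet-executed module object rather than a future. The second phase registers it in `sys.modules` and runs the body. The function names `load_without_exec` and `exec_loaded` are hypothetical.

```python
import importlib.util
import sys
import types


def load_without_exec(name: str) -> types.ModuleType:
    """Hypothetical phase 1: find the module and create it, running no module code."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    # module_from_spec() calls loader.create_module() and sets __name__,
    # __spec__, __loader__, __file__, ... but does not execute the body.
    return importlib.util.module_from_spec(spec)


def exec_loaded(module: types.ModuleType) -> types.ModuleType:
    """Hypothetical phase 2: run the body, registering the module first."""
    name = module.__spec__.name
    sys.modules[name] = module  # registered first so circular imports can see it
    try:
        module.__spec__.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # don't cache a half-initialised module
        raise
    return module


mod = load_without_exec("colorsys")
print(hasattr(mod, "rgb_to_hsv"))  # False: the body has not run yet
exec_loaded(mod)
print(hasattr(mod, "rgb_to_hsv"))  # True
```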
Possible benefits:
- importlib is simpler
- reduce the amount of stack space used (removing recursion by "continuation passing style").
- makes profiling Python easier. Tools like valgrind get confused by the call cycle between ceval.c and import.c.
- easier to implement lazy loading of modules (not necessarily a standard Python feature but will make 3rd party implementations cleaner)
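For comparison, lazy loading as importlib supports it today goes through `importlib.util.LazyLoader`, which defers the real `exec_module()` call until the first attribute access. The sketch follows the recipe in the importlib documentation. The early return for already-imported modules is an added assumption, so that an existing module's spec is never mutated.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported: nothing left to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy module; the body does not run
    return module
```

Accessing any attribute of the returned module (for example `lazy_import("difflib").SequenceMatcher`) triggers execution of the module body.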
I'm CCing Brett as I'm sure he has thoughts on this, given his intimate knowledge of importlib. To me, it seems like __import__() has a terribly complicated API because it does so many different things.
I have always assumed the call signature for __import__() was because the import-related opcodes pushed so much logic into the function instead of doing it in opcodes (I actually blogged about this at https://snarky.ca/if-i-were-designing-imort-from-scratch/). Heck, the thing takes in locals() and yet never uses them (and its use of globals() is restricted to specific values so it really doesn't need to be quite so broad). Basically I wished __import__() looked like importlib.import_module().
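The complexity Brett describes shows up in how the return value of `__import__(name, globals=None, locals=None, fromlist=(), level=0)` depends on `fromlist`, whereas `importlib.import_module()` always returns the module it was asked for:

```python
import importlib
import os.path

# `import os.path` binds the top-level package, so __import__ returns `os`.
top = __import__("os.path")
# `from os.path import join` passes a non-empty fromlist, which makes
# __import__ return the submodule itself. locals() is accepted but unused.
sub = __import__("os.path", globals(), locals(), ["join"], 0)
# import_module() returns the named module regardless.
direct = importlib.import_module("os.path")

print(top is os, sub is os.path, direct is os.path)  # True True True
```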
Maybe two opcodes is not even enough. Maybe we should have one to resolve relative imports (i.e. import.c:resolve_name), one to load but not exec a module given its absolute name (i.e. _find_and_load() without the exec), one to exec a loaded module, one or more to handle the horror of "fromlist" (i.e. _handle_fromlist()).
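The relative-name resolution step is small enough to call on its own: `importlib.util.resolve_name()` needs only the dotted name and the importing package, which the eval loop would take from the importing module's globals (`__package__` or `__spec__`).

```python
import importlib.util

# Inside module pkg.sub.mod, __package__ is "pkg.sub".
print(importlib.util.resolve_name(".sibling", "pkg.sub"))  # pkg.sub.sibling
print(importlib.util.resolve_name("..other", "pkg.sub"))   # pkg.other
print(importlib.util.resolve_name("json", None))           # json (already absolute)
```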
I have always wanted to at least break up getting the module and fromlist as separate opcodes, so +1 for that. Name resolution could potentially be done as an opcode as it relies on execution state pulled from the globals of the module, but the logic also isn't difficult so +0 for that (i.e. making an opcode that calls something more like importlib.import_module() is more critical to me than eliminating the 'package' argument to that call, but I don't view it as a bad thing to have another opcode for that either). As for completely separating the loading and execution, I don't have a need for what's being proposed so I don't have an opinion. I basically made sure Eric Snow structured specs so that lazy loading as currently supported works, so I got what I wanted for basic lazy importing (short of the PyPI package I keep talking about writing to add a nicer API around lazy importing :).
-Brett
Regards,
Neil
On 2 December 2017 at 07:55, Brett Cannon <brett@python.org> wrote:
As for completely separating the loading and execution, I don't have a need for what's being proposed so I don't have an opinion. I basically made sure Eric Snow structured specs so that lazy loading as currently supported works, so I got what I wanted for basic lazy importing (short of the PyPI package I keep talking about writing to add a nicer API around lazy importing :).
In PEP 451 terms, I can definitely see the value in having CREATE_MODULE and EXEC_MODULE be separate opcodes (rather than having them be jammed together in IMPORT_MODULE the way they are now). While there'd still be some import machinery on the frame stack when the module code ran (due to the way the "exec_module" API is defined), there'd be substantially less of it.
There'd be some subtleties around handling backwards compatibility with __import__ overrides (essentially, CREATE_MODULE would have to revert to doing all the work, while EXEC_MODULE would become a no-op), but the basic idea seems plausible.
Cheers,
Nick.
-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 3 December 2017 at 13:22, Nick Coghlan <ncoghlan@gmail.com> wrote:
Re-reading my own post reminded me of another potentially harder problem: IMPORT_MODULE also hides all the import cache management from the eval loop. If you try to split creation and execution apart, then that cache management becomes the eval loop's problem (since it needs to know whether the module is already fully initialised or not after the "GET_OR_CREATE_MODULE" step). That cache locking is fairly intricate already, and exposing these to the eval loop as distinct operations wouldn't make that any easier.
Cheers,
Nick.
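A simplified model makes the cache problem concrete. Once creation and execution are separate steps, the module sits in `sys.modules` between them, so a hypothetical GET_OR_CREATE_MODULE step has to report *three* states, not two. The eval loop must then decide what to do with a module that is still "initializing". The names `get_or_create_module`, `exec_module_step` and `_initializing` are hypothetical stand-ins for importlib's private per-module locks and flags. Deadlock detection between threads, which importlib does perform, is omitted.

```python
import importlib.util
import sys
import threading

_registry_lock = threading.Lock()
_module_locks = {}    # name -> threading.RLock (reentrant for circular imports)
_initializing = set()  # names created but whose body has not finished running


def _lock_for(name):
    with _registry_lock:
        return _module_locks.setdefault(name, threading.RLock())


def get_or_create_module(name):
    """Hypothetical GET_OR_CREATE_MODULE: return (module, state).

    state is "ready", "initializing" or "created"; only "created" means the
    caller is responsible for running the body next.
    """
    with _lock_for(name):
        module = sys.modules.get(name)
        if module is not None:
            state = "initializing" if name in _initializing else "ready"
            return module, state
        spec = importlib.util.find_spec(name)
        if spec is None:
            raise ModuleNotFoundError(f"No module named {name!r}", name=name)
        module = importlib.util.module_from_spec(spec)
        sys.modules[name] = module  # visible to every other importer from here on
        _initializing.add(name)
        return module, "created"


def exec_module_step(module):
    """Hypothetical EXEC_MODULE: run the body, then clear the in-progress flag."""
    name = module.__spec__.name
    with _lock_for(name):
        try:
            module.__spec__.loader.exec_module(module)
        except BaseException:
            sys.modules.pop(name, None)  # don't leave a half-built module cached
            raise
        finally:
            _initializing.discard(name)
```

The lock is released between the two steps, which is exactly the window another importer can observe: it gets the module back in the "initializing" state and must choose between waiting (another thread) and proceeding (a circular import in the same thread).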
On 2017-12-03, Nick Coghlan wrote:
There'd be some subtleties around handling backwards compatibility with __import__ overrides (essentially, CREATE_MODULE would have to revert to doing all the work, while EXEC_MODULE would become a no-op), but the basic idea seems plausible.
Right now (half-baked ideas), I'm thinking:
- IMPORT_RESOLVE: gives the abs_name for a module (to feed to _find_and_load()).
- IMPORT_LOAD: calls _find_and_load() with abs_name as argument. The body of the module is not executed yet. Could return a spec, or a module with the spec that contains the code object of the body.
- IMPORT_EXEC: executes the body of the module.
- IMPORT_FROM: calls _handle_fromlist().
Props to Brett for making importlib in such a way that this clean separation should be relatively easy to do.
To handle a custom __import__ hook, I think we can do the following. Have each opcode detect whether __import__ is overridden. There is already such a test (the import_name fast path). If it is overridden, IMPORT_RESOLVE and IMPORT_LOAD will gather up info and then IMPORT_EXEC will call __import__() using compatible arguments.
Initially, the benefit of making these changes is not some performance improvement or some functionality we didn't previously have. importlib does all this already and probably just as quickly. The benefit is that the import system becomes more understandable. If we decide it is a good idea, we could expose hooks for these opcodes. Not like __import__ though. Maybe there should be a function like sys.set_import_hook(<op>, func). That will keep ceval fast, as it will know whether there is a hook or not without having to crawl around in builtins.
Regards,
Neil
participants (3)
- Brett Cannon
- Neil Schemenauer
- Nick Coghlan