On 2017-12-01, Chris Angelico wrote:
> Can you elaborate on where this is useful, please?
Introspection tools, for example, might want to look at the module
without executing it. Also, it is a building block to make lazy loading
of modules work. As Nick points out, importlib can do this already.
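For reference, a minimal sketch of getting a module object without running its body, using importlib as it exists today. The module name demo_mod and the temporary file are made up for the demo; only the importlib.util calls are real API.

```python
import importlib.util
import tempfile
from pathlib import Path

# Hypothetical throwaway module, written to a temp dir just for this demo.
tmp_dir = Path(tempfile.mkdtemp())
source_path = tmp_dir / "demo_mod.py"
source_path.write_text("EXECUTED = True\n")

# Build a spec and an empty module object; the module body has not run yet.
spec = importlib.util.spec_from_file_location("demo_mod", str(source_path))
module = importlib.util.module_from_spec(spec)
ran_before_exec = hasattr(module, "EXECUTED")

# Execution is a separate, explicit step.
spec.loader.exec_module(module)
ran_after_exec = hasattr(module, "EXECUTED")
```

Note that the module is never placed in sys.modules here; an introspection tool could stop after module_from_spec() and look at spec and module.__spec__ only.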
Currently, the IMPORT_NAME opcode both loads the code for a module and also
executes it. The exec happens fairly deep in the guts of importlib.
This makes import.c and ceval.c mutually recursive. The locking gets
complicated. There are hacks like _call_with_frames_removed() to hide
the recursion going on.
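To see which opcodes are involved, something like the following lists the import-related instructions CPython emits for a "from ... import ..." statement. This is only an inspection sketch; the exact bytecode surrounding these opcodes varies between CPython versions.

```python
import dis

# Compile (but do not run) a small import statement.
code = compile("from os import path", "<example>", "exec")

# Collect only the import-related opcode names.
import_ops = [
    ins.opname
    for ins in dis.get_instructions(code)
    if ins.opname.startswith("IMPORT_")
]
print(import_ops)
```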
Instead, we could have two separate opcodes, one that gets the module
but does not exec it (i.e. a function like __import__() that returns a
future) and another opcode that actually does the execution. Figuring
out all the details is complicated.
Possible benefits:
- importlib is simpler
- reduce the amount of stack space used (removing recursion by
"continuation passing style").
- makes profiling Python easier. Tools like valgrind get confused
by the call cycle between ceval.c and import.c.
- easier to implement lazy loading of modules (not necessarily a
standard Python feature but will make 3rd party implementations
cleaner)
I'm CCing Brett as I'm sure he has thoughts on this, given his intimate
knowledge of importlib. To me, it seems like __import__() has a
terribly complicated API because it does so many different things.
I have always assumed the call signature for __import__() was because the import-related opcodes pushed so much logic into the function instead of doing it in opcodes (I actually blogged about this at
https://snarky.ca/if-i-were-designing-imort-from-scratch/). Heck, the thing takes in locals() and yet never uses them (and its use of globals() is restricted to specific values so it really doesn't need to be quite so broad). Basically I wished __import__() looked like importlib.import_module().
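A small illustration of the API difference, using json.decoder purely as a convenient stdlib example: what __import__() returns depends on whether fromlist is empty, while importlib.import_module() always returns the module that was named.

```python
import importlib

# Empty fromlist: __import__() returns the top-level package.
top = __import__("json.decoder", globals(), locals(), [], 0)

# Non-empty fromlist: __import__() returns the submodule itself.
sub = __import__("json.decoder", globals(), locals(), ["JSONDecoder"], 0)

# import_module() returns the named module regardless.
direct = importlib.import_module("json.decoder")

print(top.__name__, sub.__name__, direct.__name__)
```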
Maybe two opcodes is not even enough. Maybe we should have one to
resolve relative imports (i.e. import.c:resolve_name), one to load but
not exec a module given its absolute name (i.e. _find_and_load()
without the exec), one to exec a loaded module, one or more to handle
the horror of "fromlist" (i.e. _handle_fromlist()).
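As a rough Python-level sketch of that split (not a proposal for the actual opcodes): resolve() and handle_fromlist() are hypothetical helper names, and handle_fromlist() is a deliberately simplified stand-in for what importlib's _handle_fromlist() does. The load and exec steps are still fused inside importlib.import_module() here.

```python
import importlib
import importlib.util


def resolve(name, package):
    """Step 1: turn a possibly relative name into an absolute one."""
    return importlib.util.resolve_name(name, package)


def handle_fromlist(module, fromlist):
    """Final step (simplified): import missing names as submodules."""
    for attr in fromlist:
        # Only packages (modules with __path__) can have submodules.
        if not hasattr(module, attr) and hasattr(module, "__path__"):
            importlib.import_module(f"{module.__name__}.{attr}")
    return {attr: getattr(module, attr) for attr in fromlist}


# ".decoder" relative to the package "json" resolves to "json.decoder".
absolute = resolve(".decoder", "json")

# Load and exec are still one step here.
package = importlib.import_module("json")

# Roughly what "from json import tool, dumps" has to do afterwards.
names = handle_fromlist(package, ["tool", "dumps"])
```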
I have always wanted to at least break up getting the module and handling fromlist into separate opcodes, so +1 for that. Name resolution could potentially be done as an opcode as it relies on execution state pulled from the globals of the module, but the logic also isn't difficult so +0 for that (i.e. making an opcode that calls something more like importlib.import_module() is more critical to me than eliminating the 'package' argument to that call, but I don't view it as a bad thing to have another opcode for that either).
As for completely separating loading and execution, I don't have a need for what's being proposed so I don't have an opinion. I basically made sure Eric Snow structured specs so that lazy loading as currently supported works, so I got what I wanted for basic lazy importing (short of the PyPI package I keep talking about writing to add a nicer API around lazy importing :) ).
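For anyone following along, lazy loading as currently supported looks roughly like this, following the recipe in the importlib documentation. lazy_import() is just a helper name for the sketch, and colorsys is an arbitrary stdlib module picked for the demo.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the module; it does not run the body.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")

# The first attribute access triggers the real execution of the module.
convert = colorsys.rgb_to_hsv
```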
-Brett
Regards,
Neil