Provide a way to import module without exec body

I have been working on reducing Python startup time. It would be nice if there was some way to load a module into memory without executing its body code. I'm sure other people have wished for this.

Perhaps there could be a new special function, similar to __import__, for this purpose, e.g. __load_module__(). To actually execute the module, I had the idea to make module objects callable, i.e. tp_call for PyModule_Type. That's a little too cute though and will cause confusion. Maybe instead, add a function attribute to modules, e.g. mod.__exec__().

I have a little experimental code, just a small step:

    https://github.com/nascheme/cpython/tree/import_defer_exec

We need importlib to give us the module object and the bytecode without doing the exec(). My hackish solution is to set properties on __spec__ and then have PyImport_ImportModuleLevelObject() do the exec().
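As a rough illustration of the "module object plus bytecode, no exec" split, here is a minimal sketch built only from existing importlib calls. The helper names load_without_exec() and exec_loaded() are hypothetical stand-ins for the proposed __load_module__() and mod.__exec__(); they are not taken from the experimental branch. The sketch assumes a pure-Python module whose loader provides get_code().

```python
import importlib.util


def load_without_exec(name):
    """Return (module, code) for `name` without running the module body.

    Hypothetical helper, for illustration only.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    # Creates the module object and sets __name__, __spec__, etc.
    module = importlib.util.module_from_spec(spec)
    # get_code() is available on source and bytecode file loaders; it
    # returns None for modules without a code object (e.g. extensions).
    code = spec.loader.get_code(name)
    return module, code


def exec_loaded(module, code):
    """Run the previously loaded code object in the module's namespace."""
    exec(code, module.__dict__)
    return module
```

Unlike a real import, this sketch does not register the module in sys.modules, so it is only suitable for experimenting with the two phases.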

On 2017-12-01, Chris Angelico wrote:
> Can you elaborate on where this is useful, please?

Introspection tools, for example, might want to look at the module without executing it. Also, it is a building block to make lazy loading of modules work. As Nick points out, importlib can do this already.

Currently, the IMPORT_NAME opcode both loads the code for a module and executes it. The exec happens fairly deep in the guts of importlib. This makes import.c and ceval.c mutually recursive. The locking gets complicated. There are hacks like _call_with_frames_removed() to hide the recursion going on.

Instead, we could have two separate opcodes: one that gets the module but does not exec it (i.e. a function like __import__() that returns a future) and another that actually does the execution. Figuring out all the details is complicated.

Possible benefits:

- importlib is simpler

- reduce the amount of stack space used (removing recursion by "continuation passing style")

- makes profiling Python easier. Tools like valgrind get confused by the call cycle between ceval.c and import.c.

- easier to implement lazy loading of modules (not necessarily a standard Python feature, but it will make 3rd party implementations cleaner)

I'm CCing Brett as I'm sure he has thoughts on this, given his intimate knowledge of importlib. To me, it seems like __import__() has a terribly complicated API because it does so many different things.

Maybe two opcodes is not even enough. Maybe we should have one to resolve relative imports (i.e. import.c:resolve_name), one to load but not exec a module given its absolute name (i.e. _find_and_load() without the exec), one to exec a loaded module, and one or more to handle the horror of "fromlist" (i.e. _handle_fromlist()).

Regards,

Neil
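To see the single opcode discussed above, the standard dis module can list the instructions the compiler emits for import statements. This is only an inspection sketch; the exact surrounding opcodes vary between CPython versions, so no output is shown.

```python
import dis

# Compile two import statements without running them.
code = compile("import os.path\nfrom os import sep", "<example>", "exec")

# Collect the opcode names; IMPORT_NAME appears for both statements,
# and IMPORT_FROM fetches `sep` from the already-imported module.
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)
```

By the time IMPORT_NAME has finished, the module body has already run; there is no separate instruction for the execution step.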

01.12.17 20:12, Neil Schemenauer wrote:
The IMPORT_NAME opcode is highly optimized. In most cases it just looks the module up in sys.modules and checks that it is not in the middle of being imported. I suppose two opcodes will hurt performance. And I don't see how this could simplify the code.

I suppose the existing importlib machinery already supports loading modules without executing them. Maybe not with a single function, but with a combination of 2-3 methods. But what do you want to get? The source? The code object? What about modules implemented in C?

On 1 December 2017 at 18:13, Neil Schemenauer <nas-python-ideas@arctrix.com> wrote:
What does actually doing the load give that simply calling https://docs.python.org/3/library/importlib.html#importlib.util.find_spec doesn't?

At that point, you know the module exists, and how to load it, which is all a lazy loading implementation really needs to be confident that a subsequent actual execution attempt will be able to start.

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
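A minimal sketch of that up-front check, using importlib.util.find_spec(). The helper name module_available() is made up for this example. Note that for a dotted name, find_spec() imports the parent package in order to search its __path__, so only the final component is left unloaded.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located, without executing it.

    Hypothetical helper, for illustration only.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        return False


print(module_available("json"))
print(module_available("no_such_module_xyz_12345"))
```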

On 1 December 2017 at 18:37, Nick Coghlan <ncoghlan@gmail.com> wrote:
After posting this, and while filing https://bugs.python.org/issue32192, I double checked how "importlib.util.module_from_spec" works, and it turns out that it already handles the main part of what you're after: it creates the module without executing it.

The actual execution is then handled by running "module.__spec__.loader.exec_module(module)".

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
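The two steps described here can be sketched as follows. The function names create_unexecuted() and execute() are invented for the example; the importlib calls are the ones named in the message. The sketch deliberately skips the sys.modules registration a real import performs, so it is a demonstration rather than a replacement for import.

```python
import importlib.util


def create_unexecuted(name):
    """Create the module object for `name`; its body has not run yet."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    return importlib.util.module_from_spec(spec)


def execute(module):
    """Run the module body, populating the module's namespace."""
    module.__spec__.loader.exec_module(module)
    return module


mod = create_unexecuted("colorsys")
print(hasattr(mod, "rgb_to_hsv"))  # body not run, so the name is missing
execute(mod)
print(hasattr(mod, "rgb_to_hsv"))  # defined once the body has executed
```

Between the two calls the module carries only import metadata such as __name__ and __spec__, which is the "bare module" state raised later in the thread.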

On Fri, Dec 01, 2017 at 02:13:37AM -0600, Neil Schemenauer wrote:
I don't understand why you would want to do this. Given a source file:

    # module.py
    spam = 1
    eggs = 2

if you import the module without executing the code in the module, surely you'll get a bare module with nothing in it? Then:

    module.spam
    module.eggs

will both fail with AttributeError.

If that's what you mean, then no, I haven't wished for that. Unless I'm missing something, it seems pointless. When, and why, would I want to import an empty module?

-- Steve

On 1 December 2017 at 20:17, Steven D'Aprano <steve@pearwood.info> wrote:
Having access to something along these lines is the core building block for lazy loading. You figure out everything you need to actually load the module up front (so you still get an immediate ImportError if the module doesn't even exist), but then defer actually finishing the load to the first __getattr__ invocation (so if you never actually use the module, you avoid any transitive imports, as well as any other costs of initialising it).

Cheers,
Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

I have found myself implementing something like this before. I was working on a command-line tool with nested sub-commands. Each sub-command would import a script and execute something out of it. I ended up moving the importing of those little scripts into the functions that called them because importing all of them was slowing things down. A built-in lazy importer would have made for a better solution.

On Fri, Dec 1, 2017 at 5:36 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Fri, Dec 01, 2017 at 10:23:37AM -0500, brent bejot wrote:
If I understand your use-case, you have a bunch of functions like this:

    def spam_subcommand():
        import spam
        spam.command()

    def eggs_subcommand():
        import eggs
        eggs.command()

With lazy importing, you might have something like this:

    spam = lazy_import('spam')
    eggs = lazy_import('eggs')

    def spam_subcommand():
        load(spam)
        spam.command()

    def eggs_subcommand():
        load(eggs)
        eggs.command()

I don't see the benefit for your use-case. How would it be better? Have I missed something?

-- Steve

On 2017-12-01 22:46, Steven D'Aprano wrote:
I don't think you'd need the 'load'; you'd delay execution of the module's code until the first attribute access.

All of the script's module dependencies would be listed at the top, but you could avoid most of the cost of importing a module until you know that you need the module's functionality.
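For comparison, the standard library already offers importlib.util.LazyLoader, which defers execution until the first attribute access. Below is a minimal sketch that follows the pattern shown in the importlib documentation; the lazy_import() name mirrors the pseudocode earlier in the thread, hue_subcommand() is invented for the example, and colorsys merely stands in for a sub-command's dependency.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose body runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arms the deferral; the body is not run.
    loader.exec_module(module)
    return module


# Dependencies are listed at the top, as with ordinary imports.
colorsys = lazy_import("colorsys")


def hue_subcommand(r, g, b):
    # The first attribute access triggers the real module execution.
    return colorsys.rgb_to_hsv(r, g, b)[0]
```

No explicit load() call is needed in the sub-command: a script that never calls hue_subcommand() never pays for executing the module.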

Participants (7): brent bejot, Chris Angelico, MRAB, Neil Schemenauer, Nick Coghlan, Serhiy Storchaka, Steven D'Aprano