PEP: 369
Title: Lazy importing and post import hooks
Version: $Revision$
Last-Modified: $Date$
Author: Christian Heimes
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 02-Jan-2008
Python-Version: 2.6, 3.0
Post-History:


Abstract
========

This PEP proposes enhancements for the import machinery to add lazy
importing and post import hooks to Python. It is intended primarily to
support the wider use of abstract base classes that is expected in
Python 3.0.


Rationale
=========

In current Python an import always loads a module from the disk, even
if the importing module never actually uses the module named in the
import statement. Avoiding this requires extra code to import modules
conditionally; otherwise the unnecessary imports can slow down a small
script.

Embedding import statements inside functions is no solution, as doing
so invokes the import machinery every time the function is called.
Hiding the import inside a function also makes the module's
dependencies less clear.

Python also has no API to hook into the import machinery and execute
code *after* a module has been loaded successfully. The import hooks
of PEP 302 [2]_ are about finding and loading modules; they were not
designed as post import hooks.

In summary:

- An import always loads the module from the disk, which may have a
  considerable impact on the execution time of a small script.
- Conditional imports make the code harder to read and may lead to
  slow and ugly function-level imports.
- Python can't notify code when a module is loaded.


Use cases
=========

A use case for a post import hook is mentioned in Nick Coghlan's
initial posting [1]_ about callbacks on module import. It was found
during the development of Python 3.0 and its ABCs. We wanted to
register classes like ``decimal.Decimal`` with an ABC, but the module
should not be imported on every interpreter startup.

Nick came up with this example::

    @imp.when_imported('decimal')
    def register(decimal):
        Inexact.register(decimal.Decimal)

The function ``register`` is registered as a callback for the module
named 'decimal'. When decimal is imported, the function is called with
the module object as its argument.

While this particular example isn't necessary in practice (as
``decimal.Decimal`` will inherit from the appropriate abstract Number
base class in 2.6 and 3.0), it still illustrates the principle.


Existing implementations
========================

There are two major implementations for lazy imports in the Python
world. PJE's ``peak.util.imports`` [3]_ supports lazy modules and post
load hooks. My implementation shares a lot with his, and it's partly
based on his ideas. Zope 3's ``zope.deferredimport`` [4]_ doesn't have
post import hooks, but it has additional methods for deprecation
warnings.


Post import hook implementation
===============================

Post import hooks are called after a module has been loaded. The hooks
are callables that take one argument, the module instance. They are
registered by the dotted name of the module, e.g. 'os' or 'os.path'.

The callables are stored in the dict ``sys.post_import_hooks``, which
is a mapping from names (as strings) to a list of callables or None.


States
------

No hook was registered
''''''''''''''''''''''

``sys.post_import_hooks`` contains no entry for the module.

A hook is registered and the module is not loaded yet
'''''''''''''''''''''''''''''''''''''''''''''''''''''

The import hook registry contains an entry
``sys.post_import_hooks["name"] = [hook1]``.

A module is successfully loaded
'''''''''''''''''''''''''''''''

The import machinery checks if ``sys.post_import_hooks`` contains post
import hooks for the newly loaded module. If hooks are found, they are
called in the order they were registered, with the module instance as
first argument. The processing of the hooks stops when a hook raises
an exception.
At the end, the entry for the module name is removed from
``sys.post_import_hooks``, even when an error has occurred.

A module can't be loaded
''''''''''''''''''''''''

The import hooks are neither called nor removed from the registry. It
may be possible to load the module later.

A hook is registered but the module is already loaded
'''''''''''''''''''''''''''''''''''''''''''''''''''''

The hook is fired immediately.


C API
-----

New PyImport_* API functions
''''''''''''''''''''''''''''

``PyObject* PyImport_GetPostImportHooks(void)``
    Returns the dict ``sys.post_import_hooks`` or NULL.

``PyObject* PyImport_NotifyModuleLoaded(PyObject *module)``
    Notifies the post import system that a module was requested.
    Returns the module or NULL if an error has occurred.

``PyObject* PyImport_RegisterPostImportHook(PyObject *callable, PyObject *mod_name)``
    Registers a new hook ``callable`` for the module ``mod_name``.

The ``PyImport_NotifyModuleLoaded()`` function is called by
``PyImport_ImportModuleLevel()``::

    PyImport_ImportModuleLevel(...)
    {
        ...
        result = import_module_level(name, globals, locals, fromlist, level);
        result = PyImport_NotifyModuleLoaded(result);
        ...
    }


Python API
----------

The import hook registry and two new API functions are exposed through
the ``sys`` and ``imp`` modules.

``sys.post_import_hooks``
    The dict contains the post import hooks::

        {"name" : [hook1, hook2, ...], ...}

``imp.register_post_import_hook(hook, name)``
    Registers a new hook *hook* for the module *name*.

``imp.notify_module_loaded(module) -> module``
    Notifies the post import system about *module* and returns it.

The ``when_imported`` function decorator is also in the ``imp`` module.
It is equivalent to::

    def when_imported(name):
        def register(hook):
            register_post_import_hook(hook, name)
            # Return the hook so the decorated name stays bound to it.
            return hook
        return register


Lazy import implementation
==========================

Lazy import (also known as deferred import) makes a module object
available without locating and loading the actual file for the
module. The real module is loaded upon the first attribute access,
using the standard import mechanism.

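The attribute-triggered loading described above can be approximated in pure Python. The following is a minimal sketch for a current Python 3, not the proposed C implementation; ``LazyModule`` and ``import_lazy`` are hypothetical names. Because a module's ``__dict__`` is a read-only attribute in Python code, the sketch cannot share ``md_dict`` with the real module: it copies the real namespace into the placeholder once, so later changes to the real module are not mirrored. Failed loads are also not cached as a ``LazyImportError``.

```python
import importlib
import sys
import types


class LazyModule(types.ModuleType):
    """Hypothetical lazy placeholder; not the C type proposed by this PEP."""

    def __init__(self, name):
        super().__init__(name)
        # Readable without loading, like __name__.
        self.__lazy_import__ = True

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for attributes that
        # have not been copied from the real module yet.
        if self.__dict__.get("__lazy_import__"):
            self._load()
            return getattr(self, attr)
        raise AttributeError(
            f"module {self.__name__!r} has no attribute {attr!r}")

    def __setattr__(self, attr, value):
        # Any write (except the laziness flag itself) loads the real module.
        if attr != "__lazy_import__" and self.__dict__.get("__lazy_import__"):
            self._load()
        super().__setattr__(attr, value)

    def _load(self):
        name = self.__name__
        registered = sys.modules.get(name) is self
        if registered:
            # Remove the placeholder so the import system loads the real
            # module instead of returning this object.
            del sys.modules[name]
        try:
            real = importlib.import_module(name)
        finally:
            if registered:
                # The real module doesn't replace the lazy one in sys.modules.
                sys.modules[name] = self
        # Copy (not share) the real namespace, then mark as loaded.
        self.__dict__.update(real.__dict__)
        self.__lazy_import__ = False


def import_lazy(name):
    """Hypothetical helper: return sys.modules[name] or a lazy placeholder."""
    if name in sys.modules:
        return sys.modules[name]
    module = sys.modules[name] = LazyModule(name)
    return module
```

For example, ``m = import_lazy("colorsys")`` registers a placeholder without executing ``colorsys``; the first call to ``m.rgb_to_hsv`` triggers the real import, after which ``sys.modules["colorsys"]`` is still ``m``.
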

Only a limited set of attributes can be read without loading the real
module, namely ``__name__`` and ``__lazy_import__``. The former is used
to load the actual module, while the latter signals the laziness of the
module. ``__lazy_import__`` isn't required by the C implementation, but
it was added for user implementations of lazy modules, as requested by
PJE.

Every attempt to read another attribute, and every write attempt,
causes the real module to be loaded. If the load fails, a
``LazyImportError`` (a subclass of ``ImportError``) is raised, and
future access of the module object will raise the same error.

The real module doesn't replace the lazy module. References to the
lazy module are still valid and don't cause another load attempt. The
implementation assigns ``real->md_dict`` to ``lazy->md_dict`` (the
``__dict__`` attributes), so that every read and write to the former
lazy module ends up in the real module's namespace ``__dict__``.

The code also tries to unload the real module, but that may not be
possible when, e.g., ``mod_a`` loads ``mod_b``, ``mod_b`` loads
``mod_c``, and ``mod_c`` imports ``mod_a`` again. This doesn't cause a
problem with the namespace dict, but the identity check
``mod_c.mod_a is mod_a`` may be false.

A pure Python implementation of the loader code may look like this
(pseudo code: assigning to a module's ``__dict__`` is only possible at
the C level)::

    import sys

    lazy = sys.modules[name]       # the lazy placeholder module
    del sys.modules[name]          # let the import system load the real one
    __import__(name)
    real = sys.modules[name]       # __import__("a.b") returns "a", not "a.b"
    lazy.__dict__ = real.__dict__  # share one namespace (C level only)
    sys.modules[name] = lazy       # the lazy object stays registered

The real module, or a module imported by the real module, may keep a
reference to the real module instance. Because the formerly lazy
module instance and the real module share the same ``__dict__``, every
modification on one module is instantly available on the other object.


C API
-----

The module object struct gains two more entries. ``md_name`` holds the
name of a lazy module (the ``__name__`` attribute) and ``md_lazy``
signals the import status.

PyModuleType changes
''''''''''''''''''''

::

    typedef enum {
        Py_MOD_INVALID = -1,
        Py_MOD_LOADED,
        Py_MOD_LAZY,
        Py_MOD_KEEP_DICT
    } PyModule_State;

    typedef struct {
        PyObject_HEAD
        PyObject *md_dict;
        PyObject *md_name;
        PyModule_State md_lazy;
    } PyModuleObject;

``PyObject * PyModule_NewLazy(const char *name)``
    Creates a new lazy module instance.

``int PyModule_IsLazy(PyObject *module)``
    Checks if the module is lazy. The function first checks
    ``module->md_lazy``. If ``md_lazy`` is ``Py_MOD_LOADED``, it also
    checks the attribute ``__lazy_import__``.

The values of ``PyModule_State`` mean:

``Py_MOD_INVALID``
    The real module can't be loaded; further attribute access raises
    an error.

``Py_MOD_LOADED``
    The module is loaded.

``Py_MOD_LAZY``
    The module is lazy; any read or write access, except to ``__name__``
    and ``__lazy_import__``, will load the real module.

``Py_MOD_KEEP_DICT``
    Intermediate state of a real module; ``md_dict`` isn't cleared.

The last state is required to prevent ``module_dealloc`` from replacing
the values of the module dict with None.


Python API
----------

``__lazy_import__`` module attribute
    The module attribute ``__lazy_import__`` can be used by third-party
    implementations of lazy modules to signal the laziness of a module.

``imp.is_lazy(mod) -> bool``
    Checks if the module is lazy; falls back to ``__lazy_import__``.

``imp.import_lazy(name) -> module instance (lazy)``
    Imports a module lazily, e.g. ``import_lazy("spam.ham")`` puts
    *spam.ham* in ``sys.modules`` and returns the *spam.ham* module
    without actually loading it.

``imp.new_lazy_module(name) -> module instance``
    Creates a new lazy module instance without putting it into
    ``sys.modules``.

``imp.when_imported(name) -> decorator function``
    For use as::

        @when_imported(name)
        def hook(module):
            pass


Open issues
===========

Nick: There also needs to be a discussion of the import lock and
potential hidden deadlock issues. Specifically, the first access to
the lazy module that causes the real module to be loaded will attempt
to acquire the import lock.
Carelessly mixing lazy importing with threaded code is a recipe for
trouble.


Backwards Compatibility
=======================

The new features and API don't conflict with the old import system of
Python and don't cause any backward compatibility issues for most
software. However, systems like PEAK and Zope, which implement their
own lazy import magic, need to follow some rules.

The post import hooks and lazy modules were carefully designed to
cooperate with existing systems. The PEP author suggests replacing
existing on-load hooks with the new hook API. Alternative lazy or
deferred import implementations will still work, but they must call
the ``imp.notify_module_loaded`` function.


Reference Implementation
========================

A reference implementation already exists and is working. It still
requires some cleanups, documentation updates and additional unit
tests.


Acknowledgments
===============

Nick Coghlan, for proofreading and the initial discussion.

Phillip J. Eby, for his implementation in PEAK and help with my own
implementation.


Copyright
=========

This document has been placed in the public domain.


References
==========

.. [1] Interest in PEP for callbacks on module import
   http://permalink.gmane.org/gmane.comp.python.python-3000.devel/11126

.. [2] PEP 302: New Import Hooks
   http://www.python.org/dev/peps/pep-0302/

.. [3] peak.util.imports
   http://svn.eby-sarna.com/Importing/peak/util/imports.py?view=markup

.. [4] zope.deferredimport
   http://svn.zope.org/zope.deferredimport/trunk/src/zope/deferredimport/



..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End: