[Import-SIG] latest update of PEP 451
ericsnowcurrently at gmail.com
Wed Sep 25 07:46:45 CEST 2013
I've updated PEP 451 to address comments and clear a few things up. Most
notably, I added a list of terms at the beginning.
The PEP is pretty close to done and feedback has simmered down. Does
anyone object to my posting the next update to python-dev?
There are two main open questions:
1. How does ModuleSpec interact with existing import-sensitive modules in
the standard library?
2. PJE's concerns about reload semantics and lazy loading.
Regarding the first, I expect that those modules can be adapted to
ModuleSpec without much effort. However, I will do a thorough check
before I ask for pronouncement.
About lazy loading, from what I understand, importlib.reload() broke
backward compatibility with regard to PJE's use case when it switched to
depending on __loader__. Perhaps it was even before that. I'll have to
check. Regardless, PEP 451 does not change the semantics of reload() from
what they currently are.
The PEP could restore the previous semantics without a lot of work (if
module.__spec__ is not set, then call find_spec(), set it, and reload using
that). However, if reload() backward compatibility got broken somewhere
along the line, that sounds like a bug that should be addressed separately.
Title: A ModuleSpec Type for the Import System
Author: Eric Snow <ericsnowcurrently at gmail.com>
Discussions-To: import-sig at python.org
Type: Standards Track
Post-History: 8-Aug-2013, 28-Aug-2013, 18-Sep-2013
This PEP proposes to add a new class to importlib.machinery called
"ModuleSpec". It will provide all the import-related information used
to load a module and will be available without needing to load the
module first. Finders will directly provide a module's spec instead of
a loader (which they will continue to provide indirectly). The import
machinery will be adjusted to take advantage of module specs, including
using them to load modules.
Terms and Concepts
The changes in this proposal are an opportunity to make several
existing terms and concepts more clear, whereas currently they are
(unfortunately) ambiguous. New concepts are also introduced in this
proposal. Finally, it's worth explaining a few other existing terms
with which people may not be so familiar. For the sake of context, here
is a brief summary of all three groups of terms and concepts. A more
detailed explanation of the import system is found at
[import_system_docs]_.
A "finder" is an object that identifies the loader that the import
system should use to load a module. Currently this is accomplished by
calling the finder's find_module() method, which returns the loader.
Finders are strictly responsible for providing the loader, which they do
through their find_module() method. The import system then uses that
loader to load the module.
A "loader" is an object that is used to load a module during import.
Currently this is done by calling the loader's load_module() method. A
loader may also provide APIs for getting information about the modules
it can load, as well as about data from sources associated with such a
module.
Right now loaders (via load_module()) are responsible for certain
boilerplate import-related operations. These are:
1. perform some (module-related) validation;
2. create the module object;
3. set import-related attributes on the module;
4. "register" the module to sys.modules;
5. exec the module;
6. clean up in the event of failure while loading the module.
This all takes place during the import system's call to
Loader.load_module().
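The six responsibilities listed above can be sketched as a legacy-style loader. This is only an illustration under stated assumptions: ``LegacyStringLoader`` and the module names ``demo_legacy`` and ``demo_broken`` are hypothetical, and the loader is called directly rather than through the import system.

```python
import sys
import types


class LegacyStringLoader:
    """Hypothetical PEP 302-style loader that execs source held in a dict."""

    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def load_module(self, name):
        # 1. perform some validation
        if name not in self.sources:
            raise ImportError("cannot load " + name, name=name)
        # 2. create the module object (reuse the existing one on reload)
        is_reload = name in sys.modules
        module = sys.modules[name] if is_reload else types.ModuleType(name)
        # 3. set import-related attributes
        module.__loader__ = self
        module.__package__ = name.rpartition(".")[0]
        # 4. register the module before executing it
        sys.modules[name] = module
        try:
            # 5. exec the module
            exec(self.sources[name], module.__dict__)
        except BaseException:
            # 6. clean up on failure (a reloaded module stays registered)
            if not is_reload:
                del sys.modules[name]
            raise
        return sys.modules[name]


loader = LegacyStringLoader({
    "demo_legacy": "answer = 6 * 7",
    "demo_broken": "1 / 0",
})
module = loader.load_module("demo_legacy")
print(module.answer)
```

Every loader written this way repeats steps 1 to 4 and 6; only step 5 is specific to the kind of module being loaded.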
This is a new term and concept. The idea of it exists subtly in the
import system already, but this proposal makes the concept explicit.
"origin" in the import context means the system (or resource within a
system) from which a module originates. For the purposes of this
proposal, "origin" is also a string which identifies such a resource or
system. "origin" is applicable to all modules.
For example, the origin for built-in and frozen modules is the
interpreter itself. The import system already identifies this origin as
"built-in" and "frozen", respectively. This is demonstrated in the
following module repr: "<module 'sys' (built-in)>".
In fact, the module repr is already a relatively reliable, though
implicit, indicator of a module's origin. Other modules also indicate
their origin through other means, as described in the entry for
It is up to the loader to decide on how to interpret and use a module's
origin, if at all.
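As a small illustration of the point about built-in modules, the snippet below prints the repr of ``sys`` next to the origin recorded on its spec. It assumes an interpreter that already implements the ``__spec__`` attribute proposed here (CPython 3.4 or later).

```python
import sys

# The module repr already hints at the origin of a built-in module.
sys_repr = repr(sys)
# With module specs the same information is available explicitly.
sys_origin = sys.__spec__.origin

print(sys_repr)
print(sys_origin)
```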
This is a new term. However the concept already exists clearly in the
import system, as associated with the ``__file__`` and ``__path__``
attributes of modules, as well as the name/term "path" elsewhere.
A "location" is a resource or "place", rather than a system at large,
from which a module is loaded. It qualifies as an "origin". Examples
of locations include filesystem paths and URLs. A location is
identified by the name of the resource, but may not necessarily identify
the system to which the resource pertains. In such cases the loader
would have to identify the system itself.
In contrast to other kinds of module origin, a location cannot be
inferred by the loader just by the module name. Instead, the loader
must be provided with a string to identify the location, usually by the
finder that generates the loader. The loader then uses this information
to locate the resource from which it will load the module. In theory
you could load the module at a given location under various names.
The most common example of locations in the import system are the
files from which source and extension modules are loaded. For these
modules the location is identified by the string in the ``__file__``
attribute. Although ``__file__`` isn't particularly accurate for some
modules (e.g. zipped), it is currently the only way that the import
system indicates that a module has a location.
A module that has a location may be called "locatable".
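A rough way to see the difference between a locatable module and one that only has an origin, again assuming an interpreter where modules carry ``__spec__`` (CPython 3.4 or later) and a standard library installed as ordinary source files:

```python
import json
import sys

located = json.__spec__    # loaded from a source file
unlocated = sys.__spec__   # provided by the interpreter itself

# A locatable module: the origin doubles as the location in __file__.
print(located.has_location, located.origin == json.__file__)
# A non-locatable module: it has an origin but no __file__ at all.
print(unlocated.has_location, hasattr(sys, "__file__"))
```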
The import system stores compiled modules in the __pycache__ directory
as an optimization. This module cache that we use today was provided by
PEP 3147. For this proposal, the relevant API for module caching is the
``__cache__`` attribute of modules and the cache_from_source() function
in importlib.util. Loaders are responsible for putting modules into the
cache (and loading out of the cache). Currently the cache is only used
for compiled source modules. However, this proposal explicitly allows
The concept does not change, nor does the term. However, the
distinction between modules and packages is mostly superficial.
Packages *are* modules. They simply have a ``__path__`` attribute and
import may add attributes bound to submodules. The typical perceived
difference is a source of confusion. This proposal explicitly
de-emphasizes the distinction between packages and modules where it
makes sense to do so.
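The claim that packages are modules can be checked directly. The sketch below uses the standard library's ``json`` package and its ``json.decoder`` submodule purely as convenient examples.

```python
import sys
import types

import json
import json.decoder

# A package is an ordinary module object ...
package_is_module = isinstance(json, types.ModuleType)
# ... that additionally carries a __path__ attribute ...
package_has_path = hasattr(json, "__path__")
plain_has_path = hasattr(json.decoder, "__path__")
# ... and to which import has bound the submodule as an attribute.
bound_submodule = json.decoder is sys.modules["json.decoder"]

print(package_is_module, package_has_path, plain_has_path, bound_submodule)
```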
The import system has evolved over the lifetime of Python. In late 2002
PEP 302 introduced standardized import hooks via finders and
loaders and sys.meta_path. The importlib module, introduced
with Python 3.1, now exposes a pure Python implementation of the APIs
described by PEP 302, as well as of the full import system. It is now
much easier to understand and extend the import system. While a benefit
to the Python community, this greater accessibility also presents a
As more developers come to understand and customize the import system,
any weaknesses in the finder and loader APIs will be more impactful. So
the sooner we can address any such weaknesses in the import system, the
better...and there are a couple we can take care of with this proposal.
Firstly, any time the import system needs to save information about a
module we end up with more attributes on module objects that are
generally only meaningful to the import system. It would be nice to
have a per-module namespace in which to put future import-related
information and to pass around within the import system. Secondly,
there's an API void between finders and loaders that causes undue
complexity when encountered. The PEP 420 (namespace packages)
implementation had to work around this. The complexity surfaced again
during recent efforts on a separate proposal. [ref_files_pep]_
The `finder`_ and `loader`_ sections above detail current responsibility
of both. Notably, loaders are not required to provide any of the
functionality of their load_module() through other methods. Thus,
though the import-related information about a module is likely available
without loading the module, it is not otherwise exposed.
Furthermore, the requirements associated with load_module() are
common to all loaders and mostly are implemented in exactly the same
way. This means every loader has to duplicate the same boilerplate
code. importlib.util provides some tools that help with this, but
it would be more helpful if the import system simply took charge of
these responsibilities. The trouble is that this would limit the degree
of customization that load_module() could easily continue to facilitate.
More importantly, while a finder *could* provide the information that
the loader's load_module() would need, it currently has no consistent
way to get it to the loader. This is a gap between finders and loaders
which this proposal aims to fill.
Finally, when the import system calls a finder's find_module(), the
finder makes use of a variety of information about the module that is
useful outside the context of the method. Currently the options are
limited for persisting that per-module information past the method call,
since it only returns the loader. Popular workarounds for this limitation
are to store the information in a module-to-info mapping somewhere on
the finder itself, or store it on the loader.
Unfortunately, loaders are not required to be module-specific. On top
of that, some of the useful information finders could provide is
common to all finders, so ideally the import system could take care of
those details. This is the same gap as before between finders and
loaders.
As an example of complexity attributable to this flaw, the
implementation of namespace packages in Python 3.3 (see PEP 420) added
FileFinder.find_loader() because there was no good way for
find_module() to provide the namespace search locations.
The answer to this gap is a ModuleSpec object that contains the
per-module information and takes care of the boilerplate functionality
involved with loading the module.
The goal is to address the gap between finders and loaders while
changing as little of their semantics as possible. Though some
functionality and information is moved to the new ModuleSpec type,
their behavior should remain the same. However, for the sake of clarity
the finder and loader semantics will be explicitly identified.
Here is a high-level summary of the changes described by this PEP. More
detail is available in later sections.
A specification for a module's import-system-related state. See the
`ModuleSpec`_ section below for a more detailed description.
* ModuleSpec(name, loader, \*, origin=None, loader_state=None,
* name - a string for the name of the module.
* loader - the loader to use for loading.
* origin - the name of the place from which the module is loaded,
e.g. "built-in" for built-in modules and the filename for modules
loaded from source.
* submodule_search_locations - list of strings for where to find
submodules, if a package (None otherwise).
* loader_state - a container of extra module-specific data for use
* cached (property) - a string for where the compiled module should be
* parent (RO-property) - the name of the package to which the module
belongs as a submodule (or None).
* has_location (RO-property) - a flag indicating whether or not the
module's "origin" attribute refers to a location.
* module_repr() - provide a repr string for the spec'ed module;
non-locatable modules will use their origin (e.g. "built-in").
* init_module_attrs(module) - set any of a module's import-related
attributes that aren't already set.
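To make the attribute list more concrete, here is a minimal sketch that builds two specs by hand. It assumes the class as it was eventually released in ``importlib.machinery``; in particular the ``is_package`` keyword is taken from the released constructor, since the signature in the list above is cut off. The names ``pkg`` and ``pkg.mod`` and the origin string are made up for the example.

```python
from importlib.machinery import ModuleSpec

# A spec for a plain submodule and a spec for a package, neither of
# which is tied to a real loader or location here.
plain = ModuleSpec("pkg.mod", loader=None, origin="example-origin")
package = ModuleSpec("pkg", loader=None, origin="example-origin",
                     is_package=True)

print(plain.name, plain.parent, plain.submodule_search_locations)
print(package.name, package.parent, package.submodule_search_locations)
print(plain.has_location, plain.cached)
```

The released class does not expose module_repr() or init_module_attrs() publicly, so the sketch sticks to the data attributes.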
These are ModuleSpec factory functions, meant as a convenience for
finders. See the `Factory Functions`_ section below for more detail.
* spec_from_file_location(name, location, \*, loader=None,
- build a spec from file-oriented information and loader APIs.
* from_loader(name, loader, \*, origin=None, is_package=None) - build
a spec with missing information filled in by using loader APIs.
This factory function is useful for some backward-compatibility
* spec_from_module(module, loader=None) - build a spec based on the
import-related attributes of an existing module.
Other API Additions
* importlib.find_spec(name, path=None) will work exactly the same as
importlib.find_loader() (which it replaces), but return a spec instead
of a loader.
* importlib.abc.Loader.exec_module(module) will execute a module in its
own namespace. It replaces importlib.abc.Loader.load_module(), taking
over its module execution functionality.
* importlib.abc.Loader.create_module(spec) (optional) will return the
module to use for loading.
* Module objects will have a new attribute: ``__spec__``.
* InspectLoader.is_package() will become optional.
* The parameters and attributes of the various loaders in
These were introduced prior to Python 3.4's release, so they can simply
* The import system implementation in importlib will be changed to make
use of ModuleSpec.
* importlib.reload() will make use of ModuleSpec.
* Import-related module attributes (other than ``__spec__``) will no
longer be used directly by the import system.
* Import-related attributes should no longer be added to modules
* The module type's ``__repr__()`` will be a thin wrapper around a pure
Python implementation which will leverage ModuleSpec.
* The spec for the ``__main__`` module will reflect the appropriate
name and origin.
* If a finder does not define find_spec(), a spec is derived from
the loader returned by find_module().
* PathEntryFinder.find_loader() still takes priority over
* Loader.load_module() is used if exec_module() is not defined.
What Will not Change?
* The syntax and semantics of the import statement.
* Existing finders and loaders will continue to work normally.
* The import-related module attributes will still be initialized with
the same information.
* Finders will still create loaders (now storing them in specs).
* Loader.load_module(), if a module defines it, will have all the
same requirements and may still be called directly.
* Loaders will still be responsible for module data APIs.
* importlib.reload() will still overwrite the import-related attributes.
Here's a quick breakdown of where responsibilities lie after this PEP.
* create loader
* create spec
* create module (optional)
* execute module
* orchestrate module loading
* boilerplate for module loading, including managing sys.modules and
setting import-related attributes
* create module if loader doesn't
* call loader.exec_module(), passing in the module in which to exec
* contain all the information the loader needs to exec the module
* provide the repr for modules
What Will Existing Finders and Loaders Have to Do Differently?
Immediately? Nothing. The status quo will be deprecated, but will
continue working. However, here are the things that the authors of
finders and loaders should change relative to this PEP:
* Implement find_spec() on finders.
* Implement exec_module() on loaders, if possible.
The ModuleSpec factory functions in importlib.util are intended to be
helpful for converting existing finders. from_loader() and
spec_from_file_location() are both straightforward utilities in this
regard. In the case where loaders already expose methods for creating
and preparing modules, spec_from_module() may be useful to
the corresponding finder.
For existing loaders, exec_module() should be a relatively direct
conversion from the non-boilerplate portion of load_module(). In some
uncommon cases the loader should also implement create_module().
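A converted finder and loader pair might look roughly like the following. This is a sketch against the API as it was eventually released, not against this draft: the factory is spelled ``importlib.util.spec_from_loader()`` there, and ``find_spec()`` takes an extra ``target`` argument. ``DictFinder``, ``DictLoader``, ``TaggedModule`` and the module name ``demo_virtual`` are hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class TaggedModule(types.ModuleType):
    """Hypothetical module subclass handed out by create_module()."""


class DictLoader(importlib.abc.Loader):
    def __init__(self, sources):
        self.sources = sources  # maps module name -> source text

    def create_module(self, spec):
        # Optional hook: supply a custom module object.
        return TaggedModule(spec.name)

    def exec_module(self, module):
        # Only the non-boilerplate part of the old load_module() remains.
        exec(self.sources[module.__name__], module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    def __init__(self, sources):
        self.loader = DictLoader(sources)

    def find_spec(self, name, path=None, target=None):
        if name not in self.loader.sources:
            return None
        return importlib.util.spec_from_loader(name, self.loader,
                                               origin="dict")


finder = DictFinder({"demo_virtual": "greeting = 'hello'"})
sys.meta_path.append(finder)
try:
    demo = importlib.import_module("demo_virtual")
finally:
    sys.meta_path.remove(finder)

print(demo.greeting, type(demo).__name__, demo.__spec__.origin)
```

Neither class touches sys.modules or sets import-related attributes; the import machinery does that on their behalf.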
ModuleSpec objects have 3 distinct target audiences: Python itself,
import hooks, and normal Python users.
Python will use specs in the import machinery, in interpreter startup,
and in various standard library modules. Some modules are
import-oriented, like pkgutil, and others are not, like pickle and
pydoc. In all cases, the full ModuleSpec API will get used.
Import hooks (finders and loaders) will make use of the spec in specific
ways. First of all, finders may use the spec factory functions in
importlib.util to create spec objects. They may also directly adjust
the spec attributes after the spec is created. Secondly, the finder may
bind additional information to the spec (in loader_state) for the
loader to consume during module creation/execution. Finally, loaders
will make use of the attributes on a spec when creating and/or executing
Python users will be able to inspect a module's ``__spec__`` to get
import-related information about the object. Generally, Python
applications and interactive users will not be using the ``ModuleSpec``
factory functions nor any of the instance methods.
How Loading Will Work
This is an outline of what happens in ModuleSpec's loading
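    if not hasattr(spec.loader, 'exec_module'):
        # Legacy loader: load_module() does all of the work itself.
        module = spec.loader.load_module(spec.name)
        return sys.modules[spec.name]

    module = None
    if hasattr(spec.loader, 'create_module'):
        module = spec.loader.create_module(spec)
    if module is None:
        module = ModuleType(spec.name)

    sys.modules[spec.name] = module
    spec.loader.exec_module(module)
    return sys.modules[spec.name]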
These steps are exactly what Loader.load_module() is already
expected to do. Loaders will thus be simplified since they will only
need to implement exec_module().
Note that we must return the module from sys.modules. During loading
the module may have replaced itself in sys.modules. Since we don't have
a post-import hook API to accommodate the use case, we have to deal with
it. However, in the replacement case we do not worry about setting the
import-related module attributes on the object. The module writer is on
their own if they are doing this.
Each of the following names is an attribute on ModuleSpec objects. A
value of None indicates "not set". This contrasts with module
objects where the attribute simply doesn't exist. Most of the
attributes correspond to the import-related attributes of modules. Here
is the mapping. The reverse of this mapping is used by
On ModuleSpec On Modules
| \* Set on the module only if spec.has_location is true.
| \*\* Set on the module only if the spec attribute is not None.
While parent and has_location are read-only properties, the remaining
attributes can be replaced after the module spec is created and even
after import is complete. This allows for unusual cases where directly
modifying the spec is the best option. However, typical use should not
involve changing the state of a module's spec.
"origin" is a string for the name of the place from which the module
originates. See `origin`_ above. Aside from the informational value,
it is also used in module_repr(). In the case of a spec where
"has_location" is true, ``__file__`` is set to the value of "origin".
For built-in modules "origin" would be set to "built-in".
As explained in the `location`_ section above, many modules are
"locatable", meaning there is a corresponding resource from which the
module will be loaded and that resource can be described by a string.
In contrast, non-locatable modules can't be loaded in this fashion, e.g.
built-in modules and modules dynamically created in code. For these, the
name is the only way to access them, so they have an "origin" but not a
"location".
"has_location" is true if the module is locatable. In that case the
spec's origin is used as the location and ``__file__`` is set to
spec.origin. If additional location information is required (e.g.
zipimport), that information may be stored in spec.loader_state.
"has_location" may be implied from the existence of a get_data() method
on the loader.
Incidentally, not all locatable modules will be cacheable, but most will.
The list of location strings, typically directory paths, in which to
search for submodules. If the module is a package this will be set to
a list (even an empty one). Otherwise it is None.
The name of the corresponding module attribute, ``__path__``, is
relatively ambiguous. Instead of mirroring it, we use a more explicit
name that makes the purpose clear.
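The relationship between the spec attribute and ``__path__`` can be observed on an imported package. The sketch assumes an interpreter that sets ``__spec__`` on modules (CPython 3.4 or later) and uses ``json`` and ``json.decoder`` only as examples.

```python
import json
import json.decoder

package_spec = json.__spec__
module_spec = json.decoder.__spec__

# For a package the search locations mirror the module's __path__.
same_locations = (list(package_spec.submodule_search_locations)
                  == list(json.__path__))
# For a plain module the attribute is None rather than missing.
print(same_locations, module_spec.submodule_search_locations)
```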
A finder may set loader_state to any value to provide additional
data for the loader to use during loading. A value of None is the
default and indicates that there is no additional data. Otherwise it
can be set to any object, such as a dict, list, or
types.SimpleNamespace, containing the relevant extra information.
For example, zipimporter could use it to pass the zip archive name
to the loader directly, rather than needing to derive it from origin
or create a custom loader for each find operation.
loader_state is meant for use by the finder and corresponding loader.
It is not guaranteed to be a stable resource for any other use.
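One way a finder could hand data to its loader through loader_state is sketched below. ``StateLoader`` and ``demo_state`` are hypothetical, and ``importlib.util.module_from_spec()`` is a helper from later Python releases (3.5 and up) that is not part of this proposal; it is used only to get a module with ``__spec__`` set without going through a full import.

```python
import importlib.abc
import importlib.util
import types
from importlib.machinery import ModuleSpec


class StateLoader(importlib.abc.Loader):
    def exec_module(self, module):
        # The loader reads what the finder stored on the spec.
        state = module.__spec__.loader_state
        exec(state.source, module.__dict__)


# What a finder might do: attach per-module data to the spec.
spec = ModuleSpec("demo_state", StateLoader(),
                  loader_state=types.SimpleNamespace(source="value = 10"))

module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.value)
```

Because the per-module data lives on the spec, a single loader instance can serve any number of modules.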
**spec_from_file_location(name, location, \*, loader=None,
Build a spec from file-oriented information and loader APIs.
* "origin" will be set to the location.
* "has_location" will be set to True.
* "cached" will be set to the result of calling cache_from_source().
* "origin" can be deduced from loader.get_filename() (if "location" is
  not passed in).
* "loader" can be deduced from suffix if the location is a filename.
* "submodule_search_locations" can be deduced from loader.is_package()
  and from os.path.dirname(location) if location is a filename.
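The deductions listed above can be observed with the function as released in ``importlib.util``. The sketch writes a throwaway source file named ``demo_file.py`` (a made-up name) into a temporary directory and relies on ``module_from_spec()``, which comes from Python 3.5 and later rather than from this draft.

```python
import importlib.util
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_file.py")
    with open(path, "w") as f:
        f.write("x = 1\n")

    # The loader is deduced from the ".py" suffix of the location.
    spec = importlib.util.spec_from_file_location("demo_file", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    summary = (spec.origin == path, spec.has_location,
               module.__file__ == path, module.x)
    cached_matches = (spec.cached
                      == importlib.util.cache_from_source(path))

print(summary, cached_matches)
```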
**from_loader(name, loader, \*, origin=None, is_package=None)**
Build a spec with missing information filled in by using loader APIs.
* "has_location" can be deduced from loader.get_data.
* "origin" can be deduced from loader.get_filename().
* "submodule_search_locations" can be deduced from loader.is_package()
  and from os.path.dirname(location) if location is a filename.
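In the released importlib this factory is spelled ``importlib.util.spec_from_loader()``; the sketch below assumes that spelling. ``PackageLikeLoader`` and the ``demo_pkg`` names are hypothetical, and the loader deliberately has no get_filename(), so only the is_package() deduction comes into play.

```python
import importlib.util


class PackageLikeLoader:
    """Hypothetical loader exposing only is_package() and exec_module()."""

    def is_package(self, name):
        return name == "demo_pkg"

    def exec_module(self, module):
        pass


loader = PackageLikeLoader()
pkg_spec = importlib.util.spec_from_loader("demo_pkg", loader,
                                           origin="in-memory")
mod_spec = importlib.util.spec_from_loader("demo_pkg.leaf", loader,
                                           origin="in-memory")

print(pkg_spec.submodule_search_locations, pkg_spec.has_location)
print(mod_spec.submodule_search_locations, mod_spec.parent)
```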
Build a spec based on the import-related attributes of an existing
module. The spec attributes are set to the corresponding import-
related module attributes. See the table in `Attributes`_.
Omitted Attributes and Methods
The following ModuleSpec methods are not part of the public API since
it is easy to use them incorrectly and only the import system really
needs them (i.e. they would be an attractive nuisance).
* _create() - provide a new module to use for loading.
* _exec(module) - execute the spec into a module namespace.
* _load() - prepare a module and execute it in a protected way.
* _reload(module) - re-execute a module in a protected way.
Here are other omissions:
There is no "PathModuleSpec" subclass of ModuleSpec that separates out
has_location, cached, and submodule_search_locations. While that might
make the separation cleaner, module objects don't have that distinction.
ModuleSpec will support both cases equally well.
While "is_package" would be a simple additional attribute (aliasing
self.submodule_search_locations is not None), it perpetuates the
artificial (and mostly erroneous) distinction between modules and
Conceivably, a ModuleSpec.load() method could optionally take a list of
modules with which to interact instead of sys.modules. That
capability is left out of this PEP, but may be pursued separately at
some other time, including relative to PEP 406 (import engine).
Likewise load() could be leveraged to implement multi-version
imports. While interesting, doing so is outside the scope of this
* Add ModuleSpec.submodules (RO-property) - returns possible submodules
relative to the spec.
* Add ModuleSpec.loaded (RO-property) - the module in sys.modules, if
* Add ModuleSpec.data - a descriptor that wraps the data API of the
* Also see [cleaner_reload_support]_.
ModuleSpec doesn't have any. This would be a different story if
Finder.find_module() were to return a module spec instead of loader.
In that case, specs would have to act like the loader that would have
been returned instead. Doing so would be relatively simple, but is an
unnecessary complication. It was part of earlier versions of this PEP.
Subclasses of ModuleSpec are allowed, but should not be necessary.
Simply setting loader_state or adding functionality to a custom
finder or loader will likely be a better fit and should be tried first.
However, as long as a subclass still fulfills the requirements of the
import system, objects of that type are completely fine as the return
value of Finder.find_spec().
Other than adding ``__spec__``, none of the import-related module
attributes will be changed or deprecated, though some of them could be;
any such deprecation can wait until Python 4.
A module's spec will not be kept in sync with the corresponding import-
related attributes. Though they may differ, in practice they will
typically be the same.
One notable exception is the case where a module is run as a script by
using the ``-m`` flag. In that case ``module.__spec__.name`` will
reflect the actual module name while ``module.__name__`` will be
A module's spec is not guaranteed to be identical between two modules
with the same name. Likewise there is no guarantee that successive
calls to importlib.find_spec() will return the same object or even an
equivalent object, though at least the latter is likely.
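For reference, looking up a spec without importing anything new might look like this. The sketch assumes the function where it ended up in released Python, ``importlib.util.find_spec()``, rather than ``importlib.find_spec()`` as written in this draft; ``no_such_module_for_demo`` is a made-up name expected not to exist.

```python
import importlib.util
import json

spec = importlib.util.find_spec("json")
missing = importlib.util.find_spec("no_such_module_for_demo")

# The spec describes the module; a missing top-level module yields None.
print(spec.name, spec.origin == json.__file__)
print(missing)
```

The example compares attribute values only and deliberately makes no assumption about whether two lookups return the same object.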
Finders are still responsible for creating the loader. That loader will
now be stored in the module spec returned by find_spec() rather
than returned directly. As is currently the case without the PEP, if a
loader would be costly to create, that loader can be designed to defer
the cost until later.
Finders will return ModuleSpec objects when find_spec() is
called. This new method replaces find_module() and
find_loader() (in the PathEntryFinder case). If a finder does
not have find_spec(), find_module() and find_loader() are
used instead, for backward-compatibility.
Adding yet another similar method to finders is a case of practicality.
find_module() could be changed to return specs instead of loaders.
This is tempting because the import APIs have suffered enough,
especially considering PathEntryFinder.find_loader() was just
added in Python 3.3. However, the extra complexity and a less-than-
explicit method name aren't worth it.
Loaders will have a new method, exec_module(). Its only job
is to "exec" the module and consequently populate the module's
namespace. It is not responsible for creating or preparing the module
object, nor for any cleanup afterward. It has no return value.
exec_module() will be used during both loading and reloading.
exec_module() should properly handle the case where it is called more
than once. For some kinds of modules this may mean raising ImportError
every time after the first time the method is called. This is
particularly relevant for reloading, where some kinds of modules do not
support in-place reloading.
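The repeated call to exec_module() during a reload can be seen with a loader that counts its runs. This is a sketch against the released importlib (``spec_from_loader()`` and the ``target`` argument of ``find_spec()`` come from there); ``CountingLoader``, ``CountingFinder`` and ``demo_counted`` are hypothetical.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class CountingLoader(importlib.abc.Loader):
    def __init__(self):
        self.runs = 0

    def exec_module(self, module):
        # Called once for the initial load and again for each reload.
        self.runs += 1
        module.run_number = self.runs


class CountingFinder(importlib.abc.MetaPathFinder):
    def __init__(self):
        self.loader = CountingLoader()

    def find_spec(self, name, path=None, target=None):
        if name == "demo_counted":
            return importlib.util.spec_from_loader(name, self.loader)
        return None


finder = CountingFinder()
sys.meta_path.append(finder)
try:
    module = importlib.import_module("demo_counted")
    first = module.run_number
    reloaded = importlib.reload(module)
    second = reloaded.run_number
finally:
    sys.meta_path.remove(finder)

print(first, second, reloaded is module)
```

A loader for modules that cannot be re-executed in place would instead raise ImportError from exec_module() on the second call.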
Loaders may also implement create_module() that will return a
new module to exec. It may return None to indicate that the default
module creation code should be used. One use case, though atypical, for
create_module() is to provide a module that is a subclass of the builtin
module type. Most loaders will not need to implement create_module(),
create_module() should properly handle the case where it is called more
than once for the same spec/module. This may include returning None or
exec_module() and create_module() should not set any import-related
module attributes. The fact that load_module() does is a design flaw
that this proposal aims to correct.
PEP 420 introduced the optional module_repr() loader method to limit
the amount of special-casing in the module type's ``__repr__()``. Since
this method is part of ModuleSpec, it will be deprecated on loaders.
However, if it exists on a loader it will be used exclusively.
The Loader.init_module_attrs() method, added prior to Python 3.4's
release, will be removed in favor of the same method on ModuleSpec.
However, InspectLoader.is_package() will not be deprecated even
though the same information is found on ModuleSpec. ModuleSpec
can use it to populate its own is_package if that information is
not otherwise available. Still, it will be made optional.
One consequence of ModuleSpec is that loader ``__init__`` methods will
no longer need to accommodate per-module state. The path-based loaders
in importlib take arguments in their ``__init__()`` and have
corresponding attributes. However, the need for those values is
eliminated by module specs.
In addition to executing a module during loading, loaders will still be
directly responsible for providing APIs concerning module-related data.
* The various finders and loaders provided by importlib will be
updated to comply with this proposal.
* The spec for the ``__main__`` module will reflect how the interpreter
was started. For instance, with ``-m`` the spec's name will be that
of the run module, while ``__main__.__name__`` will still be
* We add importlib.find_spec() to mirror
importlib.find_loader() (which becomes deprecated).
* importlib.reload() is changed to use ModuleSpec.load().
* importlib.reload() will now make use of the per-module import
A reference implementation will be available at
\* The impact of this change on pkgutil (and setuptools) needs looking
into. It has some generic function-based extensions to PEP 302. These
may break if importlib starts wrapping loaders without the tools'
\* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
For instance, pickle should be updated in the ``__main__`` case to look
\* Impact on some kinds of lazy loading modules. [lazy_import_concerns]_
.. [import_system_docs] http://docs.python.org/3/reference/import.html
This document has been placed in the public domain.