[Import-SIG] Rough PEP: A ModuleSpec Type for the Import System

Brett Cannon brett at python.org
Fri Aug 9 16:40:10 CEST 2013


I like the idea and I think it can be more-or-less safe. Just need more
specification/clarification on things.


On Fri, Aug 9, 2013 at 2:34 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:

> This is an outgrowth of discussions on the .ref PEP, but it's also
> something I've been thinking about for over a year and started toying with
> at the last PyCon.  I have a patch that passes all but a couple unit tests,
> and those should pass once I get a minute to take another pass at it.
> I'll probably end up adding a bunch more unit tests before I'm done as
> well.  However, the functionality is mostly there.
>
> BTW, I gotta say, Brett, I have a renewed appreciation for the long and
> hard effort you put into importlib.  There are just so many odd corner
> cases that I never would have looked for if not for that library.  And
> those unit tests do a great job of covering all of that.  Thanks!
>

Welcome! And yes, importlib didn't take multiple years out of laziness; it
was just how much work had to go in to cover corner cases, along with pauses
from frustration with the semantics. :P


>
> -eric
>
>
> -------------------------------------------------------------------------------
>
> PEP: 4XX
> Title: A ModuleSpec Type for the Import System
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow <ericsnowcurrently at gmail.com>
> BDFL-Delegate: ???
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 8-Aug-2013
> Python-Version: 3.4
> Post-History: 8-Aug-2013
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes to add a new class to ``importlib.machinery`` called
> ``ModuleSpec``.  It will contain all the import-related information
> about a module without needing to load the module first.  Finders will
> now return a module's spec rather than a loader.  The import system will
> use the spec to load the module.
>
>
> Motivation
> ==========
>
> The import system has evolved over the lifetime of Python.  In late 2002
> PEP 302 introduced standardized import hooks via ``finders`` and
> ``loaders`` and ``sys.meta_path``.  The ``importlib`` module, introduced
> with Python 3.1, now exposes a pure Python implementation of the APIs
> described by PEP 302, as well as of the full import system.  It is now
> much easier to understand and extend the import system.  While a benefit
> to the Python community, this greater accessibility also presents a
> challenge.
>
> As more developers come to understand and customize the import system,
> any weaknesses in the finder and loader APIs will be more impactful.  So
> the sooner we can address any such weaknesses in the import system, the
> better...and there are a couple we can take care of with this proposal.
>
> Firstly, any time the import system needs to save information about a
> module we end up with more attributes on module objects that are
> generally only meaningful to the import system and occasionally to some
> people.  It would be nice to have a per-module namespace to put future
> import-related information.  Secondly, there's an API void between
> finders and loaders that causes undue complexity when encountered.
>
> Finders are strictly responsible for providing the loader which the
> import system will use to load the module.  The loader is then
> responsible for doing some checks, creating the module object, setting
> import-related attributes, "installing" the module to ``sys.modules``,
> and loading the module, along with some cleanup.  This all takes place
> during the import system's call to ``Loader.load_module()``.  Loaders
> also provide some APIs for accessing data associated with a module.
>
> Loaders are not required to provide any of the functionality of
> ``load_module()`` through other methods.  Thus, though the import-
> related information about a module is likely available without loading
> the module, it is not otherwise exposed.
>
> Furthermore, the requirements associated with ``load_module()`` are
> common to all loaders and mostly are implemented in exactly the same
> way.  This means every loader has to duplicate the same boilerplate
> code.  ``importlib.util`` provides some tools that help with this, but
> it would be more helpful if the import system simply took charge of
> these responsibilities.  The trouble is that this would limit the degree
> of customization that ``load_module()`` facilitates.  This is a gap
> between finders and loaders which this proposal aims to fill.
>
> Finally, when the import system calls a finder's ``find_module()``, the
> finder makes use of a variety of information about the module that is
> useful outside the context of the method.  Currently the options are
> limited for persisting that per-module information past the method call,
> since it only returns the loader.  Either store it in a module-to-info
> mapping somewhere like on the finder itself, or store it on the loader.
>

The two previous sentences are hard to read; I think you were after
something like,
"Popular workarounds for this limitation are to store the information in a
module-to-info mapping somewhere, such as on the finder itself, or to store
it on the loader."


> Unfortunately, loaders are not required to be module-specific.  On top
> of that, some of the useful information finders could provide is
> common to all finders, so ideally the import system could take care of
> that.  This is the same gap as before between finders and loaders.
>
> As an example of complexity attributable to this flaw, the
> implementation of namespace packages in Python 3.3 (see PEP 420) added
> ``FileFinder.find_loader()`` because there was no good way for
> ``find_module()`` to provide the namespace path.
>
> The answer to this gap is a ``ModuleSpec`` object that contains the
> per-module information and takes care of the boilerplate functionality
> of loading the module.
>
> (The idea grew feet during discussions related to another PEP.[1])
>

"(This PEP grew out of discussions related to another PEP [1])"


>
>
> Specification
> =============
>
> ModuleSpec
> ----------
>
> A new class which defines the import-related values to use when loading
> the module.  It closely corresponds to the import-related attributes of
> module objects.  ``ModuleSpec`` objects may also be used by finders and
> loaders and other import-related APIs to hold extra import-related
> information about the module.  This greatly reduces the need to add any
> new import-related attributes to module objects.
>
> Attributes:
>
> * ``name`` - the module's name (compare to ``__name__``).
> * ``loader`` - the loader to use during loading and for module data
>   (compare to ``__loader__``).
> * ``package`` - the name of the module's parent (compare to
>   ``__package__``).
> * ``is_package`` - whether or not the module is a package.
>

I think is_package is redundant in the face of 'name'/'package' or 'path'
as you can introspect the same information. I honestly have always found it
a weakness of InspectLoader.is_package() that it didn't return the value
for __path__.


> * ``origin`` - the location from which the module originates.
>

I don't quite follow what this is meant to represent. Is it the path to the
zipfile if the module was loaded that way, and otherwise the file path?


> * ``filename`` - like origin, but limited to a path-based location
>   (compare to ``__file__``).
> * ``cached`` - the location where the compiled module should be stored
>   (compare to ``__cached__``).
> * ``path`` - the list of path entries in which to search for submodules
>   or ``None``.  (compare to ``__path__``).  It should be in sync with
>   ``is_package``.
>

Why is 'path' the only attribute with a default value? Should probably say
everything has a default value of None if not set/known.


>
> Those are also the parameters to ``ModuleSpec.__init__()``, in that
> order.
>

I would consider arguing all arguments should be keyword-only past 'name'
since there is no way most people will remember that order correctly.
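
To make the keyword-only suggestion concrete, here is a minimal sketch. The
class body is hypothetical and is not taken from the patch; it also assumes
every attribute other than 'name' defaults to None.

```python
class ModuleSpec:
    """Hypothetical stand-in for the proposed class, not the actual patch."""

    def __init__(self, name, *, loader=None, package=None, is_package=None,
                 origin=None, filename=None, cached=None, path=None):
        # Everything after the bare ``*`` must be passed by keyword, so
        # callers never need to remember the positional order.
        self.name = name
        self.loader = loader
        self.package = package
        self.is_package = is_package
        self.origin = origin
        self.filename = filename
        self.cached = cached
        self.path = path


# Keyword arguments work; a second positional argument raises TypeError.
spec = ModuleSpec("pkg.mod", loader=object(), package="pkg")
```

With this signature, ``ModuleSpec("pkg.mod", some_loader)`` fails loudly
instead of silently binding the wrong attribute.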


>  The last three are optional.
>

(filename, cached, and path).

And that definitely makes is_package redundant if that's true.


>   When passed the values are taken
> as-is.  The ``from_loader()`` method offers calculated values.
>

"(see below)."


>
> Methods:
>
> * ``from_loader(cls, ...)`` - returns a new ``ModuleSpec`` derived from the
>   arguments.  The parameters are the same as with ``__init__``, except
>   ``package`` is excluded and only ``name`` and ``loader`` are required.
>

Why the switch in requirements compared to __init__()?


> * ``module_repr()`` - returns a repr for the module.
> * ``init_module_attrs(module)`` - sets the module's import-related
>   attributes.
>

Specify what those attributes are and how they are set.


> * ``load(module=None, *, is_reload=False)`` - calls the loader's
>   ``exec_module()``, falling back to ``load_module()`` if necessary.
>   This method performs the former responsibilities of loaders for
>   managing modules before actually loading and for cleaning up.  The
>   reload case is facilitated by the ``module`` and ``is_reload``
>   parameters.
>

If a module is provided and there is already a matching key in sys.modules,
what happens? What if is_reload is True but there is no module provided or
in sys.modules; KeyError, ValueError, ImportError? Do you follow having
None in sys.modules and raise ImportError, or do you overwrite (same
question if a module is explicitly provided)?


>
> Values Derived by from_loader()
> -------------------------------
>
> As implied above, ``from_loader()`` makes a best effort at calculating
> any of the values that are not passed in.  It duplicates the behavior
> that was formerly provided by several ``importlib.util`` functions as
> well as the ``init_module_attrs()`` method of several of ``importlib``'s
> loaders.  Just to be clear, here is a more detailed description of those
> calculations:
>
> ``is_package`` is derived from ``path``, if passed.  Otherwise the
> loader's ``is_package()`` is tried.  Finally, it defaults to False.
>

It can also be calculated based on whether ``name`` == ``package``: ``True
if path is not None else name == package``.
Always need to watch out for [] for path as that is valid and signals the
module is a package.

This is where defining exactly which details need to be passed in and which
ones are optional is going to be critical in determining what represents
ambiguity/unknown details vs. what is flat-out known to be true/false.
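
As a rough sketch of how those rules might combine (the function name and
the exact fallback order are my assumptions, not something the PEP
specifies): 'path' wins if given, then the name/package comparison, then the
loader's is_package(), then False.

```python
def derive_is_package(name, package=None, path=None, loader=None):
    """Hypothetical helper combining the rules discussed above."""
    # Any non-None path, including the empty list, marks a package.
    if path is not None:
        return True
    # A package is its own parent, so name == package implies a package.
    if package is not None:
        return name == package
    # Otherwise ask the loader, if it offers is_package().
    is_package = getattr(loader, "is_package", None)
    if is_package is not None:
        try:
            return bool(is_package(name))
        except ImportError:
            pass
    # Nothing known: fall back to the PEP's default.
    return False
```

Note the explicit ``is not None`` test: a plain truthiness check on 'path'
would misclassify a package whose path is [].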


>
> ``filename`` is pulled from the loader's ``get_filename()``, if
> possible.
>
> ``path`` is set to an empty list if ``is_package`` is true, and the
> directory from ``filename`` is appended to it, if available.
>
> ``cached`` is derived from ``filename`` if it's available.
>

Derived how?
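
If the answer is "the same way the PEP 3147 cache path is computed", then
something like the following sketch would do. It assumes
cache_from_source() is the intended mechanism (it is exposed as
importlib.util.cache_from_source in later Python versions); the
derive_cached() wrapper is hypothetical.

```python
import importlib.util


def derive_cached(filename):
    """Hypothetical helper: compute 'cached' from 'filename', or None."""
    if filename is None:
        return None
    try:
        # Maps a source path to its bytecode cache path.
        return importlib.util.cache_from_source(filename)
    except NotImplementedError:
        # Raised when the implementation has no cache tag.
        return None
```

Whatever the mechanism, the PEP should name it, since the result depends on
the running interpreter.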


>
> ``origin`` is set to ``filename``.
>
> ``package`` is set to ``name`` if the module is a package and
>

"... is a package, else to ..."


> to ``name.rpartition('.')[0]`` otherwise.  Consequently, a
> top-level module will have ``package`` set to the empty string.
>
> Backward Compatibility
> ----------------------
>
> Since finder ``find_module()``
>

``Finder.find_module()``


>  methods would now return a module spec
> instead of a loader, specs must act like the loader that would have been
> returned instead.  This is relatively simple to solve since the loader
> is available as an attribute of the spec.
>

Are you going to define a __getattr__ to delegate to the loader? Or are you
going to specifically define equivalent methods, e.g. get_filename() is
obviously solvable by getting the attribute from the spec (as long as
filename is a required value)?
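
For the __getattr__ route, I am picturing roughly the following. This is a
sketch only; the class body and the DummyLoader are hypothetical.

```python
class ModuleSpec:
    """Hypothetical spec that proxies unknown attributes to its loader."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so the spec's own
        # attributes always take precedence over the loader's.
        loader = self.__dict__.get("loader")  # avoids recursing into here
        if loader is None:
            raise AttributeError(attr)
        return getattr(loader, attr)


class DummyLoader:
    def get_filename(self, fullname):
        return "/tmp/" + fullname + ".py"


spec = ModuleSpec("spam", DummyLoader())
filename = spec.get_filename("spam")  # delegated to the loader
```

The catch is that any spec attribute sharing a name with a loader method
shadows the loader's version.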


>
> However, ``ModuleSpec.is_package`` (an attribute) conflicts with
> ``InspectLoader.is_package()`` (a method).  Working around this requires
> a more complicated solution but is not a large obstacle.
>
> Unfortunately, the ability to proxy does not extend to ``id()``
> comparisons and ``isinstance()`` tests.  In the case of the return value
> of ``find_module()``, we accept that break in backward compatibility.
>

Mention that ModuleSpec can be added to the proper ABCs in importlib.abc to
help alleviate this issue.
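
A minimal sketch of what I mean, using a hypothetical stand-in class and
the standard ABC register() method:

```python
import importlib.abc


class ModuleSpec:
    """Hypothetical stand-in for the proposed class."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader


# Register the spec type as a virtual subclass of the loader ABC so that
# isinstance() checks written against loaders keep passing.
importlib.abc.Loader.register(ModuleSpec)

spec = ModuleSpec("spam", loader=object())
is_loader = isinstance(spec, importlib.abc.Loader)
```

This only helps isinstance()/issubclass() tests; identity comparisons
against the original loader object would still fail.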


>
> Subclassing
> -----------
>
> .. XXX Allowed but discouraged?
>

Why should it matter if they are subclassed?


>
> Module Objects
> --------------
>
> Module objects will now have a ``__spec__`` attribute to which the
> module's spec will be bound.  None of the other import-related module
> attributes will be changed or deprecated, though some of them could be.
> Any such deprecation can wait until Python 4.
>

"... could be; any such ..."


>
> ``ModuleSpec`` objects will not be kept in sync with the corresponding
> module object's import-related attributes.  They may differ, though in
> practice they will be the same.
>

"Though they may differ, in practice they will typically be the same."


>
> Finders
> -------
>
> Finders will now return ModuleSpec objects when ``find_module()`` is
> called rather than loaders.  For backward compatibility, ``ModuleSpec``
> objects proxy the attributes of their ``loader`` attribute.
>
> Adding another similar method to avoid backward-compatibility issues
> is undesirable if avoidable.  The import APIs have suffered enough.
>

in light of the fact that find_loader() was just introduced in Python 3.3.


>  The approach taken by this PEP should be sufficient.
>
> The change to ``find_module()`` applies to both ``MetaPathFinder`` and
> ``PathEntryFinder``.  ``PathEntryFinder.find_loader()`` will be
> deprecated and, for backward compatibility, implicitly special-cased if
> the method exists on a finder.
>
> Loaders
> -------
>
> Loaders will have a new method, ``exec_module(module)``.  Its only job
> is to "exec" the module and consequently populate the module's
> namespace.  It is not responsible for creating or preparing the module
> object, nor for any cleanup afterward.  It has no return value.
>
> The ``load_module()`` of loaders will still work and be an active part
> of the loader API.  It is still useful for cases where the default
> module creation/preparation/cleanup is not appropriate for the loader.
>

But will it still be required? Obviously importlib.abc.Loader can grow a
default load_module() defined around exec_module(), but it should be clear
if we expect the method to always be manually defined or if it will
eventually go away.
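
For illustration, a default load_module() built on exec_module() could look
something like this sketch. The ExecLoader base class is hypothetical (it
is not importlib.abc.Loader), and the sketch skips locking and most of the
attribute setup.

```python
import sys
import types


class ExecLoader:
    """Hypothetical base class providing load_module() via exec_module()."""

    def exec_module(self, module):
        raise NotImplementedError

    def load_module(self, fullname):
        module = sys.modules.get(fullname)
        is_reload = module is not None
        if not is_reload:
            # Fresh import: create the module and install it before
            # executing, so circular imports can find it.
            module = types.ModuleType(fullname)
            sys.modules[fullname] = module
        module.__loader__ = self
        try:
            self.exec_module(module)
        except BaseException:
            # Do not leave a half-initialized module behind on failure.
            if not is_reload:
                sys.modules.pop(fullname, None)
            raise
        return module


class SourceStringLoader(ExecLoader):
    """Example subclass that only has to define exec_module()."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        exec(self.source, module.__dict__)
```

A loader written this way never touches sys.modules itself, which is the
boilerplate the PEP is trying to remove.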


>
> A loader must have ``exec_module()`` or ``load_module()`` defined.  If
> both exist on the loader, ``exec_module()`` is used and
> ``load_module()`` is ignored.
>

Ignored by whom? Should specify that the import system is the one doing the
ignoring.


>
> PEP 420 introduced the optional ``module_repr()`` loader method to limit
> the amount of special-casing in the module type's ``__repr__()``.  Since
> this method is part of ``ModuleSpec``, it will be deprecated on loaders.
> However, if it exists on a loader it will be used exclusively.
>
> The loader ``init_module_attrs()`` method, added for Python 3.4, will be
> eliminated in favor of the same method on ``ModuleSpec``.
>

"method, added prior to Python 3.4's release, will be removed ..."


>
> However, ``InspectLoader.is_package()`` will not be deprecated even
> though the same information is found on ``ModuleSpec``.  ``ModuleSpec``
> can use it to populate its own ``is_package`` if that information is
> not otherwise available.  Still, it will be made optional.
>
> In addition to executing a module during loading, loaders will still be
> directly responsible for providing APIs concerning module-related data.
>
> Other Changes
> -------------
>
> * The various finders and loaders provided by ``importlib`` will be
> updated to comply with this proposal.
>
> * The spec for the ``__main__`` module will reflect how the interpreter
> was started.  For instance, with ``-m`` the spec's name will be that of
> the run module, while ``__main__.__name__`` will still be "__main__".
>
> * We add ``importlib.find_module()`` to mirror
> ``importlib.find_loader()`` (which becomes deprecated).
>
> * Deprecations in ``importlib.util``: ``set_package()``,
>  ``set_loader()``, and ``module_for_loader()``.  ``module_to_load()``
> (introduced in 3.4) can be removed.
>

"(introduced prior to Python 3.4's release)"; remember, PEPs are timeless
and will outlive 3.4 so specifying it never went public is important.


>
> * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.
>
> * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
> the per-module import lock, whereas ``Loader.load_module()`` did not.
>

> Reference Implementation
> ------------------------
>
> A reference implementation is available at <TBD>.
>
>
> References
> ==========
>
> [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>
>