[Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"

Sun Aug 11 22:08:26 CEST 2013

On Fri, Aug 9, 2013 at 6:58 PM, Eric Snow <ericsnowcurrently at gmail.com> wrote:

> Here's an updated version of the PEP for ModuleSpec which addresses the
> feedback I've gotten.  Thanks for the help.  The big open question, to me,
> is whether or not to have a separate reload() method.  I'll be looking into
> that when I get a chance.  There's also the question of a path-based
> subclass, but I'm currently not convinced it's worth it.
>
> -eric
>
> -----------------------------------
>
> PEP: 4XX
> Title: A ModuleSpec Type for the Import System
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow <ericsnowcurrently at gmail.com>
> BDFL-Delegate: ???
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 8-Aug-2013
> Python-Version: 3.4
> Post-History: 8-Aug-2013
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes to add a new class to ``importlib.machinery`` called
> ``ModuleSpec``.  It will contain all the import-related information
> about a module without needing to load the module first.  Finders will
> now return a module's spec rather than a loader.  The import system will
> use the spec to load the module.
>
>
> Motivation
> ==========
>
> The import system has evolved over the lifetime of Python.  In late 2002
> PEP 302 introduced standardized import hooks via ``finders`` and
> ``loaders`` and ``sys.meta_path``.  The ``importlib`` module, introduced
> with Python 3.1, now exposes a pure Python implementation of the APIs
> described by PEP 302, as well as of the full import system.  It is now
> much easier to understand and extend the import system.  While a benefit
> to the Python community, this greater accessibility also presents a
> challenge.
>
> As more developers come to understand and customize the import system,
> any weaknesses in the finder and loader APIs will be more impactful.  So
> the sooner we can address any such weaknesses in the import system, the
> better...and there are a couple we can take care of with this proposal.
>
> Firstly, any time the import system needs to save information about a
> module we end up with more attributes on module objects that are
> generally only meaningful to the import system and occasionally to some
> people.  It would be nice to have a per-module namespace to put future
> import-related information.  Secondly, there's an API void between
> finders and loaders that causes undue complexity when encountered.
>
> Finders are strictly responsible for providing the loader which the
> import system will use to load the module.  The loader is then
> responsible for doing some checks, creating the module object, setting
> import-related attributes, "installing" the module to ``sys.modules``,
> and loading the module, along with some cleanup.  This all takes place
> during the import system's call to ``Loader.load_module()``.  Loaders
> also provide some APIs for accessing data associated with a module.
>
> Loaders are not required to provide any of the functionality of
> ``load_module()`` through other methods.  Thus, though the import-
> related information about a module is likely available without loading
> the module, it is not otherwise exposed.
>
> Furthermore, the requirements associated with ``load_module()`` are
> common to all loaders and mostly are implemented in exactly the same
> way.  This means every loader has to duplicate the same boilerplate
> code.  ``importlib.util`` provides some tools that help with this, but
> it would be more helpful if the import system simply took charge of
> these responsibilities.  The trouble is that this would limit the degree
> of customization that ``load_module()`` facilitates.  This is a gap
> between finders and loaders which this proposal aims to fill.
>
> Finally, when the import system calls a finder's ``find_module()``, the
> finder makes use of a variety of information about the module that is
> useful outside the context of the method.  Currently the options are
> limited for persisting that per-module information past the method call,
> since it only returns the loader.  Popular workarounds for this limitation
> are to store the information in a module-to-info mapping somewhere on
> the finder itself, or store it on the loader.
>
> Unfortunately, loaders are not required to be module-specific.  On top
> of that, some of the useful information finders could provide is
> common to all finders, so ideally the import system could take care of
> that.  This is the same gap as before between finders and loaders.
>
> As an example of complexity attributable to this flaw, the
> implementation of namespace packages in Python 3.3 (see PEP 420) added
> ``FileFinder.find_loader()`` because there was no good way for
> ``find_module()`` to provide the namespace path.
>
> The answer to this gap is a ``ModuleSpec`` object that contains the
> per-module information and takes care of the boilerplate functionality
> of loading the module.
>
> (The idea gained momentum during discussions related to another PEP.[1])
>
>
> Specification
> =============
>
> The goal is to address the gap between finders and loaders while
> changing as little of their semantics as possible.  Though some
> functionality and information is moved to the new ``ModuleSpec`` type,
> their semantics should remain the same.  However, for the sake of
> clarity, those semantics will be explicitly identified.
>
> A High-Level View
> -----------------
>
> ...
>
> ModuleSpec
> ----------
>
> A new class which defines the import-related values to use when loading
> the module.  It closely corresponds to the import-related attributes of
> module objects.  ``ModuleSpec`` objects may also be used by finders and
> loaders and other import-related APIs to hold extra import-related
> state about the module.  This greatly reduces the need to add any new
> import-related attributes to module objects, and loader ``__init__``
> methods won't need to accommodate such per-module state.
>
> Creating a ModuleSpec:
>
> ``ModuleSpec(name, loader, *, origin=None, filename=None, cached=None,
> path=None)``
>
> The parameters have the same meaning as the attributes described below.
> However, not all ``ModuleSpec`` attributes are also parameters.  The
> passed values are set as-is.  For calculated values use the
> ``from_loader()`` method.
>
> ModuleSpec Attributes
> ---------------------
>
> Each of the following names is an attribute on ``ModuleSpec`` objects.
> A value of ``None`` indicates "not set".  This contrasts with module
> objects where the attribute simply doesn't exist.
>
> While ``package`` and ``is_package`` are read-only properties, the
> remaining attributes can be replaced after the module spec is created
> and after import is complete.  This allows for unusual cases where
> modifying the spec is the best option.  However, typical use should not
> involve changing the state of a module's spec.
>
> Most of the attributes correspond to the import-related attributes of
> modules.  Here is the mapping, followed by a description of the
> attributes.  The reverse of this mapping is used by
> ``init_module_attrs()``.
>
> ============= ===========
> On ModuleSpec On Modules
> ============= ===========
> name          __name__
> loader        __loader__
> package       __package__
> is_package    -
> origin        -
> filename      __file__
> cached        __cached__
> path          __path__
> ============= ===========
>
> ``name``
>
> The module's fully resolved and absolute name.  It must be set.
>
> ``loader``
>
> The loader to use during loading and for module data.  These specific
> functionalities do not change for loaders.  Finders are still
> responsible for creating the loader and this attribute is where it is
> stored.  The loader must be set.
>
> ``package``
>
> The name of the module's parent.  This is a dynamic attribute with a
> value derived from ``name`` and ``is_package``.  For packages it is the
> value of ``name``.  Otherwise it is equivalent to
> ``name.rpartition('.')[0]``.  Consequently, a top-level module will
> have the empty string for ``package``.
>
>
> ``is_package``
>
> Whether or not the module is a package.  This dynamic attribute is True
> if ``path`` is set (even if empty), else it is false.
>

"is True if ``path`` is not None (e.g. the empty list is a "true" value),
else it is False".
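
For reference, a minimal sketch of the two derived attributes as the quoted
text describes them.  ``SpecSketch`` is a hypothetical stand-in, not the
proposed class, and it assumes "set" means "not None" as suggested above.

```python
class SpecSketch:
    """Hypothetical stand-in for ModuleSpec; only the derived attributes."""

    def __init__(self, name, path=None):
        self.name = name
        self.path = path

    @property
    def is_package(self):
        # An empty list still counts as "set"; only None means "not a package".
        return self.path is not None

    @property
    def package(self):
        # Packages are their own parent; other modules use the dotted prefix.
        if self.is_package:
            return self.name
        return self.name.rpartition('.')[0]


top_level = SpecSketch('spam')
submodule = SpecSketch('spam.eggs')
empty_path_package = SpecSketch('spam', path=[])
```

Under these assumptions ``top_level.package`` is the empty string, and a spec
with an empty ``path`` is still treated as a package.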

>
> ``origin``
>
> A string for the location from which the module originates.  If
> ``filename`` is set, ``origin`` should be set to the same value unless
> some other value is more appropriate.  ``origin`` is used in
> ``module_repr()`` if it does not match the value of ``filename``.
>
> Using ``filename`` for this meaning would be inaccurate, since not all
> modules have path-based locations.  For instance, built-in modules do
> not have ``__file__`` set.  Yet it is useful to have a descriptive
> string indicating that it originated from the interpreter as a built-in
> module.  So built-in modules will have ``origin`` set to ``"built-in"``.
>

I still don't know what you would put there for a zipfile-based loader.
Would you still put __file__ or would you put the zipfile? I ask because I
would want a way to pass along in a zipfile finder to the loader where the
zipfile is located and then the internal location of the file. Otherwise
you need to pass in the zip path separately from the internal path to the
loader constructor instead of simply passing in a ModuleSpec (e.g. see
_split_path in http://bugs.python.org/file30660/zip_importlib.diff).

>
> Path-based attributes:
>
> If any of these is set, it indicates that the module is path-based.  For
> reference, a path entry is a string for a location where the import
> system will look for modules (e.g. the path entries in ``sys.path`` or a
> package's ``__path__``).
>
> ``filename``
>
> Like ``origin``, but limited to a path-based location.  If ``filename``
> is set, ``origin`` should be set to the same string, unless origin is
> explicitly set to something else.  ``filename`` is not necessarily an
> actual file name, but could be any location string based on a path
> entry.  Regarding the attribute name, while it is potentially
> inaccurate, it is both consistent with the equivalent module attribute
> and generally accurate.
>
> .. XXX Would a different name be better?  ``path_location``?
>
> ``cached``
>
> The path-based location where the compiled code for a module should be
> stored.  If ``filename`` is set to a source file, this should be set to
> the corresponding path that PEP 3147 specifies.  The
> ``importlib.util.source_to_cache()`` function facilitates getting the
> correct value.
>
> ``path``
>
> The list of path entries in which to search for submodules if this
> module is a package.  Otherwise it is ``None``.
>
> .. XXX add a path-based subclass?
>

You mean like namespace package's __path__ object? Or are you saying you
want ModuleSpec vs. PackageSpec?

>
> ModuleSpec Methods
> ------------------
>
> ``from_loader(name, loader, *, is_package=None, origin=None,
> filename=None, cached=None, path=None)``
>
> .. XXX use a different name?
>
> A factory classmethod that returns a new ``ModuleSpec`` derived from the
> arguments.  ``is_package`` is used inside the method to indicate that
> the module is a package.
>

Why is this a parameter, rather than inferring it from 'path' or
loader.is_package() as you fall back on? What's the motivation?

>  If not explicitly passed in, it is set to
> ``True`` if ``path`` is passed in.  It falls back to using the result of
> the loader's ``is_package()``, if available.  Finally it defaults to
> False.  The remaining parameters have the same meaning as the
> corresponding ``ModuleSpec`` attributes.
>
> In contrast to ``ModuleSpec.__init__()``, which takes the arguments
> as-is, ``from_loader()`` calculates missing values from the ones passed
> in, as much as possible.  This replaces the behavior that is currently
> provided the several ``importlib.util`` functions as well as the
>

"provided by several"

>  optional ``init_module_attrs()`` method of loaders.  Just to be clear,
> here is a more detailed description of those calculations::
>
>    If not passed in, ``filename`` is set to the result of calling the
>    loader's ``get_filename()``, if available.  Otherwise it stays
>    unset (``None``).
>
>    If not passed in, ``path`` is set to an empty list if
>    ``is_package`` is true.  Then the directory from ``filename`` is
>    appended to it, if possible.  If ``is_package`` is false, ``path``
>    stays unset.
>
>    If ``cached`` is not passed in and ``filename`` is passed in,
>    ``cached`` is derived from it.  For filenames with a source suffix,
>    it is set to the result of calling
>    ``importlib.util.cache_from_source()``.  For bytecode suffixes (e.g.
>    ``.pyc``), ``cached`` is set to the value of ``filename``.  If
>    ``filename`` is not passed in or ``cache_from_source()`` raises
>    ``NotImplementedError``, ``cached`` stays unset.
>
>    If not passed in, ``origin`` is set to ``filename``.  Thus if
>    ``filename`` is unset, ``origin`` stays unset.
>

Why is this a static constructor instead of a method like infer_values() or
an infer_values keyword-only argument to the constructor to do this if
requested?
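
To make the quoted calculations concrete, here is a rough sketch of them as
a plain function.  ``infer_spec_values()`` and ``FakeLoader`` are hypothetical
names used only for illustration.  The sketch also assumes that a filename
obtained from the loader counts as "passed in" when deriving ``cached``, which
the draft does not spell out.

```python
import importlib.machinery
import importlib.util
import os.path


def infer_spec_values(name, loader, *, is_package=None, origin=None,
                      filename=None, cached=None, path=None):
    """Hypothetical helper mirroring the quoted from_loader() rules."""
    if is_package is None:
        if path is not None:
            is_package = True
        elif hasattr(loader, 'is_package'):
            is_package = bool(loader.is_package(name))
        else:
            is_package = False

    if filename is None and hasattr(loader, 'get_filename'):
        filename = loader.get_filename(name)

    if path is None and is_package:
        path = []
        if filename is not None:
            path.append(os.path.dirname(filename))

    # Assumption: a filename supplied by the loader is treated like one
    # that was passed in explicitly.
    if cached is None and filename is not None:
        if filename.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
            try:
                cached = importlib.util.cache_from_source(filename)
            except NotImplementedError:
                cached = None
        elif filename.endswith(tuple(importlib.machinery.BYTECODE_SUFFIXES)):
            cached = filename

    if origin is None:
        origin = filename

    return dict(name=name, loader=loader, is_package=is_package,
                origin=origin, filename=filename, cached=cached, path=path)


class FakeLoader:
    """Hypothetical loader exposing only the two optional hooks used above."""

    def is_package(self, name):
        return True

    def get_filename(self, name):
        return os.path.join('pkgs', 'spam', '__init__.py')


values = infer_spec_values('spam', FakeLoader())
bare = infer_spec_values('eggs', object())
```

With a loader that offers neither hook (``bare`` above), everything except
``name`` and ``loader`` stays unset and ``is_package`` defaults to False.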

>
> ``module_repr()``
>
> Returns a repr string for the module if ``origin`` is set and
> ``filename`` is not set.  The string refers to the value of ``origin``.
> Otherwise ``module_repr()`` returns None.  This indicates to the module
> type's ``__repr__()`` that it should fall back to the default repr.
>

This makes me think that origin is an odd name if all it affects is
module_repr().

>
> We could also have ``module_repr()`` produce the repr for the case where
> ``filename`` is set or where ``origin`` is not set, mirroring the repr
> that the module type produces directly.  However, the repr string is
> derived from the import-related module attributes, which might be out of
> sync with the spec.
>

 [SNIP]

> .. XXX add reload(module=None) and drop load()'s parameters entirely?
>

If you are going to make these semantics of making the module argument only
good for reloading then I say yes, make it a separate method.

> .. XXX add more of importlib.reload()'s boilerplate to load()/reload()?
>
> Backward Compatibility
> ----------------------
>
> Since ``Finder.find_module()`` methods would now return a module spec
> instead of loader, specs must act like the loader that would have been
> returned instead.  This is relatively simple to solve since the loader
> is available as an attribute of the spec.  We will use ``__getattr__()``
> to do it.
>
> However, ``ModuleSpec.is_package`` (an attribute) conflicts with
> ``InspectLoader.is_package()`` (a method).  Working around this requires
> a more complicated solution but is not a large obstacle.  Simply making
> ``ModuleSpec.is_package`` a method does not reflect that it is a relatively
> static piece of data.
>

Maybe, but depending on what your "more complicated solution" is, it might
be best to just give up the purity and go with the practicality.

>  ``module_repr()`` also conflicts with the same
> method on loaders, but that workaround is not complicated since both are
> methods.
>
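
A small sketch of the proxying idea and of the ``is_package`` name clash,
for illustration only.  ``SpecSketch`` and ``OldStyleLoader`` are hypothetical
classes, and the draft may well handle the conflict differently.

```python
class OldStyleLoader:
    """Hypothetical loader with the two methods used in this sketch."""

    def get_data(self, path):
        return b'data for ' + path.encode()

    def is_package(self, fullname):
        return False


class SpecSketch:
    """Hypothetical spec that forwards unknown attributes to its loader."""

    def __init__(self, name, loader, path=None):
        self.name = name
        self.loader = loader
        self.path = path

    @property
    def is_package(self):
        return self.path is not None

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        if attr == 'loader':
            # Guard against infinite recursion if 'loader' was never set.
            raise AttributeError(attr)
        return getattr(self.loader, attr)


spec = SpecSketch('spam', OldStyleLoader())
proxied = spec.get_data('spam.txt')   # found on the loader via __getattr__
shadowed = spec.is_package            # the spec's own property, not the method
```

Because ``is_package`` is found on the spec itself, ``__getattr__()`` is never
consulted for it, so code written against the loader API that calls
``spec.is_package('spam')`` would fail with a ``TypeError``.
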
> Unfortunately, the ability to proxy does not extend to ``id()``
> comparisons and ``isinstance()`` tests.  In the case of the return value
> of ``find_module()``, we accept that break in backward compatibility.
> However, we will mitigate the problem with ``isinstance()`` somewhat by
> registering ``ModuleSpec`` on the loaders in ``importlib.abc``.
>

Actually, ModuleSpec doesn't even need to register; __instancecheck__ and
__subclasscheck__ can just be defined and delegate by calling
issubclass/isinstance on the loader as appropriate.

> [SNIP]
>
> Loaders
> -------
>
> Loaders will have a new method, ``exec_module(module)``.  Its only job
> is to "exec" the module and consequently populate the module's
> namespace.  It is not responsible for creating or preparing the module
> object, nor for any cleanup afterward.  It has no return value.
>
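
As a rough illustration of that division of labor, the sketch below keeps the
loader down to ``exec_module()`` and puts the boilerplate in a separate
helper.  ``StringLoader`` and ``load()`` are hypothetical names, and the
helper only approximates what the import system would do.

```python
import sys
import types


class StringLoader:
    """Hypothetical loader that only knows how to execute source code."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Populate the module's namespace; no return value.
        exec(self.source, module.__dict__)


def load(name, loader):
    """Hypothetical helper doing the boilerplate on the loader's behalf."""
    module = types.ModuleType(name)
    module.__loader__ = loader
    module.__package__ = name.rpartition('.')[0]
    sys.modules[name] = module
    try:
        loader.exec_module(module)
    except BaseException:
        # Do not leave a half-initialized module behind.
        sys.modules.pop(name, None)
        raise
    # Return whatever is in sys.modules, in case the module replaced itself.
    return sys.modules[name]


mod = load('spam_sketch', StringLoader('answer = 6 * 7'))
```

If ``exec_module()`` raises, the helper removes the entry from
``sys.modules`` before re-raising, which is the cleanup step the loader no
longer has to implement itself.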

> The ``load_module()`` of loaders will still work and be an active part
> of the loader API.  It is still useful for cases where the default
> module creation/preparation/cleanup is not appropriate for the loader.
>
> For example, the C API for extension modules only supports the full
> control of ``load_module()``.  As such, ``ExtensionFileLoader`` will not
> implement ``exec_module()``.  In the future it may be appropriate to
> produce a second C API that would support an ``exec_module()``
> implementation for ``ExtensionFileLoader``.  Such a change is outside
> the scope of this PEP.
>
> A loader must have at least one of ``exec_module()`` and
> ``load_module()`` defined.
>

"A loader must define either ``exec_module()`` or ``load_module()``."

-Brett

[SNIP]