[Import-SIG] PEP 451 (ModuleSpec) round 3

Brett Cannon brett at python.org
Wed Aug 28 19:22:59 CEST 2013


On Wed, Aug 28, 2013 at 4:50 AM, Eric Snow <ericsnowcurrently at gmail.com> wrote:

> I've incorporated the feedback into the PEP and gave up on trying to
> re-purpose Finder.find_module() (which wasn't worth it).  Let me know what
> you think.  I'll have the implementation up on
> http://bugs.python.org/issue18864 in the next couple days.
>
> -eric
>
>
> ----------------------------------------------------------------------------------------
>
> PEP: 451
> Title: A ModuleSpec Type for the Import System
> Version: $Revision$
> Last-Modified: $Date$
> Author: Eric Snow <ericsnowcurrently at gmail.com>
> Discussions-To: import-sig at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 8-Aug-2013
> Python-Version: 3.4
> Post-History: 8-Aug-2013
>               28-Aug-2013
> Resolution:
>
>
> Abstract
> ========
>
> This PEP proposes to add a new class to ``importlib.machinery`` called
> ``ModuleSpec``.  It will be authoritative for all the import-related
> information about a module, and will be available without needing to
> load the module first.  Finders will provide a module's spec instead of
> a loader.
>

Don't you mean finders will return a ModuleSpec? Since 'loader' is still
defined in the ModuleSpec to know what loader to use, the statement that
finders don't provide a loader is misleading.


>  The import machinery will be adjusted to take advantage of
> module specs, including using them to load modules.
>
>
> Motivation
> ==========
>
> The import system has evolved over the lifetime of Python.  In late 2002
> PEP 302 introduced standardized import hooks via ``finders`` and
> ``loaders`` and ``sys.meta_path``.  The ``importlib`` module, introduced
> with Python 3.1, now exposes a pure Python implementation of the APIs
> described by PEP 302, as well as of the full import system.  It is now
> much easier to understand and extend the import system.  While a benefit
> to the Python community, this greater accessibility also presents a
> challenge.
>
> As more developers come to understand and customize the import system,
> any weaknesses in the finder and loader APIs will be more impactful.  So
> the sooner we can address any such weaknesses in the import system, the
> better...and there are a couple we can take care of with this proposal.
>
> Firstly, any time the import system needs to save information about a
> module we end up with more attributes on module objects that are
> generally only meaningful to the import system and occasionally to some
> people.
>

Leave out "and occasionally to some people"; saying "generally" implies
that some people occasionally find it useful.


>  It would be nice to have a per-module namespace to put future
> import-related information.
>

".. nice to have only a ... and pass within the import system."


>   Secondly, there's an API void between
> finders and loaders that causes undue complexity when encountered.
>
> Currently finders are strictly responsible for providing the loader
> which the import system will use to load the module.
>

"... through their ``find_module()`` method."


>   The loader is then
> responsible for doing some checks, creating the module object, setting
> import-related attributes, "installing" the module to ``sys.modules``,
> and loading the module, along with some cleanup.  This all takes place
> during the import system's call to ``Loader.load_module()``.  Loaders
> also provide some APIs for accessing data associated with a module.
>
> Loaders are not required to provide any of the functionality of
> ``load_module()`` through other methods.  Thus, though the import-
> related information about a module is likely available without loading
> the module, it is not otherwise exposed.
>
> Furthermore, the requirements associated with ``load_module()`` are
> common to all loaders and mostly are implemented in exactly the same
> way.  This means every loader has to duplicate the same boilerplate
> code.  ``importlib.util`` provides some tools that help with this, but
> it would be more helpful if the import system simply took charge of
> these responsibilities.  The trouble is that this would limit the degree
> of customization that ``load_module()`` facilitates.  This is a gap
> between finders and loaders which this proposal aims to fill.
>
> Finally, when the import system calls a finder's ``find_module()``, the
> finder makes use of a variety of information about the module that is
> useful outside the context of the method.  Currently the options are
> limited for persisting that per-module information past the method call,
> since it only returns the loader.  Popular workarounds for this
> limitation are to store the information in a module-to-info mapping
> somewhere on the finder itself, or to store it on the loader.
>
> Unfortunately, loaders are not required to be module-specific.  On top
> of that, some of the useful information finders could provide is
> common to all finders, so ideally the import system could take care of
> that.
>

"that" -> "those details"


>   This is the same gap as before between finders and loaders.
>
> As an example of complexity attributable to this flaw, the
> implementation of namespace packages in Python 3.3 (see PEP 420) added
> ``FileFinder.find_loader()`` because there was no good way for
> ``find_module()`` to provide the namespace search locations.
>
> The answer to this gap is a ``ModuleSpec`` object that contains the
> per-module information and takes care of the boilerplate functionality
> of loading the module.
>

"of loading the module" -> "involved with loading the module".


>
> (The idea gained momentum during discussions related to another PEP.[1])
>
>
> Specification
> =============
>
> The goal is to address the gap between finders and loaders while
> changing as little of their semantics as possible.  Though some
> functionality and information is moved to the new ``ModuleSpec`` type,
> their behavior should remain the same.  However, for the sake of clarity
> the finder and loader semantics will be explicitly identified.
>
> This is a high-level summary of the changes described by this PEP.  More
> detail is available in later sections.
>
> importlib.machinery.ModuleSpec (new)
> ------------------------------------
>

For this entire section you need to provide the call signatures as you
start talking semantics later w/o making clear what is being passed and
returned before going into detail of the individual methods. Otherwise move
the detailed discussion of the methods up to before the semantics overview.


>
> Attributes:
>
> * name - a string for the name of the module.
> * loader - the loader to use for loading and for module data.
> * origin - a string for the location from which the module is loaded.
>

I would give an "e.g." here to help explain what you mean. As previous
comments have shown, the name alone is not enough to understand what value
should go here. =)


> * submodule_search_locations - strings for where to find submodules,
>   if a package.
> * loading_info - a container of data for use during loading (or None).
> * cached (property) - a string for where the compiled module will be
>   stored.
> * is_location (RO-property) - the module's origin refers to a location.
>
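
To make the attribute list concrete, here is a minimal sketch. It assumes
the spec helpers that ship in ``importlib.util`` on current Python 3 rather
than the draft API above; the shipped attribute names are ``has_location``
and ``loader_state`` where this draft says ``is_location`` and
``loading_info``. The module name ``spam_demo`` and its path are
hypothetical.

```python
import importlib.util
import os

# Hypothetical file location; the file does not need to exist to build a spec.
path = os.path.join("some_dir", "spam_demo.py")
file_spec = importlib.util.spec_from_file_location("spam_demo", path)

# "sys" is a built-in module, so its origin is not a filesystem location.
builtin_spec = importlib.util.find_spec("sys")

for spec in (file_spec, builtin_spec):
    # name, origin, whether origin is a location, and package search paths
    print(spec.name, spec.origin, spec.has_location,
          spec.submodule_search_locations)
```

The file-based spec reports a path as its origin and has a location, while
the built-in one reports the string "built-in" and has none.
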
> .. XXX Find a better name than loading_info?
>

loading_data is all that I can think of


> .. XXX Add ``submodules`` (RO-property) - returns possible submodules
>    relative to spec (or None)?
>

Actual use-case or are you just guessing there will be a use? Don't add any
fields that we have not seen an actual need for.


> .. XXX Add ``loaded`` (RO-property) - the module in sys.modules, if any?
>

Too easy to figure out with ``name in sys.modules`` and can go stale
(unless you make this a property).


>
> Factory Methods:
>
> * from_file_location() - factory for file-based module specs.
> * from_module() - factory based on import-related module attributes.
> * from_loader() - factory based on information provided by loaders.
>
> .. XXX Move the factories to importlib.util or make class-only?
>

> Instance Methods:
>
> * init_module_attrs() - populate a module's import-related attributes.
> * module_repr() - provide a repr string for a module.
> * create() - provide a new module to use for loading.
> * exec() - execute the spec into a module namespace.
> * load() - prepare a module and execute it in a protected way.
> * reload() - re-execute a module in a protected way.
>
> .. XXX Make module_repr() match the spec (BC problem?)?
>
> API Additions
> -------------
>
> * ``importlib.abc.Loader.exec_module()`` will execute a module in its
>   own namespace, replacing ``importlib.abc.Loader.load_module()``.
> * ``importlib.abc.Loader.create_module()`` (optional) will return a new
>   module to use for loading.
> * Module objects will have a new attribute: ``__spec__``.
> * ``importlib.find_spec()`` will return the spec for a module.
> * ``__subclasshook__()`` will be implemented on the importlib ABCs.
>
> .. XXX Do __subclasshook__() separately from the PEP (issue18862).
>
> API Changes
> -----------
>
> * Import-related module attributes will no longer be authoritative nor
>   used by the import system.
> * ``InspectLoader.is_package()`` will become optional.
>
> .. XXX module __repr__() will prefer spec attributes?
>
> Deprecations
> ------------
>
> * ``importlib.abc.MetaPathFinder.find_module()``
> * ``importlib.abc.PathEntryFinder.find_module()``
> * ``importlib.abc.PathEntryFinder.find_loader()``
> * ``importlib.abc.Loader.load_module()``
> * ``importlib.abc.Loader.module_repr()``
> * The parameters and attributes of the various loaders in
>   ``importlib.machinery``
> * ``importlib.util.set_package()``
> * ``importlib.util.set_loader()``
> * ``importlib.find_loader()``
>
> Removals
> --------
>
> * ``importlib.abc.Loader.init_module_attrs()``
> * ``importlib.util.module_to_load()``
>
> Other Changes
> -------------
>
> * The spec for the ``__main__`` module will reflect the appropriate
>   name and origin.
> * The module type's ``__repr__`` will defer to ModuleSpec exclusively.
>
> Backward-Compatibility
> ----------------------
>
> * If a finder does not define ``find_spec()``, a spec is derived from
>   the loader returned by ``find_module()``.
> * ``PathEntryFinder.find_loader()`` will be used, if defined.
> * ``Loader.load_module()`` is used if ``exec_module()`` is not defined.
> * ``Loader.module_repr()`` is used by ``ModuleSpec.module_repr()`` if it
>   exists.
>
> What Will not Change?
> ---------------------
>
> * The syntax and semantics of the import statement.
> * Existing finders and loaders will continue to work normally.
> * The import-related module attributes will still be initialized with
>   the same information.
> * Finders will still create loaders, storing them in the specs.
> * ``Loader.load_module()``, if a module defines it, will have all the
>   same requirements and may still be called directly.
> * Loaders will still be responsible for module data APIs.
>
>
> ModuleSpec Users
> ================
>
> ``ModuleSpec`` objects have 3 distinct target audiences: Python itself,
> import hooks, and normal Python users.
>
> Python will use specs in the import machinery, in interpreter startup,
> and in various standard library modules.  Some modules are
> import-oriented, like pkgutil, and others are not, like pickle and
> pydoc.  In all cases, the full ``ModuleSpec`` API will get used.
>
> Import hooks (finders and loaders) will make use of the spec in specific
> ways, mostly without using the ``ModuleSpec`` instance methods.  First
> of all, finders will use the factory methods to create spec objects.
> They may also directly adjust the spec attributes after the spec is
> created.  Secondly, the finder may bind additional information to the
> spec for the loader to consume during module creation/execution.
> Finally, loaders will make use of the attributes on a spec when creating
> and/or executing a module.
>
> Python users will be able to inspect a module's ``__spec__`` to get
> import-related information about the object.  Generally, they will not
> be using the ``ModuleSpec`` factory methods nor the instance methods.
>

As of right now no one is using the instance methods based on the wording
in this section. =)


> However, each spec has methods named ``create``, ``exec``, ``load``, and
> ``reload``.  Since they are so easy to access (and misunderstand/abuse),
> their function and availability require explicit consideration in this
> proposal.
>
>
> What Will Existing Finders and Loaders Have to Do Differently?
> ==============================================================
>
> Immediately?  Nothing.  The status quo will be deprecated, but will
> continue working.  However, here are the things that the authors of
> finders and loaders should change relative to this PEP:
>
> * Implement ``find_spec()`` on finders.
> * Implement ``exec_module()`` on loaders, if possible.
>
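
As a rough illustration of those two bullets, the sketch below defines a
finder with ``find_spec()`` and a loader with ``exec_module()``. It assumes
the signatures that ship in current ``importlib``
(``find_spec(fullname, path, target=None)`` and ``create_module(spec)``),
which this section of the draft does not pin down. ``SOURCES``,
``DictFinder``, ``DictLoader`` and the module name ``pep451_demo`` are
hypothetical names made up for the example.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys

# Hypothetical in-memory "filesystem": module name -> source text.
SOURCES = {"pep451_demo": "answer = 6 * 7\n"}


class DictLoader(importlib.abc.Loader):
    def create_module(self, spec):
        # Returning None asks the import system to create a plain module.
        return None

    def exec_module(self, module):
        # Only execution is left to the loader; the import system handles
        # sys.modules and the import-related attributes.
        spec = module.__spec__
        code = compile(SOURCES[spec.name], spec.origin, "exec")
        exec(code, module.__dict__)


class DictFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname not in SOURCES:
            return None  # let the next finder on sys.meta_path try
        return importlib.machinery.ModuleSpec(
            fullname, DictLoader(), origin="<dict>")


finder = DictFinder()
sys.meta_path.insert(0, finder)
try:
    demo = importlib.import_module("pep451_demo")
finally:
    sys.meta_path.remove(finder)

print(demo.answer, demo.__spec__.origin)
```

Note that the loader never touches ``sys.modules`` and never sets
``__name__`` or ``__loader__`` itself.
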
> The factory methods of ``ModuleSpec`` are intended to be helpful for
> converting existing finders.  ``from_loader()`` and
> ``from_file_location()`` are both straight-forward utilities in this
> regard.
>

If this holds true then they should go into importlib.util and be kept
out of the general object since dir(module_spec) shouldn't need to show the
methods indefinitely.


>  In the case where loaders already expose methods for creating
> and preparing modules, a finder may use ``ModuleSpec.from_module()`` on
> a throw-away module to create the appropriate spec.
>

Why is the module a throw-away one? And why would loaders need to construct
a ModuleSpec?


>
> As for loaders,
>

You were just talking about loaders, so this is a bad transition.


>  ``exec_module()`` should be a relatively direct
> conversion from a portion of the existing ``load_module()``.  However,
> ``Loader.create_module()`` will also be necessary in some uncommon
> cases.  Furthermore, ``load_module()`` will still work as a final option
> when ``exec_module()`` is not appropriate.
>
>
> How Loading Will Work
> =====================
>
> This is an outline of what happens in ``ModuleSpec.load()``.
>
> 1. A new module is created by calling ``spec.create()``.
>
>    a. If the loader has a ``create_module()`` method, it gets called.
>       Otherwise a new module gets created.
>    b. The import-related module attributes are set.
>

So it seems step (b) happens even if step (a) does. If that's the case then
are attributes overridden blindly, or conditionally set? If (b) doesn't
happen if (a) did then you need to make that clear.


>
> 2. The module is added to sys.modules.
>

I would add a note that there is a separate method for handling reloads and
thus blindly setting sys.modules is acceptable.


> 3. ``spec.exec(module)`` gets called.
>
>    a. If the loader has an ``exec_module()`` method, it gets called.
>       Otherwise ``load_module()`` gets called for backward-compatibility
>       and the resulting module is updated to match the spec.
>

"resulting module found in sys.modules is".

And I think you meant to make step (b) be the fallback to load_module().


>
> 4. If there were any errors the module is removed from sys.modules.
> 5. If the module was replaced in sys.modules during ``exec()``, the one
>    in sys.modules is updated to match the spec.
>

This doesn't make sense. You just said the module got updated to match the
spec in step 3.a. Are you saying you're going to overwrite values that
exec_module() set? And once again, blindly updating or conditionally? And
how are these attributes being set? Since exec_module() is going to need to
set these anyway for proper exec() use during loading then why are you
setting them *again* later on? Should you set these first and then let the
methods reset them as they see fit? I thought exec_module() took in a
filled-in module anyway, so didn't you have to set all the attributes prior
to passing it in anyway in step 1.a? In that case this is a reset which
seems wrong if code explicitly chose to change the values.


> 6. The module in sys.modules is returned.
>

Or you can just provide the pseudo-code and skip all of this explanation
and be easier to follow =) You can leave comments with step numbers if you
want to expound upon any specific step outside of the pseudo-code:

import sys
import types

class ModuleSpec:

  def load(self):
    module = self.create()
    sys.modules[self.name] = module

    try:
      self.exec(module)
    except:
      try:
        del sys.modules[self.name]
      except KeyError:
        pass
      raise  # propagate the error after cleaning up sys.modules
    else:
      # XXX different from proposal: didn't reset attributes
      return sys.modules[self.name]

  def create(self):
    if hasattr(self.loader, 'create_module'):
      module = self.loader.create_module(self)
    else:
      module = types.ModuleType(self.name)
      # XXX different from proposal: didn't do it blindly after create_module()
      self.init_module_attrs(module)
    return module

  def exec(self, module):
    if hasattr(self.loader, 'exec_module'):
      self.loader.exec_module(module)
    elif hasattr(self.loader, 'load_module'):
      self.loader.load_module(self.name)
      module = sys.modules[self.name]
    else:
      raise TypeError('{!r} loader does not have an '
                      'exec_module or load_module method'.format(self.loader))
    return module




>
> These steps are exactly what ``Loader.load_module()`` is already
> expected to do.  Loaders will thus be simplified since they will only
> need to implement the portion in step 3a.
>
>
> ModuleSpec
> ==========
>
> This is a new class which defines the import-related values to use when
> loading the module.  It closely corresponds to the import-related
> attributes of module objects.  ``ModuleSpec`` objects may also be used
> by finders and loaders and other import-related APIs to hold extra
> import-related state concerning the module.  This greatly reduces the
> need to add any new import-related attributes to module objects, and
> loader ``__init__`` methods will no longer need to accommodate such
> per-module state.
>
> General Notes
> -------------
>
> * The spec for each module instance will be unique to that instance even
>   if the information is identical to that of another spec.
> * A module's spec is not intended to be modified by anything but
>   finders.
>
> Creating a ModuleSpec
> ---------------------
>
> **ModuleSpec(name, loader, *, origin=None, is_package=None)**
>
> .. container::
>
>    ``name``, ``loader``, and ``origin`` are set on the new instance
>    without any modification.  If ``is_package`` is not passed in, the
>    loader's ``is_package()`` gets called (if available), or it defaults
>    to ``False``.  If ``is_package`` is true,
>    ``submodule_search_locations`` is set to a new empty list.  Otherwise
>    it is set to None.
>
>    Other attributes not listed as parameters (such as ``package``) are
>    either read-only dynamic properties or default to None.
>
> **from_filename(name, loader, *, filename=None,
> submodule_search_locations=None)**
>
> .. container::
>
>    This factory classmethod allows a suitable ModuleSpec instance to be
>    easily created with extra file-related information.  This includes
>    the values that would be set on a module as ``__file__`` or
>    ``__cached__``.
>
>    ``is_location`` is set to True for specs created using
>    ``from_filename()``.
>
> **from_module(module, loader=None)**
>
> .. container::
>
>    This factory is used to create a spec based on the import-related
>    attributes of an existing module.  Since modules should already have
>    ``__spec__`` set, this method has limited utility.
>

"this method is expected to only be used in backwards-compatibility
situations."


>
> **from_loader(name, loader, *, origin=None, is_package=None)**
>
> .. container::
>
>    A factory classmethod that returns a new ``ModuleSpec`` derived from
>    the arguments.  ``is_package`` is used inside the method to indicate
>    that the module is a package.  If not explicitly passed in, it falls
>    back to using the result of the loader's ``is_package()``, if
>    available.  If not available, it defaults to False.
>
>    In contrast to ``ModuleSpec.__init__()``, which takes the arguments
>    as-is, ``from_loader()`` calculates missing values from the ones
>    passed in, as much as possible.  This replaces the behavior that is
>    currently provided by several ``importlib.util`` functions as well as
>    the optional ``init_module_attrs()`` method of loaders.
>

"optional (and proposed-to-be-deprecated)"


>  Just to be
>    clear, here is a more detailed description of those calculations::
>
>       If not passed in, ``filename`` is set to the result of calling the
>       loader's ``get_filename()``, if available.  Otherwise it stays
>       unset (``None``).
>
>       If not passed in, ``submodule_search_locations`` is set to an empty
>       list if ``is_package`` is true.  Then the directory from ``filename``
>       is appended to it, if possible.  If ``is_package`` is false,
>       ``submodule_search_locations`` stays unset.
>
>       If ``cached`` is not passed in and ``filename`` is passed in,
>       ``cached`` is derived from it.  For filenames with a source suffix,
>       it is set to the result of calling
>       ``importlib.util.cache_from_source()``.  For bytecode suffixes (e.g.
>       ``.pyc``), ``cached`` is set to the value of ``filename``.  If
>       ``filename`` is not passed in or ``cache_from_source()`` raises
>       ``NotImplementedError``, ``cached`` stays unset.
>
>       If not passed in, ``origin`` is set to ``filename``.  Thus if
>       ``filename`` is unset, ``origin`` stays unset.
>
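
The ``cached`` rule described above can be sketched as a small helper. This
is only an illustration under assumptions: ``derive_cached`` is a
hypothetical name, and it leans on ``importlib.util.cache_from_source()``
and the suffix lists in ``importlib.machinery`` as they exist on current
Python 3.

```python
import importlib.machinery
import importlib.util


def derive_cached(filename):
    """Return the value ``cached`` would take for ``filename``, or None."""
    if filename is None:
        return None  # no filename: cached stays unset
    if filename.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        try:
            return importlib.util.cache_from_source(filename)
        except NotImplementedError:
            return None  # this implementation has no bytecode cache tag
    if filename.endswith(tuple(importlib.machinery.BYTECODE_SUFFIXES)):
        return filename  # a bytecode file is its own cache
    return None


print(derive_cached("pkg/mod.py"))
print(derive_cached("pkg/mod.pyc"))
```
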
>
> Attributes
> ----------
>
> Each of the following names is an attribute on ``ModuleSpec`` objects.
> A value of ``None`` indicates "not set".  This contrasts with module
> objects where the attribute simply doesn't exist.
>
> While ``package`` is a read-only property, the remaining attributes can
> be replaced after the module spec is created and even after import is
> complete.  This allows for unusual cases where directly modifying the
> spec is the best option.  However, typical use should not involve
> changing the state of a module's spec.
>
> Most of the attributes correspond to the import-related attributes of
> modules.  Here is the mapping, followed by a description of the
> attributes.  The reverse of this mapping is used by
> ``ModuleSpec.init_module_attrs()``.
>
> ========================== ===========
> On ModuleSpec              On Modules
> ========================== ===========
> name                       __name__
> loader                     __loader__
> package                    __package__
> origin                     __file__*
> cached                     __cached__*
>

This shouldn't be set on extension modules, so this is another asterisk of
has_location *and* is not None (right?).


> submodule_search_locations __path__**
> loading_info                \-
> has_location (RO-property)  \-
> ========================== ===========
>
> \* Only if ``is_location`` is true.
>

Should that be has_location?


> \*\* Only if not None.
>

"Set only if not None"


>
> **name**
>
> .. container::
>
>    The module's fully resolved and absolute name.  It must be set.
>
> **loader**
>
> .. container::
>
>    The loader to use during loading and for module data.  These specific
>    functionalities do not change for loaders.  Finders are still
>    responsible for creating the loader and this attribute is where it is
>    stored.  The loader must be set.
>
> **origin**
>
> .. container::
>
>    A string for the location from which the module originates.  Aside from
>    the informational value, it is also used in ``module_repr()``.
>
>    The module attribute ``__file__`` has a similar but more restricted
>    meaning.  Not all modules have it set (e.g. built-in modules).  However,
>    ``origin`` is applicable to essentially all modules.  For built-in
>    modules it would be set to "built-in".
>
> Secondary Attributes
> --------------------
>
> Some of the ``ModuleSpec`` attributes are not set via arguments when
> creating a new spec.  Either they are strictly dynamically calculated
> properties or they are simply set to None (aka "not set").  For the
> latter case, those attributes may still be set directly.
>
> **package**
>
> .. container::
>
>    A dynamic property that gives the name of the module's parent.  The
>    value is derived from ``name`` and ``is_package``.  For packages it is
>    the value of ``name``.  Otherwise it is equivalent to
>    ``name.rpartition('.')[0]``.  Consequently, a top-level module will have
>    the empty string for ``package``.
>
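
For clarity, the ``package`` rule above boils down to the following;
``spec_package`` is a hypothetical stand-in for the property, written as a
plain function so it can be tried in isolation.

```python
def spec_package(name, is_package):
    """Mirror the described rule for the ``package`` property."""
    if is_package:
        return name  # a package is its own parent package
    return name.rpartition(".")[0]  # everything before the last dot


print(repr(spec_package("a.b.c", False)))
print(repr(spec_package("a.b", True)))
print(repr(spec_package("top", False)))
```

The last call shows the top-level case, where ``rpartition()`` finds no dot
and the result is the empty string.
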
> **has_location**
>
> .. container::
>
>    Some modules can be loaded by reference to a location, e.g. a filesystem
>    path or a URL or something of the sort.  Having the location lets you
>    load the module, but in theory you could load that module under various
>    names.
>
>    In contrast, non-located modules can't be loaded in this fashion, e.g.
>    builtin modules and modules dynamically created in code.  For these, the
>    name is the only way to access them, so they have an "origin" but not a
>    "location".
>
>    This attribute reflects whether or not the module is locatable.  If it
>    is, ``origin`` must be set to the module's location and ``__file__``
>    will be set on the module.  Furthermore, a locatable module is also
>    cacheable and so ``__cached__`` is tied to ``has_location``.
>

That statement about __cached__ is not true for extension modules. You're
going to need to tweak how you define 'cached' based on this. Either that
or you can try to use this as a justification for loader.create_module() as
you can override these semantics there as a pure Python module is more
common than extension modules (although this doesn't help with the
ModuleSpec having the wrong information when returned from the finder
unless the finder itself resets it on the ModuleSpec before returning it).


>
>    The corresponding module attribute name, ``__file__``, is somewhat
>    inaccurate and potentially confusing, so we will use a more explicit
>    combination of ``origin`` and ``has_location`` to represent the same
>    information.  Having a separate ``filename`` is unnecessary since we have
>    ``origin``.
>
> **cached**
>
> .. container::
>
>    A string for the location where the compiled code for a module should be
>    stored.  PEP 3147 details the caching mechanism of the import system.
>
>    If ``has_location`` is true, this location string is set on the module
>    as ``__cached__``.  When ``from_filename()`` is used to create a spec,
>    ``cached`` is set to the result of calling
>    ``importlib.util.cache_from_source()``.
>
>    ``cached`` is not necessarily a file location.  A finder or loader may
>    store an alternate location string in ``cached``.  However, in practice
>    this will be the file location dictated by PEP 3147.
>
> **submodule_search_locations**
>
> .. container::
>
>    The list of location strings, typically directory paths, in which to
>    search for submodules.  If the module is a package this will be set to
>    a list (even an empty one).  Otherwise it is ``None``.
>
>    The corresponding module attribute's name, ``__path__``, is relatively
>    ambiguous.  Instead of mirroring it, we use a more explicit name that
>    makes the purpose clear.
>
> **loading_info**
>
> .. container::
>
>    A finder may set ``loading_info`` to any value to provide additional
>    data for the loader to use during loading.  A value of ``None`` is the
>    default and indicates that there is no additional data.  Otherwise it is
>    likely set to some container, such as a ``dict``, ``list``, or
>

"Otherwise it can be set to any object."


>    ``types.SimpleNamespace`` containing the relevant extra information.
>
>    For example, ``zipimporter`` could use it to pass the zip archive name
>    to the loader directly, rather than needing to derive it from ``origin``
>    or create a custom loader for each find operation.
>
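
A sketch of the zip-archive idea, under the assumption that the attribute
is spelled ``loader_state`` (the name it has in the ``ModuleSpec`` that
ships with current Python 3; this draft calls it ``loading_info``).
``ArchiveLoader``, the module name and the archive path are hypothetical.

```python
import importlib.abc
import importlib.machinery
import importlib.util


class ArchiveLoader(importlib.abc.Loader):
    """One shared loader; per-module data travels on the spec instead."""

    def create_module(self, spec):
        return None  # default module creation

    def exec_module(self, module):
        state = module.__spec__.loader_state
        # A real loader would read code out of the archive here; the sketch
        # just records which archive the finder pointed it at.
        module.archive = state["archive"]


# What a finder might hand back: extra data rides along on the spec.
spec = importlib.machinery.ModuleSpec(
    "archived_demo",
    ArchiveLoader(),
    origin="/hypothetical/app.zip/archived_demo.py",
    loader_state={"archive": "/hypothetical/app.zip"},
)

module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
print(module.archive)
```

The loader instance carries no per-module state, so one instance could be
reused across every module found in the archive.
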
> Methods
> -------
>
> **module_repr()**
>
> .. container::
>
>    Returns a repr string for the module, based on the module's import-
>    related attributes and falling back to the spec's attributes.  The
>    string will reflect the current output of the module type's
>    ``__repr__()``.
>
>    The module type's ``__repr__()`` will use the module's ``__spec__``
>    exclusively.  If the module does not have ``__spec__`` set, a spec is
>    generated using ``ModuleSpec.from_module()``.
>
>    Since the module attributes may be out of sync with the spec and to
>    preserve backward-compatibility in that case, we defer to the module
>    attributes and only when they are missing do we fall back to the spec
>    attributes.
>
> **init_module_attrs(module)**
>
> .. container::
>
>    Sets the module's import-related attributes to the corresponding values
>    in the module spec.  If ``has_location`` is false on the spec,
>    ``__file__`` and ``__cached__`` are not set on the module.  ``__path__``
>    is only set on the module if ``submodule_search_locations`` is not None.
>    For the rest of the import-related module attributes, a ``None`` value
>    on the spec (aka "not set") means ``None`` will be set on the module.
>    If any of the attributes are already set on the module, the existing
>    values are replaced.  The module's own ``__spec__`` is not consulted but
>    does get replaced with the spec on which ``init_module_attrs()`` was
>    called.  The earlier mapping of ``ModuleSpec`` attributes to module
>    attributes indicates which attributes are involved on both sides.
>
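
One possible reading of ``init_module_attrs()``, following the mapping
table and its footnotes (``__path__`` only when
``submodule_search_locations`` is not None, ``__file__`` and
``__cached__`` only when the spec has a location). The spec here is a
``types.SimpleNamespace`` stand-in, not a real ``ModuleSpec``, and the
function is a sketch rather than the proposed implementation.

```python
import types


def init_module_attrs(spec, module):
    # Unconditional attributes: existing values are replaced.
    module.__name__ = spec.name
    module.__loader__ = spec.loader
    module.__package__ = spec.package
    module.__spec__ = spec
    # Conditional attributes, per the footnotes of the mapping table.
    if spec.submodule_search_locations is not None:
        module.__path__ = spec.submodule_search_locations
    if spec.has_location:
        module.__file__ = spec.origin
        module.__cached__ = spec.cached


# Stand-in spec for a built-in style module: an origin but no location.
builtin_like = types.SimpleNamespace(
    name="pkg.mod", loader=None, package="pkg", origin="built-in",
    cached=None, submodule_search_locations=None, has_location=False)

mod = types.ModuleType("placeholder")
init_module_attrs(builtin_like, mod)
print(mod.__name__, mod.__package__, hasattr(mod, "__file__"))
```
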
> **create()**
>
> .. container::
>
>    A new module is created relative to the spec and its import-related
>    attributes are set accordingly.  If the spec's loader has a
>    ``create_module()`` method, that gets called to create the module.  This
>    gives the loader a chance to do any pre-loading initialization that can't
>    otherwise be accomplished elsewhere.  Otherwise a bare module object is
>    created.  In both cases ``init_module_attrs()`` is called on the module
>    before it gets returned.
>

As stated earlier, I don't like the idea of blindly resetting attributes if
set by create_module().


>
> **exec(module)**
>
> .. container::
>
>    The spec's loader is used to execute the module.  If the loader has
>    ``exec_module()`` defined, the namespace of ``module`` is the target of
>    execution.
>

Wait, what? You suggest it's the module in the signature but
module.__dict__ in the explanation.


>  Otherwise the loader's ``load_module()`` is called, which
>    ignores ``module`` and returns the module that was the actual
>    execution target.
>

Are you pulling from sys.modules? Otherwise how are you getting the module
from load_module()? And you don't mention that in one case the module is
not put into sys.modules while in the other case it is (exec_module vs.
load_module). That dichotomy is going to be messy. Does this need to be
separate from load()? If you merge it in then the sys.modules semantics are
unified within load(). Otherwise you need to make this set sys.modules in
either case and return from sys.modules.


>  In that case the import-related attributes of that
>    module are updated to reflect the spec.
>

Why? If you already set the attributes in the module and inserted it into
sys.modules previously then you already took care of this. Else you now are
setting the attributes potentially *three* times (twice in create() from
loader.create_module() + an explicit call to init_module_attr() and then
here).


>  In both cases the targeted
>    module is the one that gets returned.
>

Huh? What exactly are you returning? You say "actual execution target"
above for load_module() but "in both cases the target module" here. That
seems contradictory.


>
> **load()**
>
> .. container::
>
>    This method captures the current functionality of and requirements on
>    ``Loader.load_module()`` without any semantic changes.  It is
>    essentially a wrapper around ``create()`` and ``exec()`` with some
>    extra functionality regarding ``sys.modules``.
>
>    Right before ``exec()`` is called, the module is added to
>    ``sys.modules``.  In the case of error during loading the module is
>    removed from ``sys.modules``.  The module in ``sys.modules`` when
>    ``load()`` finishes is the one that gets returned.  Returning the module
>    from ``sys.modules`` accommodates the ability of the module to replace
>    itself there while it is executing (during load).
>
>    As noted, this is what already happens in the import system.
>    ``load()`` is not meant to change any of this behavior.
>
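
The sys.modules handling described above can be sketched roughly as
follows. This is a toy under stated assumptions, not the reference
implementation: ``load`` is a free function taking a hypothetical
``exec_module`` callable rather than a method on a spec, and
``replace_self`` and ``blow_up`` are made-up examples:

```python
import sys
import types


def load(name, exec_module):
    """Toy load() flow; exec_module is any callable that takes a module."""
    module = types.ModuleType(name)
    sys.modules[name] = module          # added right before execution
    try:
        exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)     # removed again if loading fails
        raise
    return sys.modules[name]            # may not be `module` any more


def replace_self(module):
    # A module swapping itself out of sys.modules while it executes.
    replacement = types.ModuleType(module.__name__)
    replacement.swapped = True
    sys.modules[module.__name__] = replacement


def blow_up(module):
    raise RuntimeError("import failed")
```

Calling ``load("toy_replaced", replace_self)`` hands back the replacement
object, while ``load("toy_broken", blow_up)`` re-raises the error and leaves
no entry behind in sys.modules.
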
>    If ``loader`` is not set (``None``), ``load()`` raises a ValueError.
>

Since the loader is required by the initializer for ModuleSpec I don't know
if this specific check is necessary: EAFP.


>
> **reload(module)**
>
> .. container::
>
>    As with ``load()`` this method faithfully fulfills the semantics of
>    ``Loader.load_module()`` in the reload case, with one exception:
>    reloading a module when ``exec_module()`` is available actually uses
>    ``module`` rather than ignoring it in favor of the one in
>    ``sys.modules``, as ``Loader.load_module()`` does.  The functionality
>    here mirrors that of ``load()``, minus the ``create()`` call and the
>    ``sys.modules`` handling.
>
> .. XXX add more of importlib.reload()'s boilerplate to reload()?
>
> Omitted Attributes and Methods
> ------------------------------
>
> There is no ``PathModuleSpec`` subclass of ``ModuleSpec`` that provides
> the ``has_location``, ``cached``, and ``submodule_search_locations``
> functionality.  While that might make the separation cleaner, module
> objects don't have that distinction.  ``ModuleSpec`` will support both
> cases equally well.
>
> While ``is_package`` would be a simple additional attribute (aliasing
> ``self.submodule_search_locations is not None``), it perpetuates the
> artificial (and mostly erroneous) distinction between modules and
> packages.
>
> Conceivably, ``ModuleSpec.load()`` could optionally take a list of
> modules with which to interact instead of ``sys.modules``.  That
> capability is left out of this PEP, but may be pursued separately at
> some other time, including relative to PEP 406 (import engine).
>
> Likewise ``load()`` could be leveraged to implement multi-version
> imports.  While interesting, doing so is outside the scope of this
> proposal.
>
> Backward Compatibility
> ----------------------
>
> ``ModuleSpec`` doesn't have any.  This would be a different story if
> ``Finder.find_module()`` were to return a module spec instead of loader.
> In that case, specs would have to act like the loader that would have
> been returned instead.  Doing so would be relatively simple, but is an
> unnecessary complication.
>
> Subclassing
> -----------
>
> Subclasses of ModuleSpec are allowed, but should not be necessary.
> Simply setting ``loading_info`` or adding functionality to a custom
> finder or loader will likely be a better fit and should be tried first.
> However, as long as a subclass still fulfills the requirements of the
> import system, objects of that type are completely fine as the return
> value of ``Finder.find_spec()``.
>
>
> Existing Types
> ==============
>
> Module Objects
> --------------
>
> **__spec__**
>
> .. container::
>
>    Module objects will now have a ``__spec__`` attribute to which the
>    module's spec will be bound.
>
> None of the other import-related module attributes will be changed or
> deprecated, though some of them could be; any such deprecation can wait
> until Python 4.
>
> ``ModuleSpec`` objects will not be kept in sync with the corresponding
> module object's import-related attributes.  Though they may differ, in
> practice they will typically be the same.
>
> One notable exception is the case where a module is run as a script by
> using the ``-m`` flag.  In that case ``module.__spec__.name`` will
> reflect the actual module name while ``module.__name__`` will be
> ``__main__``.
>
> Finders
> -------
>
> **MetaPathFinder.find_spec(name, path=None)**
>
> **PathEntryFinder.find_spec(name)**
>
> .. container::
>
>    Finders will return ModuleSpec objects when ``find_spec()`` is
>    called.  This new method replaces ``find_module()`` and
>    ``find_loader()`` (in the ``PathEntryFinder`` case).  If a finder does
>    not have ``find_spec()``, ``find_module()`` and ``find_loader()`` are
>    used instead, for backward-compatibility.
>
>    Adding yet another similar method to finders is a case of practicality.
>    ``find_module()`` could be changed to return specs instead of loaders.
>    This is tempting because the import APIs have suffered enough,
>    especially considering ``PathEntryFinder.find_loader()`` was just
>    added in Python 3.3.  However, the extra complexity and a less-than-
>    explicit method name aren't worth it.
>
> Finders are still responsible for creating the loader.  That loader will
> now be stored in the module spec returned by ``find_spec()`` rather
> than returned directly.  As is currently the case without the PEP, if a
> loader would be costly to create, that loader can be designed to defer
> the cost until later.
>
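
The backward-compatibility fallback for meta path finders could look
something like this. ``ToySpec``, ``find_spec_compat``, ``OldFinder`` and
``NewFinder`` are all hypothetical names for the sake of the sketch, and
strings stand in for real loader objects:

```python
class ToySpec:
    """Hypothetical stand-in for ModuleSpec (name and loader only)."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader


def find_spec_compat(finder, name, path=None):
    """Ask a meta path finder for a spec, falling back to find_module()."""
    if hasattr(finder, "find_spec"):
        return finder.find_spec(name, path)
    # Old-style finder: it returns a loader, so wrap that in a spec.
    loader = finder.find_module(name, path)
    if loader is None:
        return None
    return ToySpec(name, loader)


class OldFinder:
    def find_module(self, name, path=None):
        return "old-loader" if name == "spam" else None


class NewFinder:
    def find_spec(self, name, path=None):
        return ToySpec(name, "new-loader") if name == "spam" else None
```

Either finder yields a spec carrying its loader for ``"spam"``, and ``None``
for a name it does not know about.
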
> Loaders
> -------
>
> **Loader.exec_module(module)**
>
> .. container::
>
>    Loaders will have a new method, ``exec_module()``.  Its only job
>    is to "exec" the module and consequently populate the module's
>    namespace.  It is not responsible for creating or preparing the module
>    object, nor for any cleanup afterward.  It has no return value.
>
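
For illustration, a minimal loader in this style might look like the
following. ``StringLoader`` is a made-up class that keeps its source in
memory; it only shows the shape of ``exec_module()`` as described above:

```python
import types


class StringLoader:
    """Hypothetical loader that executes source code held in memory."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Populate the namespace of the module passed in; nothing is returned.
        exec(self.source, module.__dict__)


# Creating and preparing the module object is someone else's job.
module = types.ModuleType("greeting")
StringLoader("message = 'hello'").exec_module(module)
```

The loader receives the module object and executes into its namespace,
which is the reading of the signature I would expect.
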
> **Loader.load_module(fullname)**
>
> .. container::
>
>    The ``load_module()`` of loaders will still work and be an active part
>    of the loader API.  It is still useful for cases where the default
>    module creation/preparation/cleanup is not appropriate for the loader.
>    If implemented, ``load_module()`` will still be responsible for its
>    current requirements (prep/exec/etc.) since the method may be called
>    directly.
>
>    For example, the C API for extension modules only supports the full
>    control of ``load_module()``.  As such, ``ExtensionFileLoader`` will not
>    implement ``exec_module()``.  In the future it may be appropriate to
>    produce a second C API that would support an ``exec_module()``
>    implementation for ``ExtensionFileLoader``.  Such a change is outside
>    the scope of this PEP.
>
> A loader must define either ``exec_module()`` or ``load_module()``.  If
> both exist on the loader, ``ModuleSpec.load()`` uses ``exec_module()``
> and ignores ``load_module()``.
>
> **Loader.create_module(spec)**
>
> .. container::
>
>    Loaders may also implement ``create_module()`` that will return a
>    new module to exec.  However, most loaders will not need to implement
>    the method.
>
> PEP 420 introduced the optional ``module_repr()`` loader method to limit
> the amount of special-casing in the module type's ``__repr__()``.  Since
> this method is part of ``ModuleSpec``, it will be deprecated on loaders.
> However, if it exists on a loader it will be used exclusively.
>
> The ``Loader.init_module_attrs()`` method, added prior to Python 3.4's
> release, will be removed in favor of the same method on ``ModuleSpec``.
>
> However, ``InspectLoader.is_package()`` will not be deprecated even
> though the same information is found on ``ModuleSpec``.  ``ModuleSpec``
> can use it to populate its own ``is_package`` if that information is
> not otherwise available.  Still, it will be made optional.
>
> The path-based loaders in ``importlib`` take arguments in their
> ``__init__()`` and have corresponding attributes.  However, the need for
> those values is eliminated by module specs.  The only exception is
> ``FileLoader.get_filename()``, which uses ``self.path``.  The signatures
> for these loaders and the accompanying attributes will be deprecated.
>
> In addition to executing a module during loading, loaders will still be
> directly responsible for providing APIs concerning module-related data.
>
>
> Other Changes
> =============
>
> * The various finders and loaders provided by ``importlib`` will be
>   updated to comply with this proposal.
> * The spec for the ``__main__`` module will reflect how the interpreter
>   was started.  For instance, with ``-m`` the spec's name will be that
>   of the run module, while ``__main__.__name__`` will still be
>   "__main__".
> * We add ``importlib.find_spec()`` to mirror
>   ``importlib.find_loader()`` (which becomes deprecated).
> * Deprecations in ``importlib.util``: ``set_package()``,
>   ``set_loader()``, and ``module_for_loader()``.  ``module_to_load()``
>   (introduced prior to Python 3.4's release) can be removed.
> * ``importlib.reload()`` is changed to use ``ModuleSpec.load()``.
> * ``ModuleSpec.load()`` and ``importlib.reload()`` will now make use of
>   the per-module import lock, whereas ``Loader.load_module()`` did not.
>
>
> Reference Implementation
> ========================
>
> A reference implementation will be available at
> http://bugs.python.org/issue18864.
>
>
> Open Issues
> ==============
>
> \* The impact of this change on pkgutil (and setuptools) needs looking
> into.  It has some generic function-based extensions to PEP 302.  These
> may break if importlib starts wrapping loaders without the tools'
> knowledge.
>
> \* Other modules to look at: runpy (and pythonrun.c), pickle, pydoc,
> inspect.
>
> \* Add ``ModuleSpec.data`` as a descriptor that wraps the data API of the
> spec's loader?
>

No. This starts to move ModuleSpec away from being a data storage object
and toward being a level of indirection around loaders.


>
> \* How to limit possible end-user confusion/abuses relative to spec
> attributes (since __spec__ will make them really accessible)?
>
>
> References
> ==========
>
> [1] http://mail.python.org/pipermail/import-sig/2013-August/000658.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
> ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:
>
>
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>
>