[Import-SIG] Round 2 for "A ModuleSpec Type for the Import System"

Tue Aug 13 05:35:14 CEST 2013

On Sun, Aug 11, 2013 at 7:03 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> I think this is solid enough to be worth adding to the PEPs repo now.
>

Sounds good.

>
> On 9 August 2013 18:58, Eric Snow <ericsnowcurrently at gmail.com> wrote:
> > Here's an updated version of the PEP for ModuleSpec which addresses the
> > feedback I've gotten.  Thanks for the help.  The big open question, to me,
> > is whether or not to have a separate reload() method.  I'll be looking into
> > that when I get a chance.  There's also the question of a path-based
> > subclass, but I'm currently not convinced it's worth it.
>
> One piece of feedback from me (triggered by the C extension modules
> discussion on python-dev): we should consider proposing a new "exec"
> hook for C extension modules that could be defined instead of or in
> addition to the existing PEP 3121 init hook.
>

Sounds good.  I expect you mean as a separate proposal...

> Also, to handle the extension module case, we may need to let loaders
> define an optional "create_module" method that accepts the ModuleSpec
> object as an argument.

I'd considered that here, whether on the loader or on ModuleSpec.  My plan
was to hold off on that to stay focused on the rest of the changes.
However, I'm open to adding this to the PEP.

> > A High-Level View
> > -----------------
> >
> > ...
>
> Not sure a high level view is needed, but you can fill this in if you
> want :)
>

Forgot that was in there. :)

> >
> > ModuleSpec
> > ----------
> >
> > A new class which defines the import-related values to use when loading
> > the module.  It closely corresponds to the import-related attributes of
> > module objects.  ``ModuleSpec`` objects may also be used by finders and
> > loaders and other import-related APIs to hold extra import-related
> > state about the module.  This greatly reduces the need to add any
> > new import-related attributes to module objects, and loader ``__init__``
> > methods won't need to accommodate such per-module state.
>
> To avoid conflicts as the spec attributes evolve in the future, would
> it be worth having a "custom" field which is just an arbitrary object
> reference used to pass info from the finder to the loader without
> troubling the rest of the import system?
>

I see what you're saying, but am conflicted.  For some reason providing a
sub-namespace for that doesn't seem quite right.  However, the alternative
runs the risk of collisions later on.  Maybe we could recommend the use of
a preceding "_" for custom attributes?  I'll see if I can come up with
something.
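
To make the two options concrete, here is a minimal sketch.  ``SketchSpec``
and the ``db_row`` value are hypothetical stand-ins invented for
illustration; they are not part of the PEP.

```python
from types import SimpleNamespace


class SketchSpec:
    """Hypothetical stand-in for the proposed ModuleSpec (not the PEP's class)."""

    def __init__(self, name, loader, *, custom=None):
        self.name = name
        self.loader = loader
        # Option 1: a single opaque slot that the import system never inspects.
        self.custom = custom


# Option 1: finder-specific state lives under the one "custom" field.
spec_a = SketchSpec("pkg.mod", loader=None, custom=SimpleNamespace(db_row=42))

# Option 2: finder-specific state is set directly on the spec, marked with a
# leading "_" so it cannot collide with attributes the spec grows later.
spec_b = SketchSpec("pkg.mod", loader=None)
spec_b._db_row = 42
```

Either way the loader can read the value back; the difference is only
whether the namespace is explicit or a naming convention.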

> > The parameters have the same meaning as the attributes described below.
> > However, not all ``ModuleSpec`` attributes are also parameters.  The
> > passed values are set as-is.  For calculated values use the
> > ``from_loader()`` method.
>
> This paragraph isn't particularly clear. Perhaps:
>
> "Passed in parameter values are assigned directly to the corresponding
> attributes below. Other attributes not listed as parameters (such as
> ``package``) are read-only properties that are automatically derived
> from these values.
>
> The ``ModuleSpec.from_loader()`` class method allows a suitable
> ModuleSpec instance to be easily created from a PEP 302 loader object"
>

That's much better.

> > While ``package`` and ``is_package`` are read-only properties, the
> > remaining attributes can be replaced after the module spec is created
> > and after import is complete.  This allows for unusual cases where
> > modifying the spec is the best option.  However, typical use should not
> > involve changing the state of a module's spec.
>
> I'm with Brett that "is_package" should go, to be replaced by
> "spec.path is not None" wherever it matters. is_package() would then
> fall through to the PEP 302 loader API via __getattr__.
>

I'm considering the recommendation, but I still feel like `is_package` as
an attribute is worth having.  I see module.__spec__ as useful to more than
the import system and its hackers, and `is_package` as valuable to the
broader audience that may not have learned what __path__ means.  It's
certainly not obvious that __path__ implies a package.  Then again, a
person would have to be looking at __spec__ to see `is_package`, so maybe
it loses enough utility to not be worth keeping.
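
For reference, a rough sketch of `is_package` as a read-only property that
is just a readable spelling of "spec.path is not None".  ``SketchSpec`` and
the example path are hypothetical, not the PEP's implementation.

```python
class SketchSpec:
    """Hypothetical stand-in for the draft ModuleSpec."""

    def __init__(self, name, *, path=None):
        self.name = name
        # List of path entries to search for submodules, or None for a
        # module that is not a package.
        self.path = path

    @property
    def is_package(self):
        # Read-only convenience; equivalent to "spec.path is not None".
        return self.path is not None


pkg = SketchSpec("email", path=["/some/dir/email"])
mod = SketchSpec("email.utils")
```

Note that an empty list still counts as a package here, since only ``None``
means "not a package".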

> > ``origin``
> >
> > A string for the location from which the module originates.  If
> > ``filename`` is set, ``origin`` should be set to the same value unless
> > some other value is more appropriate.  ``origin`` is used in
> > ``module_repr()`` if it does not match the value of ``filename``.
> >
> > Using ``filename`` for this meaning would be inaccurate, since not all
> > modules have path-based locations.  For instance, built-in modules do
> > not have ``__file__`` set.  Yet it is useful to have a descriptive
> > string indicating that it originated from the interpreter as a built-in
> > module.  So built-in modules will have ``origin`` set to ``"built-in"``.
>
> How about we *just* have origin, with a separate "set_fileattr"
> attribute to indicate "this is a discrete file, you should set
> __file__"?
>

I like that.  I'll see how it works.  There doesn't seem to be any reason
why you would have two distinct strings for origin and filename.  In fact,
that's kind of smelly.

However, I wonder if this is where a PathModuleSpec subclass would be
meaningful.  Then no flag would be necessary.

> Also, we should explicitly note that we'll still set __file__ for zip
> imports, due to backwards compatibility concerns, even though it
> doesn't correspond to a valid filesystem path.
>

Hmm.  So deprecate the use of __file__ for anything but actual file names?
Interesting.  I was planning on just leaving the current meaning of
"location relative to a path entry".

>
> (Random thought: spec.origin + spec.cached + a cache directory setting
> in zipimport would give a potentially clean way to do extension module
> imports from zip archives)
>

That would be cool.

> > ``path``
> >
> > The list of path entries in which to search for submodules if this
> > module is a package.  Otherwise it is ``None``.
>
> Path entries don't have to correspond to filesystem locations - they
> just have to make sense to at least one path hook
> (e.g. a DB URI would be a valid path entry).
>

Right.  I didn't mean to imply that they do.

> > .. XXX add a path-based subclass?
>
> Nope :)
>

I keep vacillating on this.

> > ModuleSpec Methods
> > ------------------
> >
> > ``from_loader(name, loader, *, is_package=None, origin=None, filename=None,
> > cached=None, path=None)``
> >
> > .. XXX use a different name?
>
> I'd disallow customisation on this one - if people want to customise,
> they should just query the PEP 302 APIs themselves and call the
> ModuleSpec constructor directly. The use case for this one should be
> to make it trivial to switch from "return loader" to "return
> ModuleSpec.from_loader(loader)" in a find_module implementation.
>

What do you mean by disallow customization?  Make it "private"?
`from_loader()` is intended for exactly the use that you described.

> > In contrast to ``ModuleSpec.__init__()``, which takes the arguments
> > as-is, ``from_loader()`` calculates missing values from the ones passed
> > in, as much as possible.  This replaces the behavior that is currently
> > provided by the several ``importlib.util`` functions as well as the
> > optional ``init_module_attrs()`` method of loaders.  Just to be clear,
> > here is a more detailed description of those calculations::
> >
> >    If not passed in, ``filename`` is set to the result of calling the
> >    loader's ``get_filename()``, if available.  Otherwise it stays
> >    unset (``None``).
> >
> >    If not passed in, ``path`` is set to an empty list if
> >    ``is_package`` is true.  Then the directory from ``filename`` is
> >    appended to it, if possible.  If ``is_package`` is false, ``path``
> >    stays unset.
> >
> >    If ``cached`` is not passed in and ``filename`` is passed in,
> >    ``cached`` is derived from it.  For filenames with a source suffix,
> >    it is set to the result of calling
> >    ``importlib.util.cache_from_source()``.  For bytecode suffixes (e.g.
> >    ``.pyc``), ``cached`` is set to the value of ``filename``.  If
> >    ``filename`` is not passed in or ``cache_from_source()`` raises
> >    ``NotImplementedError``, ``cached`` stays unset.
> >
> >    If not passed in, ``origin`` is set to ``filename``.  Thus if
> >    ``filename`` is unset, ``origin`` stays unset.
>
> Hmm, is there a reason this can't be the default constructor
> behaviour? What's the value of *not* having the sensible fallbacks,
> given they can always be overridden by passing in explicit values when
> you want something different?
>

I'll think about this.  There was some value in it before, but with changes
to other signatures, `from_loader()` is much less useful as a separate
factory method.

>
> A separate "from_module(m)" constructor would probably make sense, though.
>

I have this for internal use in the implementation, but did not expose it
since all modules should already have a spec.

> > ``module_repr()``
> >
> > Returns a repr string for the module if ``origin`` is set and
> > ``filename`` is not set.  The string refers to the value of ``origin``.
> > Otherwise ``module_repr()`` returns None.  This indicates to the module
> > type's ``__repr__()`` that it should fall back to the default repr.
> >
> > We could also have ``module_repr()`` produce the repr for the case where
> > ``filename`` is set or where ``origin`` is not set, mirroring the repr
> > that the module type produces directly.  However, the repr string is
> > derived from the import-related module attributes, which might be out of
> > sync with the spec.
> >
> > .. XXX Is using the spec close enough?  Probably not.
>
> I think it makes sense to always return the expected repr based on the
> spec attributes, but allow a custom origin to be passed in to handle
> the case where the module __file__ attribute differs from
> __spec__.origin (keeping in mind I think __spec__.filename should be
> replaced with __spec__.set_fileattr)
>

That's the approach that I took at first, but the module that is passed in
is not guaranteed to be a spec.  Furthermore, having the spec take
precedence over the module's attrs for the repr seems like too big a
backward-compatibility risk.

>
> > The implementation of the module type's ``__repr__()`` will change to
> > accommodate this PEP.  However, the current functionality will remain to
> > handle the case where a module does not have a ``__spec__`` attribute.
>
> Experience tells us that the import system should ensure the __spec__
> attribute always exists (even if it has to be filled in from the
> module attributes after calling load_module)
>

That's a good point.  The only possible problem is for someone that creates
their own module object and expects repr to work the same as it does
currently.
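
As a rough illustration of the ``module_repr()`` behavior described in the
draft text above, written as plain functions.  ``spec_module_repr`` and
``sketch_repr`` are hypothetical names, and the exact repr format is an
assumption modelled on the existing built-in module repr.

```python
import types


def spec_module_repr(name, origin=None, filename=None):
    """Hypothetical function form of the draft module_repr()."""
    # Only produce a repr when origin is set and filename is not.
    if origin is not None and filename is None:
        return "<module {!r} ({})>".format(name, origin)
    # None tells the module type to fall back to its default repr.
    return None


def sketch_repr(module, origin=None, filename=None):
    """Stand-in for the module type's __repr__() consulting the spec first."""
    custom = spec_module_repr(module.__name__, origin, filename)
    return custom if custom is not None else repr(module)


builtin_style = spec_module_repr("sys", origin="built-in")
file_based = spec_module_repr("m", origin="/x/m.py", filename="/x/m.py")
```

A hand-made module object with no spec information still goes through the
fallback branch, so its repr is whatever it is today.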

> > ``load(module=None, *, is_reload=False)``
>
> Yep, definitely needs to be a separate method. "is_reload" would
> almost always be set to a boolean, which means a separate API is
> likely to be better.
>

Agreed.

> However, I think the separate method should be "exec()" rather than
> "reload()" and require that the module always be passed in.
>

I'll see how that looks.  It seems like a better fit than just plain
`reload()`.

> We could also expose a "create" method that just creates and returns
> the new module object, and replace importlib.util.module_to_load with
> a context manager that accepted the module as a parameter. Say
> "add_to_sys", which fails if the module is already present in
> sys.modules.
>

One of the points of ModuleSpec is to remove the need for
`module_to_load()`.  I'm not convinced of the utility of a create method
like you've described other than possibly as something internal to
ModuleSpec.

> load() would then look something like:
>
>     def load(self):
>         m = self.create()
>         with importlib.util.add_to_sys(m):
>             self.exec(m)
>         return sys.modules[self.name]
>
> We could also provide reload() if we wanted to:
>
>     def reload(self):
>         self.exec(sys.modules[self.name])
>         return sys.modules[self.name]
>
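
A self-contained version of the quoted create/exec/load outline, as a sketch
only.  ``importlib.util.add_to_sys`` does not exist, so a hypothetical
``add_to_sys`` is defined here; ``SketchSpec`` and its ``source`` attribute
are likewise stand-ins for whatever the spec's loader would actually run.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def add_to_sys(module):
    """Hypothetical helper: register the module, unregister it if exec fails."""
    name = module.__name__
    if name in sys.modules:
        raise ImportError("{!r} is already in sys.modules".format(name))
    sys.modules[name] = module
    try:
        yield module
    except BaseException:
        # Do not leave a half-initialized module behind.
        sys.modules.pop(name, None)
        raise


class SketchSpec:
    """Hypothetical stand-in for ModuleSpec with create/exec/load/reload."""

    def __init__(self, name, source):
        self.name = name
        self.source = source  # stand-in for the loader's real work

    def create(self):
        return types.ModuleType(self.name)

    def exec(self, module):
        # Inside the method body, "exec" is the builtin, not this method.
        exec(self.source, module.__dict__)

    def load(self):
        module = self.create()
        with add_to_sys(module):
            self.exec(module)
        # The module may have replaced itself in sys.modules while executing.
        return sys.modules[self.name]

    def reload(self):
        self.exec(sys.modules[self.name])
        return sys.modules[self.name]


spec = SketchSpec("sketch_demo_mod", "x = 1")
demo = spec.load()
```

Reloading re-executes into the same module object, which is why ``exec()``
has to take the module as a required argument.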
> > Subclassing
> > -----------
> >
> > Subclasses of ModuleSpec are allowed, but should not be necessary.
> > Adding functionality to a custom finder or loader will likely be a
> > better fit and should be tried first.  However, as long as a subclass
> > still fulfills the requirements of the import system, objects of that
> > type are completely fine as the return value of ``find_module()``.
>
> We may need to do subclasses for the ABC registration backwards
> compatibility hack.
>

I was thinking of registering ModuleSpec in the setter of a `loader

>
> >
> > Module Objects
> > --------------
> >
> > Module objects will now have a ``__spec__`` attribute to which the
> > module's spec will be bound.  None of the other import-related module
> > attributes will be changed or deprecated, though some of them could be;
> > any such deprecation can wait until Python 4.
> >
> > ``ModuleSpec`` objects will not be kept in sync with the corresponding
> > module object's import-related attributes.  Though they may differ, in
> > practice they will typically be the same.
>
> Worth mentioning here that __main__.__spec__.name will give the real name
> of modules executed with -m, rather than delaying that until the
> notes at the end.
>
> > Finders
> > -------
> >
> > Finders will now return ModuleSpec objects when ``find_module()`` is
> > called rather than loaders.  For backward compatibility, ``ModuleSpec``
> > objects proxy the attributes of their ``loader`` attribute.
> >
> > Adding another similar method to avoid backward-compatibility issues
> > is undesirable if avoidable.  The import APIs have suffered enough,
> > especially considering ``PathEntryFinder.find_loader()`` was just
> > added in Python 3.3.  The approach taken by this PEP should be
> > sufficient to address backward-compatibility issues for
> > ``find_module()``.
> >
> > The change to ``find_module()`` applies to both ``MetaPathFinder`` and
> > ``PathEntryFinder``.  ``PathEntryFinder.find_loader()`` will be
> > deprecated and, for backward compatibility, implicitly special-cased if
> > the method exists on a finder.
>
> Actually, we don't currently have anything on ModuleSpec to indicate
> "this is complete, stop scanning for more path fragments" or how we
> will compose multiple module specs for the individual fragments into a
> combined spec for the namespace package.
>
> > Finders are still responsible for creating the loader.  That loader will
> > now be stored in the module spec returned by ``find_module()`` rather
> > than returned directly.  As is currently the case without the PEP, if a
> > loader would be costly to create, that loader can be designed to defer
> > the cost until later.
> >
> > Loaders
> > -------
> >
> > Loaders will have a new method, ``exec_module(module)``.  Its only job
> > is to "exec" the module and consequently populate the module's
> > namespace.  It is not responsible for creating or preparing the module
> > object, nor for any cleanup afterward.  It has no return value.
> >
> > The ``load_module()`` of loaders will still work and be an active part
> > of the loader API.  It is still useful for cases where the default
> > module creation/preparation/cleanup is not appropriate for the loader.
> >
> > For example, the C API for extension modules only supports the full
> > control of ``load_module()``.  As such, ``ExtensionFileLoader`` will not
> > implement ``exec_module()``.  In the future it may be appropriate to
> > produce a second C API that would support an ``exec_module()``
> > implementation for ``ExtensionFileLoader``.  Such a change is outside
> > the scope of this PEP.
>
> As above, I think it may be worth tackling this. It shouldn't be *that*
> hard given the higher level changes and will solve some hard problems
> at the lower level.
>
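
To illustrate the division of labor described for ``exec_module()``, here is
a minimal sketch of a loader that only populates a namespace.
``SourceStringLoader`` and the sample source are hypothetical; the module
creation that the import system would normally do is simulated by hand.

```python
import types


class SourceStringLoader:
    """Hypothetical loader implementing only the proposed exec_module()."""

    def __init__(self, source):
        self.source = source

    def exec_module(self, module):
        # Only populate the namespace: no module creation, no sys.modules
        # handling, no cleanup, and no return value.
        code = compile(self.source, "<sketch>", "exec")
        exec(code, module.__dict__)


loader = SourceStringLoader("answer = 6 * 7")

# What the import machinery would do around the loader, done by hand here.
module = types.ModuleType("sketch")
module.__loader__ = loader
result = loader.exec_module(module)
```

A loader that needs full control, as the extension module case currently
does, would keep implementing ``load_module()`` instead.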
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
>