[Import-SIG] New draft revision for PEP 382

Sat Jul 9 10:42:13 CEST 2011

Thanks for working on this.  It's looking good.

A couple of questions inline.  Apologies ahead of time if my ignorance
shows too loudly. :)

On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby <pje at telecommunity.com> wrote:
>
> PEP \302 Extension
> ------------------
>
> The existing PEP 302 protocol is to be extended to handle namespace
> package portion directories, by adding a new importer method,
> ``namespace_subpath(fullname)``.  An implementation of this method
> will be added to all applicable importer classes distributed with
> Python, including those in ``pkgutil`` and ``zipimport``.
>
> (Note: any other importer wishing to support namespace packages must
> provide its own implementation of this method as well.  If an importer
> does not have a ``namespace_subpath()`` method, it will be treated as
> if it *did* have the method, but it returned ``None`` when called.)
>
> This new method is called just before the importer's ``find_module()``
> is normally invoked.  If the importer determines that `fullname` is
> a namespace package portion under its jurisdiction, then the importer
> returns an importer-specific path to that namespace portion.
>
> For example, if a standard filesystem path importer for the path
> ``/usr/lib/site-packages`` is about to be asked to import ``zope``,
> and there is a ``/usr/lib/site-packages/zope`` directory containing
> any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
> on that importer should return ``"/usr/lib/site-packages/zope"``.
>

And if there were a "zope_part1" and a "zope_part2" directory, both
with a zope.ns file in them, that namespace_subpath("zope") call would
return ["/usr/lib/site-packages/zope_part1",
"/usr/lib/site-packages/zope_part2"], right?  And if both also had a
foo.ns file in them, the same would be returned for
namespace_subpath("foo").

> However, if there is no such subdirectory, or it does *not* contain
> any files whose names end with ``.ns``, that importer would return
> ``None`` instead.
>
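
To check my reading of the example above, here is a minimal sketch of
what a filesystem importer's method might look like.  The class name
and constructor are hypothetical; only the ``namespace_subpath()``
behavior (a path string if the directory holds any ``*.ns`` file,
``None`` otherwise) comes from the draft.

```python
import os


class FileSystemImporter:
    """Hypothetical path importer; only namespace_subpath() is sketched."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def namespace_subpath(self, fullname):
        # Only the last component names the directory under this path entry.
        name = fullname.rpartition(".")[2]
        candidate = os.path.join(self.path_entry, name)
        if not os.path.isdir(candidate):
            return None
        # A directory counts as a namespace portion if it holds any "*.ns" file.
        if any(entry.endswith(".ns") for entry in os.listdir(candidate)):
            return candidate
        return None
```
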
> The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports).
>
> If a normal package or module is found before a namespace package,
> importing proceeds according to the normal PEP 302 protocol.  (That
> is, a loader object is simply asked to load the located module or
> package.)
>
> However, if a namespace package portion is found (i.e., an importer's
> ``namespace_subpath()`` returns a string), then the normal import
> search stops, and a namespace package is created instead.
>
> The import machinery continues iterating over importers and calling
> ``namespace_subpath()`` on them, but it does **not** continue calling
> ``find_module()`` on them.  Instead, it accumulates any strings
> returned by the subpath calls, in order to assemble a ``__path__``
> for the package being imported.
>
> (Note that this implies that any non-namespace packages with the same
> name are skipped, and not included in the resulting package's
> ``__path__``.  In other words, a namespace package's initial
> ``__path__`` only includes namespace portions, never non-namespace
> package directories.)
>
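
As I understand the search described so far, it could be sketched
roughly like this.  The function name and the return convention are my
own assumptions for illustration, not anything the draft specifies.

```python
def search(importers, fullname):
    """Walk `importers` in order, following the search rules quoted above.

    Returns ("normal", loader) if a regular module or package is found
    first, ("namespace", path) if any namespace portion is found, and
    (None, None) if nothing is found at all.
    """
    path = []
    for importer in importers:
        # A missing method is treated as if it had returned None.
        hook = getattr(importer, "namespace_subpath", None)
        subpath = hook(fullname) if hook else None
        if subpath is not None:
            path.append(subpath)
        elif not path:
            # No portion seen yet, so the normal PEP 302 search still applies.
            loader = importer.find_module(fullname)
            if loader is not None:
                return "normal", loader
        # Once a portion has been found, find_module() is no longer called,
        # so later non-namespace packages of the same name are skipped.
    return ("namespace", path) if path else (None, None)
```
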
> Once this ``__path__`` has been assembled, a module is created, and
> its ``__path__`` attribute is set.  The package's name is then added
> to ``sys.namespace_packages`` -- a set of package names.
>
> Finally, the ``__init__`` module code for the package (if it exists)
> is located and executed in the new module's namespace.
>
> Each importer that returns a ``namespace_subpath()`` for the package
> is asked to perform a standard ``find_module()`` for the package.
> Since by the normal import rules, a directory containing an
> ``__init__`` module is a package, this call should succeed if the
> namespace package portion contains an ``__init__`` module, and the
> importing can proceed normally from that point.
>

Is this last paragraph part of the "Finally" step?  If so, what does
calling find_module at this point accomplish?  Do you mean load_module
is also called for each one that is found?  Will it be too easy (or
even very likely) to end up with __init__.py collisions?

> There is one caveat, however.  The importers currently distributed
> with Python expect that *they* will be the ones to initialize the
> ``__path__`` attribute, which means that they must be changed to
> either recognize that ``__path__`` has already been set and not
> change it, or to handle namespace packages specially (e.g., via an
> internal flag or checking ``sys.namespace_packages``).
>
> Similarly, any third-party importers wishing to support namespace
> packages must make similar changes.
>

Seems like the caveat is dependent on the above algorithm.  If the
module's __path__ were set with the namespace_subpath() results after
the namespace package's import was all over, would it still be an
issue?  Most of this section reads like "this is how people should
expect the implementation to look".  However, I'm fine with that if
this is how the implementation should look.  :)

> (NOTE: in general, it goes against the design of PEP 302 for a loader
> object to assume that it is always creating the module object or that
> the module it is operating on is empty.  Making this assumption can
> result in code that breaks the normal operation of the ``reload()``
> builtin and any specialized tools that rely on it, such as lazy
> importers, automatic reloaders, and so on.)
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> calling ``namespace_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions are:

Maybe, "Specifically the proposed changes and additions to pkgutil
are:", to clarify the context?

>
> * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>  implementations to be registered for existing importers.
>
> * A new ``extend_namespaces(path_entry)`` function, to extend existing
>  and already-imported namespace packages' ``__path__`` attributes to
>  include any portions found in a new ``sys.path`` entry.  This
>  function should be called by applications extending ``sys.path``
>  at runtime, e.g. to include a plugin directory or add an egg to the
>  path.
>
>  The implementation of this function does a simple breadth-first walk
>  of ``sys.namespace_packages``, and performs any necessary
>  ``namespace_subpath()`` calls to identify what path entries need to
>  be added to each package's ``__path__``, given that `path_entry`
>  has been added to ``sys.path``.
>

Does the same apply to namespace sub-packages whose parent package has
an updated __path__?  If so, a recursion would take place in some
cases.
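
Here is roughly how I picture that walk, with new parent portions
feeding the lookup for child namespaces.  This is only a sketch of my
reading: the extra parameters stand in for ``sys.namespace_packages``,
``sys.modules``, and an importer lookup, and are assumptions made so
the example is self-contained (the draft's function takes only
`path_entry`).

```python
from collections import deque


def extend_namespaces(path_entry, namespace_packages, modules, get_importer):
    """Breadth-first sketch: add portions found under `path_entry`."""
    queue = deque(
        (name, path_entry)
        for name in sorted(namespace_packages)
        if "." not in name
    )
    while queue:
        fullname, entry = queue.popleft()
        module = modules.get(fullname)
        if module is None:
            continue  # registered, but not imported yet
        hook = getattr(get_importer(entry), "namespace_subpath", None)
        subpath = hook(fullname) if hook else None
        if subpath is None or subpath in module.__path__:
            continue
        module.__path__.append(subpath)
        # The new portion may in turn hold portions of child namespaces.
        prefix = fullname + "."
        for child in sorted(namespace_packages):
            if child.startswith(prefix) and "." not in child[len(prefix):]:
                queue.append((child, subpath))
```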

> * A new ``iter_namespaces(parent='')`` function to allow breadth-first
>  traversal of namespaces in ``sys.namespace_packages``, by yielding
>  the child namespace packages of `parent`.  For example, calling
>  ``iter_namespaces("zope")`` might yield ``zope.app`` and
>  ``zope.products`` (if they are namespace packages registered in
>  ``sys.namespace_packages``), but **not** ``zope.foo.bar``.
>  This function is needed to implement ``extend_namespaces()``, but
>  is potentially useful to others.
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>  yield the names of namespace package portions.
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed.  (Note:
> current versions of Python implement this via a shim in ``pkgutil``,
> so technically this is also a change to ``pkgutil``.)
>
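
For what it's worth, my reading of ``iter_namespaces()`` is something
like the sketch below.  The second parameter is an assumption that
stands in for ``sys.namespace_packages`` so the example can run on its
own, and the sorted order is just for determinism.

```python
def iter_namespaces(parent="", namespace_packages=()):
    """Yield the direct child namespace packages of `parent`."""
    prefix = parent + "." if parent else ""
    for name in sorted(namespace_packages):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        # Direct children only: exactly one more name component.
        if rest and "." not in rest:
            yield name
```

With "zope", "zope.app", "zope.products", and "zope.foo.bar"
registered, asking for the children of "zope" would give only
"zope.app" and "zope.products".
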
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
>  not-yet-imported package names; code that uses its contents should
>  not assume that every name in this set is also present in
>  ``sys.modules`` or that importing the name will necessarily succeed.
>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
>  characters.  This leaves open the possibility for future extension
>  to the format.
>
> * Files contained within a namespace package portion directory must
>  be *unique* to that portion, so that the portion can be distributed
>  as a vendor package without any filename overlap.  This applies to
>  modules and data files as well as ``*.ns`` files.
>
>  (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>  giving them a name based on the distribution that contains the file,
>  and it is recommended that packaging tools support doing this
>  automatically.)
>
> * Although this PEP supports the use of non-empty ``__init__`` modules
>  in namespace packages, their usage is controversial.  If more than
>  one package portion contains an ``__init__`` module, at most one of
>  them will be executed, possibly leading to silent errors.
>

As noted above, the implementation outlined in the "PEP \302
Extension" section seems ambiguous on this point.  However, I think
this bullet does a great job of clarifying the handling of __init__.py
modules.

>  Therefore, if you must include an ``__init__`` module in your
>  namespace package, make sure that it is provided by exactly **one**
>  distribution, and that all other distributions using that module's
>  contents are defined so as to have an installation dependency on
>  the distribution containing the ``__init__`` module.  Otherwise,
>  it may not be present in some installations.
>
>  (Note: for historical reasons, existing namespace packages nearly
>  always include ``__init__`` modules, but they are usually empty
>  except for code to declare the package a namespace.  Under this
>  proposal, these nearly-empty modules could and should be replaced
>  by an empty ``*.ns`` file in the package directory.)
>
> For those implementing PEP 302 importer objects:
>
> * Importers that support the ``iter_modules()`` method and want to add
>  namespace support should modify their ``iter_modules()``
>  method so that it discovers and lists namespace packages as well as
>  standard modules and packages.
>

The iter_modules() method isn't part of PEP 302, is it?  Where can I
find out more about it?

> * For implementation efficiency, an importer is allowed to cache
>  information (such as whether a directory exists and whether an
>  ``__init__`` module is present in it) between the invocation of a
>  ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>  for the same name.
>
>  It should, however, avoid retaining such cached information for any
>  longer than the next method call, and it should also verify that the
>  request is in fact for the same module/package name, as it is not
>  guaranteed that a ``namespace_subpath()`` call will always be
>  followed by a matching ``find_module()`` call.  (After all, an
>  ``__init__`` module may already have been supplied by an earlier
>  importer on the path.)
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>  not need to implement ``namespace_subpath()``, because the method
>  is only called on importers corresponding to ``sys.path`` entries.

Is it also called on importers for parent.__path__ entries, in the
case of namespace subpackages?

>  If a meta importer wishes to support namespace packages, it must
>  do so entirely within its ``find_module()`` implementation.
>
>  Unfortunately, it is unlikely that any such implementation will be
>  able to merge its namespace portions with those of other meta
>  importers or ``sys.path`` importers, so the meaning of "supporting
>  namespace packages" for a meta importer is currently undefined.
>

While I'm not sure meta importers need to be left out, I suppose it
isn't critical since the work-around isn't that hard, nor widely
needed.  Thus the message here is that this PEP only applies to the
use of sys.path_hooks and sys.path_importer_cache.  It would be nice
for that to be clear up front.

>...
>
>  Further, there is an immense body of existing code (including the
>  distutils and many other packaging tools) that expects a package
>  directory's name to be the same as the package name.

Correct me if I'm wrong, but I have understood that for namespace
packages in the PEP, the directory name does not have to be the
package name.

Back to namespace subpackages, it's unclear how they should work.
Either a namespace package is at the top level of a sys.path entry, or
it's a module of a parent package, namespace or otherwise.  The
top-level case is pretty clear.  However, the subpackage case is not.
I don't see namespace subpackages as being too practical with
non-namespace parent packages, but I'm probably missing something.

In the case that a namespace subpackage has a namespace package
parent, how would that look?  In the email to Barry you gave an
example that covered this a little, but it's still pretty unclear.  In
any case, I think more examples with namespace subpackages would be
helpful.  Or maybe namespace subpackages are a corner case that
doesn't deserve the keystrokes I've given it.  ;)

Other than that, the PEP is pretty clear (coming from a less
experienced perspective).  Thanks again for working on this!

-eric