[Import-SIG] New draft revision for PEP 382

Sat Jul 9 00:31:35 CEST 2011

On Jul 08, 2011, at 03:51 PM, P.J. Eby wrote:

>The following is my attempt at an updated draft of PEP 382, based on the
>recently-discussed changes.

Thanks!  I've been trying to catch up on the mailing list traffic today, and
grabbed your prototype code.  I plan on committing it to MvL's pep382 hg
branch so we have a place to play with it.

Comments inlined.

>PEP: 382
>Title: Namespace Package Declarations
>Version: $Revision$
>Last-Modified: $Date$
>Author: Martin v. LÃ¶wis <martin at v.loewis.de>, PJ Eby
>Status: Draft
>Type: Standards Track
>Content-Type: text/x-rst
>Created: 02-Apr-2009
>Python-Version: 3.2
>Post-History:
>
>Abstract
>========
>
>This PEP proposes an enhancement to Python's import machinery to
>replace existing uses of the standard library's
>``pkgutil.extend_path()`` API, and similar third-party APIs such as
>``pkg_resources.declare_namespace()``.
>
>The proposed enhancement will improve the reliability of existing
>namespace package implementations, while providing "One Obvious Way"
>to produce and consume namespace packages.
>
>
>Terminology
>===========
>
>Within this PEP, the following terms are used as follows:
>
>Package
>     Python packages as defined by Python's import statement.
>
>Distribution
>     A separately installable set of Python modules, as registered in
>     the Python package index, and installed by distutils, setuptools,
>     etc.
>
>Vendor Package
>     A group of files installed by an operating system's packaging
>     mechanism (e.g. Debian or Redhat packages installed on Linux
>     systems).
>
>Portion
>     A set of files in a single directory (possibly inside a zip file
>     or other storage mechanism) that contribute modules or subpackages
>     to a namespace package.  The contents of each portion ``sys.path``

This one got cut off.

>Namespace Package
>     A package whose subpackages and modules can be split into portions
>     that can be distributed or installed separately (via separate
>     distributions and/or vendor packages), in shared or separate
>     installation locations.
>
>     Unlike a regular package, however, which only allows submodule
>     and subpackage imports from a single location, a namespace
>     package's ``__path__`` is configured so that submodules and
>     subpackages can be imported from each of its installed portions,
>     regardless of their relative positions in ``sys.path``.
>
>
>Motivation
>==========
>
>.. epigraph::
>
>     "Most packages are like modules.  Their contents are highly
>     interdependent and can't be pulled apart.  [However,] some
>     packages exist to provide a separate namespace. ...  It should
>     be possible to distribute sub-packages or submodules of these
>     [namespace packages] independently."
>
>     -- Jim Fulton, shortly before the release of Python 2.3 [1]_

Nice find!

>The Current Approach
>--------------------
>
>First introduced in Python 2.3, namespace packages are a mechanism
>for splitting a single Python package across multiple directories
>on disk.  This splitting has two main benefits:
>
>1. It allows different parts of a large package or framework to be
>    distributed and installed independently.  For example, installing
>    the ``zope.interface`` package without having to install every
>    package in the ``zope.*`` namespace.
>
>    (This is somewhat similar to the way Perl's package system allows
>    authors to separately distribute subpackages of ``File::`` or
>    ``Email::``.)
>
>2. As a side-effect of benefit 1, it reduces package naming collisions
>    across multiple authors or organizations, by encouraging them to
>    use distinguishing prefixes.  Instead of say, Zope and Twisted both
>    offering a top-level ``interface`` package (in which case, both
>    could not be installed to the same directory), they can use
>    ``zope.interface`` and ``twisted.interface``, while still being
>    able to distribute these subpackages separately from other ``zope``
>    or ``twisted`` subpackages.
>
>    (This is somewhat similar to the way Java uses names like
>    ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
>    only flatter.)
>
>In current Python versions, however, a registration function (such as
>``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
>must be explicitly invoked in order to set up the package's
>``__path__``.

Do you need to explain a little more why __path__ is significant, and why the
registration function is required?

>There are two problems with this approach, however.
>
>
>Problems With The Current Approach
>----------------------------------
>
>The first (and lesser) problem is that there is no One Obvious Way to
>either declare that a package is a "namespace" or "module" package,
>or to tell which kind of package a given directory on disk is.
>
>Instead, you must choose one of the various APIs to use, each of
>which is slightly-incompatible with the others.  (For example,
>``pkgutil`` supports ``*.pkg`` files; setuptools doesn't.  Likewise,
>setuptools supports package portions living in zip files, and adding
>new path components to already-imported namespaces, whereas
>``pkgutil`` doesn't.)
>
>Similarly, to tell whether a given directory is a "namespace" or
>"module" package, you must read its documentation or inspect its code
>in detail, and be able to recognize the various API calls mentioned
>above.
>
>The second -- and much larger -- issue is that whichever API is used
>to declare the namespace, the declaration has to be invoked from a
>namespace package's ``__init__`` module in order to work.  (Otherwise,
>only the first part of the package found on ``sys.path`` would be
>importable.)
>
>This clashes with the goal of separately installing portions of a
>namespace, because then each distributed piece must include a copy
>of the same ``__init__.py``.  (Otherwise, each piece would not be
>importable on its own, as Python currently requires the existence
>of an ``__init__`` module in order to import the package at all, let
>alone set up the namespace!)
>
>In addition to the developer inconvenience of creating, synchronizing,
>and distributing these duplicated ``__init__`` modules, there is a
>further problem created for operating system vendors.
>
>Vendor packages typically must not provide overlapping files, and an
>attempt to install a vendor package that has a file already on disk
>will fail or cause unpredictable behavior.  As vendors might choose to
>package distributions such that they will end up all in a single
>directory for the namespace package, all portions would contribute
>conflicting ``__init__.py`` files.

I might word this a little differently.  Perhaps:

Vendor packaging standards require every file on disk to be owned by exactly
one vendor package.  But because each portion of a namespace package may be
contained in a separate vendor package, multiple vendor packages would have to
own the namespace package's __init__.py file.  For example, would the
``zope.interface`` vendor package own ``zope/__init__.py`` or would the
``zope.component`` vendor package own it?  Different vendors handle this
conflict differently, and in fact, different packaging tools from the same
vendor can handle this differently, which can cause consistency problems.

>This issue has lead to various fragile and complex workarounds in
>practice, such as ``.pth`` file abuse by setuptools, and the shipping
>of broken partial packages with distutils.
>
>With the enhancement proposed here, however, all of the above problems
>can be readily resolved.
>
>
>Specification
>=============
>
>Instead of an API call buried inside a series of duplicated and
>potentially-clashing ``__init__`` modules (which mostly exist only
>to make the package importable and declare its namespace-ness), this
>PEP proposes that Python's import machinery be modified to include
>direct support for namespace packages.
>
>This support would work by adding a new way to desginate a directory

s/desginate/designate/

>as containing a namespace package portion: by including one or more
>``*.ns`` files in it.
>
>This approach removes the need for an ``__init__`` module to be
>duplicated across namespace package portions.  Instead, each portion
>can simply include a uniquely-named ``*.ns`` file, thereby avoiding
>filename clashes in vendor packages.

I think a concrete example would really help here.  E.g.:

For example, the ``zope.interface`` portion would include a
``zope/zope.interface.ns`` file, while the ``zope.component`` portion would
include a ``zope/zope.component.ns`` file.  The very presence of any ``.ns``
files inside the ``zope`` directory is enough to designate ``zope`` as a
namespace package.  No conflicting ``zope/__init__.py`` file is necessary.

>And, since the import machinery knows that these directories are
>portions of a namespace package, it can automatically initialize
>the package's ``__path__`` to include portions located on different
>parts of ``sys.path``.  (Thus avoiding the need for special code
>to be called in the ``__init__`` module.)
>
>In addition to doing this path setup, the import machinery will also
>add any imported namespace packages to ``sys.namespace_packages``
>(initially an empty set), so that namespace packages can be identified
>or iterated over.
>
>
>PEP \302 Extension
>------------------
>
>The existing PEP 302 protocol is to be extended to handle namespace
>package portion directories, by adding a new importer method,
>``namespace_subpath(fullname)``.  An implementation of this method
>will be added to all applicable importer classes distributed with
>Python, including those in ``pkgutil`` and ``zipimport``).
>
>(Note: any other importer wishing to support namespace packages must
>provide its own implementation of this method as well.  If an importer
>does not have a ``namespace_subpath()`` method, it will be treated as
>if it *did* have the method, but it returned ``None`` when called.)
>
>This new method is called just before the importer's ``find_module()``
>is normally invoked.  If the importer determines that `fullname` is
>a namespace package portion under its jurisdiction, then the importer
>returns an importer-specific path to that namespace portion.

Please define exactly what ``fullname`` is.

>For example, if a standard filesystem path importer for the path
>``/usr/lib/site-packages`` is about to be asked to import ``zope``,
>and there is a ``/usr/lib/site-packages/zope`` directory containing
>any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
>on that importer should return ``"/usr/lib/site-packages/zope"``.
>
>However, if there is no such subdirectory, or it does *not* contain
>any files whose names end with ``.ns``, that importer would return
>``None`` instead.
>
>The Python import machinery will call this method on each importer
>corresponding to a path entry in ``sys.path`` (for top-level imports)
>or in a parent package ``__path__`` (for subpackage imports).
>
>If a normal package or module is found before a namespace package,
>importing proceeds according to the normal PEP 302 protocol.  (That
>is, a loader object is simply asked to load the located module or
>package.)
>
>However, if a namespace package portion is found (i.e., an importer's
>``namespace_subpath()`` returns a string), then the normal import
>search stops, and a namespace package is created instead.
>
>The import machinery continues iterating over importers and calling
>``namespace_subpath()`` on them, but it does **not** continue calling
>``find_module()`` on them.  Instead, it accumulates any strings
>returned by the subpath calls, in order to assemble a ``__path__``
>for the package being imported.
>
>(Note that this implies that any non-namespace packages with the same
>name are skipped, and not included in the resulting package's
>``__path__``.  In other words, a namespace package's initial
>``__path__`` only includes namespace portions, never non-namespace
>package directories.)

Would you expect this to be common?  Did you have any examples in mind, or was
it just covering-the-bases?

>Once this ``__path__`` has been assembled, a module is created, and
>its ``__path__`` attribute is set.  The package's name is then added
>to ``sys.namespace_packages`` -- a set of package names.
>
>Finally, the ``__init__`` module code for the package (if it exists)
>is located and executed in the new module's namespace.
>
>Each importer that returns a ``namespace_subpath()`` for the package
>is asked to perform a standard ``find_module()`` for the package.
>Since by the normal import rules, a directory containing an
>``__init__`` module is a package, this call should succeed if the
>namespace package portion contains an ``__init__`` module, and the
>importing can proceed normally from that point.
>
>There is one caveat, however.  The importers currently distributed
>with Python expect that *they* will be the ones to initialize the
>``__path__`` attribute, which means that they must be changed to
>either recognize that ``__path__`` has already been set and not
>change it, or to handle namespace packages specially (e.g., via an
>internal flag or checking ``sys.namespace_packages``).
>
>Similarly, any third-party importers wishing to support namespace
>packages must make similar changes.
>
>(NOTE: in general, it goes against the design of PEP 302 for a loader
>object to assume that it is always creating the module object or that
>the module it is operating on is empty.  Making this assumption can
>result in code that breaks the normal operation of the ``reload()``
>builtin and any specialized tools that rely on it, such as lazy
>importers, automatic reloaders, and so on.)
>
>
>Standard Library Changes/Additions
>----------------------------------
>
>The ``pkgutil`` module should be updated to handle this
>specification appropriately, including any necessary changes to
>``extend_path()``, ``iter_modules()``, etc.  A new generic API for
>calling ``namespace_subpath()`` on importers should be added as well.

Is there any reason not to put extend_path() on the road to deprecation?

>Specifically the proposed changes and additions are:
>
>* A new ``namespace_subpath(importer, fullname)`` generic, allowing
>   implementations to be registered for existing importers.

Is this the registration mechanism?

>* A new ``extend_namespaces(path_entry)`` function, to extend existing
>   and already-imported namespace packages' ``__path__`` attributes to
>   include any portions found in a new ``sys.path`` entry.  This
>   function should be called by applications extending ``sys.path``
>   at runtime, e.g. to include a plugin directory or add an egg to the
>   path.
>
>   The implementation of this function does a simple breadth-first walk
>   of ``sys.namespace_packages``, and performs any necessary
>   ``namespace_subpath()`` calls to identify what path entries need to
>   be added to each package's ``__path__``, given that `path_entry`
>   has been added to ``sys.path``.
>
>* A new ``iter_namespaces(parent='')`` function to allow breadth-first
>   traversal of namespaces in ``sys.namespace_packages``, by yielding
>   the child namespace packages of `parent`.  For example, calling
>   ``iter_namespaces("zope")`` might yield ``zope.app`` and
>   ``zope.products`` (if they are namespace packages registered in
>   ``sys.namespace_packagess``), but **not** ``zope.foo.bar``.

s/packagess/packages/

>   This function is needed to implement ``extend_namespaces()``, but
>   is potentially useful to others.
>
>* ``ImpImporter.iter_modules()`` should be changed to also detect and
>   yield the names of namespace package portions.
>
>In addition to the above changes, the ``zipimport`` importer should
>have its ``iter_modules()`` implementation similarly changed.  (Note:
>current versions of Python implement this via a shim in ``pkgutil``,
>so technically this is also a change to ``pkgutil``.)
>
>
>Implementation Notes
>--------------------
>
>For users, developers, and distributors of namespace packages:
>
>* ``sys.namespace_packages`` is allowed to contain non-existent or
>   not-yet-imported package names; code that uses its contents should
>   not assume that every name in this set is also present in
>   sys.packages or that importing the name will necessarily succeed.
>
>* ``*.ns`` files must be empty or contain only ASCII whitespace
>   characters.  This leaves open the possibility for future extension
>   to the format.

Getting back to our previous discussion on this, I might also add a comment
format, e.g. lines starting with `#`.  Almost any extension we can come up
with will probably need to include comments, so we might as well add them here
now.  This will also allow folks to add copyright, or other textual
information into .ns files as their coding conventions may dictate.

Do you expect to ignore everything else, or throw an exception?  Let's be
explicit about that.

>* Files contained within a namespace package portion directory must
>   be *unique* to that portion, so that the portion can be distributed
>   as a vendor package without any filename overlap.  This applies to
>   modules and data files as well as ``*.ns`` files.
>
>   (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>   giving them a name based on the distribution that contains the file,
>   and it is recommended that packaging tools support doing this
>   automatically.)
>
>* Although this PEP supports the use of non-empty ``__init__`` modules
>   in namespace packages, their usage is controversial.  If more than
>   one package portion contains an ``__init__`` module, at most one of
>   them will be executed, possibly leading to silent errors.
>
>   Therefore, if you must include an ``__init__`` module in your
>   namespace package, make sure that it is provided by exactly **one**
>   distribution, and that all other distributions using that module's
>   contents are defined so as to have an installation dependency on
>   the distribution containing the ``__init__`` module.  Otherwise,
>   it may not be present in some installations.
>
>   (Note: for historical reasons, existing namespace packages nearly
>   always include ``__init__`` modules, but they are usually empty
>   except for code to declare the package a namespace.  Under this
>   proposal, these nearly-empty modules could and should be replaced
>   by an empty ``*.ns`` file in the package directory.)

I'd be a little more forceful; the PEP should strongly recommend against
including namespace package __init__.py files.

>For those implementing PEP 302 importer objects:
>
>* Importers that support the ``iter_modules()`` method and want to add
>   namespace support should modify their ``iter_modules()``
>   method so that it discovers and list namespace packages as well as
>   standard modules and packages.
>
>* For implementation efficiency, an importer is allowed to cache
>   information (such as whether a directory exists and whether an
>   ``__init__`` module is present in it) between the invocation of a
>   ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>   for the same name.
>
>   It should, however, avoid retaining such cached information for any
>   longer than the next method call, and it should also verify that the
>   request is in fact for the same module/package name, as it is not
>   guaranteed that a ``namespace_subpath()`` call will always be
>   followed by a matching ``find_module()`` call.  (After all, an
>   ``__init__`` module may already have been supplied by an earlier
>   importer on the path.)
>
>* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>   not need to implement ``namespace_subpath()``, because the method
>   is only called on importers corresponding to ``sys.path`` entries.'
>   If a meta importer wishes to support namespace packages, it must
>   do so entirely within its ``find_module()`` implementation.
>
>   Unfortunately, it is unlikely that any such implementation will be
>   able to merge its namespace portions with those of other meta
>   importers or ``sys.path`` importers, so the meaning of "supporting
>   namespace packages" for a meta importer is currently undefined.
>
>   However, since the intended use case for meta importers is to
>   replace Python's normal import process entirely for some subset of
>   modules, and the number of such importers currently implemented is
>   quite small, this seems unlikely to be a big issue in practice.
>
>
>Rejected Alternatives
>=====================
>
>* The original version of this PEP used ``.pkg`` or ``.pth`` files
>   that contained either explicit directories to be added to a
>   package's ``__path__``, or ``*`` to indicate that a package was
>   a namespace.
>
>   But this approach required a more complex change to the importer
>   protocol, the files had to actually be opened and read, and there
>   were no concrete use cases proposed for the additional flexibility
>   specifying explicit paths.
>
>* On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
>   extra files, namespace packages use a ``__pkg__.py`` file to
>   indicate their namespace-ness, in addition to a (required)
>   ``__init__.py``.
>
>   Unfortunately, this approach solves only one of the `problems with
>   the current approach`_: i.e., having a standard way of declaring and
>   identifying namespace packages.  It does not address the necessity
>   of distributing duplicated files, or filename overlap between
>   distributions.  Further, it does not allow truly-independent
>   namespace portions to exist, since it requires a "defining" portion
>   (the portion containing the single ``__init__`` module) to exist.
>
>* Another approach considered during revisions to this PEP was to
>   simply rename package directories to add a suffix like ``.ns``
>   or ``-ns``, to indicate their namespaced nature.  This would effect
>   a small performance improvement for the initial import of a
>   namespace package, avoid the need to create empty ``*.ns`` files,
>   and even make it clearer that the directory involved is a namespace
>   portion.
>
>   The downsides, however, are also plentiful.  If a package starts
>   its life as a normal package, it must be renamed when it becomes
>   a namespace, with the implied consequences for revision control
>   tools.
>
>   Further, there is an immense body of existing code (including the
>   distutils and many other packaging tools) that expect a package
>   directory's name to be the same as the package name.  And porting
>   existing Python 2.x namespace packages to Python 3 would require
>   widespread directory renaming as well.
>
>   In short, this approach would require a vastly larger number of
>   changes to both the standard library and third-party code, for
>   a tiny potential performance improvement and a small increase in
>   clarity.  It was therefore rejected on "practicality vs. purity"
>   grounds.
>
>
>
>References
>==========
>
>.. [1] "namespace" vs "module" packages (mailing list thread)
>    (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
>
>.. [2] "PEP \382: Namespace Packages" (mailing list thread)
>    (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)
>
>Copyright
>=========
>
>This document has been placed in the public domain.
>
>
>..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:

You've done a really excellent job at both simplifying the specification, and
providing a clear explanation of the issues and mechanisms involved.  Kudos!
I really like this a lot, and wholeheartedly support its adoption.  I hope MvL
will agree.

I'm going to have a look at your prototype now and will commit it, and any
updates, to the hg repo.

Cheers,
-Barry
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 836 bytes
Desc: not available
URL: <http://mail.python.org/pipermail/import-sig/attachments/20110708/2dc4862f/attachment.pgp>