[Import-SIG] New draft revision for PEP 382

Fri Jul 8 23:52:49 CEST 2011

On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby <pje at telecommunity.com> wrote:
> The following is my attempt at an updated draft of PEP 382, based on the
> recently-discussed changes.
>
> To address the questions and criticisms raisd on Python-Dev when the PEP was
> introduced, I added an extended "Motivation" section that explains issues
> with the current approaches, and states the case for the PEP in more detail,
> including info about why anyone should care about namespace packages in the
> first place.  ;-)
>
> I've also added a "Rejected Alternatives" section to document the other
> proposed approaches and the rationale for rejecting them in favor of the
> current proposal.
>
> In addition, I've specified in a bit more detail the necessary changes to
> e.g. the pkgutil module.  (At least one open issue remains, however, and
> that is the question of what, if anything, should happen to the existing
> extend_path() function.  A second possible open question regards the API of
> the path fixup functions I propose in pkgutil.)
>
> Anyway, your questions and comments, please!  The draft follows below:
>
>
> PEP: 382
> Title: Namespace Package Declarations
> Version: $Revision$
> Last-Modified: $Date$
> Author: Martin v. LÃ¶wis <martin at v.loewis.de>, PJ Eby
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 02-Apr-2009
> Python-Version: 3.2
> Post-History:
>
> Abstract
> ========
>
> This PEP proposes an enhancement to Python's import machinery to
> replace existing uses of the standard library's
> ``pkgutil.extend_path()`` API, and similar third-party APIs such as
> ``pkg_resources.declare_namespace()``.
>
> The proposed enhancement will improve the reliability of existing
> namespace package implementations, while providing "One Obvious Way"
> to produce and consume namespace packages.
>
>
> Terminology
> ===========
>
> Within this PEP, the following terms are used as follows:
>
> Package
>    Python packages as defined by Python's import statement.
>
> Distribution
>    A separately installable set of Python modules, as registered in
>    the Python package index, and installed by distutils, setuptools,
>    etc.
>
> Vendor Package
>    A group of files installed by an operating system's packaging
>    mechanism (e.g. Debian or Redhat packages installed on Linux
>    systems).
>
> Portion
>    A set of files in a single directory (possibly inside a zip file
>    or other storage mechanism) that contribute modules or subpackages
>    to a namespace package.  The contents of each portion ``sys.path``
>
> Namespace Package
>    A package whose subpackages and modules can be split into portions
>    that can be distributed or installed separately (via separate
>    distributions and/or vendor packages), in shared or separate
>    installation locations.
>
>    Unlike a regular package, however, which only allows submodule
>    and subpackage imports from a single location, a namespace
>    package's ``__path__`` is configured so that submodules and
>    subpackages can be imported from each of its installed portions,
>    regardless of their relative positions in ``sys.path``.
>
>
> Motivation
> ==========
>
> .. epigraph::
>
>    "Most packages are like modules.  Their contents are highly
>    interdependent and can't be pulled apart.  [However,] some
>    packages exist to provide a separate namespace. ...  It should
>    be possible to distribute sub-packages or submodules of these
>    [namespace packages] independently."
>
>    -- Jim Fulton, shortly before the release of Python 2.3 [1]_
>

This is a really helpful addition.

>
> The Current Approach
> --------------------
>
> First introduced in Python 2.3, namespace packages are a mechanism
> for splitting a single Python package across multiple directories
> on disk.  This splitting has two main benefits:
>
> 1. It allows different parts of a large package or framework to be
>   distributed and installed independently.  For example, installing
>   the ``zope.interface`` package without having to install every
>   package in the ``zope.*`` namespace.
>
>   (This is somewhat similar to the way Perl's package system allows
>   authors to separately distribute subpackages of ``File::`` or
>   ``Email::``.)
>
> 2. As a side-effect of benefit 1, it reduces package naming collisions
>   across multiple authors or organizations, by encouraging them to
>   use distinguishing prefixes.  Instead of say, Zope and Twisted both
>   offering a top-level ``interface`` package (in which case, both
>   could not be installed to the same directory), they can use
>   ``zope.interface`` and ``twisted.interface``, while still being
>   able to distribute these subpackages separately from other ``zope``
>   or ``twisted`` subpackages.
>
>   (This is somewhat similar to the way Java uses names like
>   ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
>   only flatter.)
>
> In current Python versions, however, a registration function (such as
> ``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
> must be explicitly invoked in order to set up the package's
> ``__path__``.
>
> There are two problems with this approach, however.
>
>
> Problems With The Current Approach
> ----------------------------------
>
> The first (and lesser) problem is that there is no One Obvious Way to
> either declare that a package is a "namespace" or "module" package,
> or to tell which kind of package a given directory on disk is.
>
> Instead, you must choose one of the various APIs to use, each of
> which is slightly-incompatible with the others.  (For example,
> ``pkgutil`` supports ``*.pkg`` files; setuptools doesn't.  Likewise,
> setuptools supports package portions living in zip files, and adding
> new path components to already-imported namespaces, whereas
> ``pkgutil`` doesn't.)
>
> Similarly, to tell whether a given directory is a "namespace" or
> "module" package, you must read its documentation or inspect its code
> in detail, and be able to recognize the various API calls mentioned
> above.
>
> The second -- and much larger -- issue is that whichever API is used
> to declare the namespace, the declaration has to be invoked from a
> namespace package's ``__init__`` module in order to work.  (Otherwise,
> only the first part of the package found on ``sys.path`` would be
> importable.)
>
> This clashes with the goal of separately installing portions of a
> namespace, because then each distributed piece must include a copy
> of the same ``__init__.py``.  (Otherwise, each piece would not be
> importable on its own, as Python currently requires the existence
> of an ``__init__`` module in order to import the package at all, let
> alone set up the namespace!)
>
> In addition to the developer inconvenience of creating, synchronizing,
> and distributing these duplicated ``__init__`` modules, there is a
> further problem created for operating system vendors.
>
> Vendor packages typically must not provide overlapping files, and an
> attempt to install a vendor package that has a file already on disk
> will fail or cause unpredictable behavior.  As vendors might choose to
> package distributions such that they will end up all in a single
> directory for the namespace package, all portions would contribute
> conflicting ``__init__.py`` files.
>
> This issue has lead to various fragile and complex workarounds in
> practice, such as ``.pth`` file abuse by setuptools, and the shipping
> of broken partial packages with distutils.
>
> With the enhancement proposed here, however, all of the above problems
> can be readily resolved.
>
>
> Specification
> =============
>
> Instead of an API call buried inside a series of duplicated and
> potentially-clashing ``__init__`` modules (which mostly exist only
> to make the package importable and declare its namespace-ness), this
> PEP proposes that Python's import machinery be modified to include
> direct support for namespace packages.
>
> This support would work by adding a new way to desginate a directory
> as containing a namespace package portion: by including one or more
> ``*.ns`` files in it.
>
> This approach removes the need for an ``__init__`` module to be
> duplicated across namespace package portions.  Instead, each portion
> can simply include a uniquely-named ``*.ns`` file, thereby avoiding
> filename clashes in vendor packages.
>
> And, since the import machinery knows that these directories are
> portions of a namespace package, it can automatically initialize
> the package's ``__path__`` to include portions located on different
> parts of ``sys.path``.  (Thus avoiding the need for special code
> to be called in the ``__init__`` module.)
>
> In addition to doing this path setup, the import machinery will also
> add any imported namespace packages to ``sys.namespace_packages``
> (initially an empty set), so that namespace packages can be identified
> or iterated over.
>
>
> PEP \302 Extension
> ------------------
>
> The existing PEP 302 protocol is to be extended to handle namespace
> package portion directories, by adding a new importer method,
> ``namespace_subpath(fullname)``.  An implementation of this method
> will be added to all applicable importer classes distributed with
> Python, including those in ``pkgutil`` and ``zipimport``).
>
> (Note: any other importer wishing to support namespace packages must
> provide its own implementation of this method as well.  If an importer
> does not have a ``namespace_subpath()`` method, it will be treated as
> if it *did* have the method, but it returned ``None`` when called.)
>
> This new method is called just before the importer's ``find_module()``
> is normally invoked.  If the importer determines that `fullname` is
> a namespace package portion under its jurisdiction, then the importer
> returns an importer-specific path to that namespace portion.
>
> For example, if a standard filesystem path importer for the path
> ``/usr/lib/site-packages`` is about to be asked to import ``zope``,
> and there is a ``/usr/lib/site-packages/zope`` directory containing
> any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
> on that importer should return ``"/usr/lib/site-packages/zope"``.
>
> However, if there is no such subdirectory, or it does *not* contain
> any files whose names end with ``.ns``, that importer would return
> ``None`` instead.
>
> The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports).
>
> If a normal package or module is found before a namespace package,
> importing proceeds according to the normal PEP 302 protocol.  (That
> is, a loader object is simply asked to load the located module or
> package.)
>
> However, if a namespace package portion is found (i.e., an importer's
> ``namespace_subpath()`` returns a string), then the normal import
> search stops, and a namespace package is created instead.
>
> The import machinery continues iterating over importers and calling
> ``namespace_subpath()`` on them, but it does **not** continue calling
> ``find_module()`` on them.  Instead, it accumulates any strings
> returned by the subpath calls, in order to assemble a ``__path__``
> for the package being imported.
>
> (Note that this implies that any non-namespace packages with the same
> name are skipped, and not included in the resulting package's
> ``__path__``.  In other words, a namespace package's initial
> ``__path__`` only includes namespace portions, never non-namespace
> package directories.)
>
> Once this ``__path__`` has been assembled, a module is created, and
> its ``__path__`` attribute is set.  The package's name is then added
> to ``sys.namespace_packages`` -- a set of package names.
>
> Finally, the ``__init__`` module code for the package (if it exists)
> is located and executed in the new module's namespace.
>
> Each importer that returns a ``namespace_subpath()`` for the package
> is asked to perform a standard ``find_module()`` for the package.
> Since by the normal import rules, a directory containing an
> ``__init__`` module is a package, this call should succeed if the
> namespace package portion contains an ``__init__`` module, and the
> importing can proceed normally from that point.
>
> There is one caveat, however.  The importers currently distributed
> with Python expect that *they* will be the ones to initialize the
> ``__path__`` attribute, which means that they must be changed to
> either recognize that ``__path__`` has already been set and not
> change it, or to handle namespace packages specially (e.g., via an
> internal flag or checking ``sys.namespace_packages``).
>
> Similarly, any third-party importers wishing to support namespace
> packages must make similar changes.
>
> (NOTE: in general, it goes against the design of PEP 302 for a loader
> object to assume that it is always creating the module object or that
> the module it is operating on is empty.  Making this assumption can
> result in code that breaks the normal operation of the ``reload()``
> builtin and any specialized tools that rely on it, such as lazy
> importers, automatic reloaders, and so on.)
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc.  A new generic API for
> calling ``namespace_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions are:
>
> * A new ``namespace_subpath(importer, fullname)`` generic, allowing
>  implementations to be registered for existing importers.
>
> * A new ``extend_namespaces(path_entry)`` function, to extend existing
>  and already-imported namespace packages' ``__path__`` attributes to
>  include any portions found in a new ``sys.path`` entry.  This
>  function should be called by applications extending ``sys.path``
>  at runtime, e.g. to include a plugin directory or add an egg to the
>  path.
>
>  The implementation of this function does a simple breadth-first walk
>  of ``sys.namespace_packages``, and performs any necessary
>  ``namespace_subpath()`` calls to identify what path entries need to
>  be added to each package's ``__path__``, given that `path_entry`
>  has been added to ``sys.path``.
>
> * A new ``iter_namespaces(parent='')`` function to allow breadth-first
>  traversal of namespaces in ``sys.namespace_packages``, by yielding
>  the child namespace packages of `parent`.  For example, calling
>  ``iter_namespaces("zope")`` might yield ``zope.app`` and
>  ``zope.products`` (if they are namespace packages registered in
>  ``sys.namespace_packagess``), but **not** ``zope.foo.bar``.
>  This function is needed to implement ``extend_namespaces()``, but
>  is potentially useful to others.
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
>  yield the names of namespace package portions.
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed.  (Note:
> current versions of Python implement this via a shim in ``pkgutil``,
> so technically this is also a change to ``pkgutil``.)
>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
>  not-yet-imported package names; code that uses its contents should
>  not assume that every name in this set is also present in
>  sys.packages or that importing the name will necessarily succeed.
>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
>  characters.  This leaves open the possibility for future extension
>  to the format.
>
> * Files contained within a namespace package portion directory must
>  be *unique* to that portion, so that the portion can be distributed
>  as a vendor package without any filename overlap.  This applies to
>  modules and data files as well as ``*.ns`` files.
>
>  (For ``*.ns`` files themselves, uniqueness can be achieved simply by
>  giving them a name based on the distribution that contains the file,
>  and it is recommended that packaging tools support doing this
>  automatically.)
>
> * Although this PEP supports the use of non-empty ``__init__`` modules
>  in namespace packages, their usage is controversial.  If more than
>  one package portion contains an ``__init__`` module, at most one of
>  them will be executed, possibly leading to silent errors.
>
>  Therefore, if you must include an ``__init__`` module in your
>  namespace package, make sure that it is provided by exactly **one**
>  distribution, and that all other distributions using that module's
>  contents are defined so as to have an installation dependency on
>  the distribution containing the ``__init__`` module.  Otherwise,
>  it may not be present in some installations.
>
>  (Note: for historical reasons, existing namespace packages nearly
>  always include ``__init__`` modules, but they are usually empty
>  except for code to declare the package a namespace.  Under this
>  proposal, these nearly-empty modules could and should be replaced
>  by an empty ``*.ns`` file in the package directory.)
>
> For those implementing PEP 302 importer objects:
>
> * Importers that support the ``iter_modules()`` method and want to add
>  namespace support should modify their ``iter_modules()``
>  method so that it discovers and list namespace packages as well as
>  standard modules and packages.
>
> * For implementation efficiency, an importer is allowed to cache
>  information (such as whether a directory exists and whether an
>  ``__init__`` module is present in it) between the invocation of a
>  ``namespace_subpath()`` call and a subsequent ``find_module()`` call
>  for the same name.
>
>  It should, however, avoid retaining such cached information for any
>  longer than the next method call, and it should also verify that the
>  request is in fact for the same module/package name, as it is not
>  guaranteed that a ``namespace_subpath()`` call will always be
>  followed by a matching ``find_module()`` call.  (After all, an
>  ``__init__`` module may already have been supplied by an earlier
>  importer on the path.)
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
>  not need to implement ``namespace_subpath()``, because the method
>  is only called on importers corresponding to ``sys.path`` entries.'
>  If a meta importer wishes to support namespace packages, it must
>  do so entirely within its ``find_module()`` implementation.
>
>  Unfortunately, it is unlikely that any such implementation will be
>  able to merge its namespace portions with those of other meta
>  importers or ``sys.path`` importers, so the meaning of "supporting
>  namespace packages" for a meta importer is currently undefined.
>
>  However, since the intended use case for meta importers is to
>  replace Python's normal import process entirely for some subset of
>  modules, and the number of such importers currently implemented is
>  quite small, this seems unlikely to be a big issue in practice.
>
>
> Rejected Alternatives
> =====================
>
> * The original version of this PEP used ``.pkg`` or ``.pth`` files
>  that contained either explicit directories to be added to a
>  package's ``__path__``, or ``*`` to indicate that a package was
>  a namespace.
>
>  But this approach required a more complex change to the importer
>  protocol, the files had to actually be opened and read, and there
>  were no concrete use cases proposed for the additional flexibility
>  specifying explicit paths.
>
> * On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
>  extra files, namespace packages use a ``__pkg__.py`` file to
>  indicate their namespace-ness, in addition to a (required)
>  ``__init__.py``.
>
>  Unfortunately, this approach solves only one of the `problems with
>  the current approach`_: i.e., having a standard way of declaring and
>  identifying namespace packages.  It does not address the necessity
>  of distributing duplicated files, or filename overlap between
>  distributions.  Further, it does not allow truly-independent
>  namespace portions to exist, since it requires a "defining" portion
>  (the portion containing the single ``__init__`` module) to exist.
>
> * Another approach considered during revisions to this PEP was to
>  simply rename package directories to add a suffix like ``.ns``
>  or ``-ns``, to indicate their namespaced nature.  This would effect
>  a small performance improvement for the initial import of a
>  namespace package, avoid the need to create empty ``*.ns`` files,
>  and even make it clearer that the directory involved is a namespace
>  portion.
>
>  The downsides, however, are also plentiful.  If a package starts
>  its life as a normal package, it must be renamed when it becomes
>  a namespace, with the implied consequences for revision control
>  tools.
>
>  Further, there is an immense body of existing code (including the
>  distutils and many other packaging tools) that expect a package
>  directory's name to be the same as the package name.  And porting
>  existing Python 2.x namespace packages to Python 3 would require
>  widespread directory renaming as well.
>
>  In short, this approach would require a vastly larger number of
>  changes to both the standard library and third-party code, for
>  a tiny potential performance improvement and a small increase in
>  clarity.  It was therefore rejected on "practicality vs. purity"
>  grounds.
>
>
>
> References
> ==========
>
> .. [1] "namespace" vs "module" packages (mailing list thread)
>   (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
>
> .. [2] "PEP \382: Namespace Packages" (mailing list thread)
>   (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
>   Local Variables:
>   mode: indented-text
>   indent-tabs-mode: nil
>   sentence-end-double-space: t
>   fill-column: 70
>   coding: utf-8
>   End:
>

I have some separate comments on this draft that I'll have to
postpone.  In the meantime I have a couple of questions:

1. Should this PEP wait until importlib.__import__ replaces the
builtin __import__?  That will have bearing on where the
implementation takes place.  I'm not sure of the status of that
effort, other than what Brett has reported in the tracker issue
(http://bugs.python.org/issue2377), nor of the timeframe.

2. Should it wait for the work on the import engine (a GSOC project).
It sounds like a PEP is in the works right now.  It may also impact
the implementation of this PEP.

-eric

> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>