[Import-SIG] New draft revision for PEP 382
pje at telecommunity.com
Fri Jul 8 21:51:39 CEST 2011
The following is my attempt at an updated draft of PEP 382, based on
the recently-discussed changes.
To address the questions and criticisms raisd on Python-Dev when the
PEP was introduced, I added an extended "Motivation" section that
explains issues with the current approaches, and states the case for
the PEP in more detail, including info about why anyone should care
about namespace packages in the first place. ;-)
I've also added a "Rejected Alternatives" section to document the
other proposed approaches and the rationale for rejecting them in
favor of the current proposal.
In addition, I've specified in a bit more detail the necessary
changes to e.g. the pkgutil module. (At least one open issue
remains, however, and that is the question of what, if anything,
should happen to the existing extend_path() function. A second
possible open question regards the API of the path fixup functions I
propose in pkgutil.)
Anyway, your questions and comments, please! The draft follows below:
Title: Namespace Package Declarations
Author: Martin v. LÃ¶wis <martin at v.loewis.de>, PJ Eby
Type: Standards Track
This PEP proposes an enhancement to Python's import machinery to
replace existing uses of the standard library's
``pkgutil.extend_path()`` API, and similar third-party APIs such as
The proposed enhancement will improve the reliability of existing
namespace package implementations, while providing "One Obvious Way"
to produce and consume namespace packages.
Within this PEP, the following terms are used as follows:
Python packages as defined by Python's import statement.
A separately installable set of Python modules, as registered in
the Python package index, and installed by distutils, setuptools,
A group of files installed by an operating system's packaging
mechanism (e.g. Debian or Redhat packages installed on Linux
A set of files in a single directory (possibly inside a zip file
or other storage mechanism) that contribute modules or subpackages
to a namespace package. The contents of each portion ``sys.path``
A package whose subpackages and modules can be split into portions
that can be distributed or installed separately (via separate
distributions and/or vendor packages), in shared or separate
Unlike a regular package, however, which only allows submodule
and subpackage imports from a single location, a namespace
package's ``__path__`` is configured so that submodules and
subpackages can be imported from each of its installed portions,
regardless of their relative positions in ``sys.path``.
"Most packages are like modules. Their contents are highly
interdependent and can't be pulled apart. [However,] some
packages exist to provide a separate namespace. ... It should
be possible to distribute sub-packages or submodules of these
[namespace packages] independently."
-- Jim Fulton, shortly before the release of Python 2.3 _
The Current Approach
First introduced in Python 2.3, namespace packages are a mechanism
for splitting a single Python package across multiple directories
on disk. This splitting has two main benefits:
1. It allows different parts of a large package or framework to be
distributed and installed independently. For example, installing
the ``zope.interface`` package without having to install every
package in the ``zope.*`` namespace.
(This is somewhat similar to the way Perl's package system allows
authors to separately distribute subpackages of ``File::`` or
2. As a side-effect of benefit 1, it reduces package naming collisions
across multiple authors or organizations, by encouraging them to
use distinguishing prefixes. Instead of say, Zope and Twisted both
offering a top-level ``interface`` package (in which case, both
could not be installed to the same directory), they can use
``zope.interface`` and ``twisted.interface``, while still being
able to distribute these subpackages separately from other ``zope``
or ``twisted`` subpackages.
(This is somewhat similar to the way Java uses names like
``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
In current Python versions, however, a registration function (such as
``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
must be explicitly invoked in order to set up the package's
There are two problems with this approach, however.
Problems With The Current Approach
The first (and lesser) problem is that there is no One Obvious Way to
either declare that a package is a "namespace" or "module" package,
or to tell which kind of package a given directory on disk is.
Instead, you must choose one of the various APIs to use, each of
which is slightly-incompatible with the others. (For example,
``pkgutil`` supports ``*.pkg`` files; setuptools doesn't. Likewise,
setuptools supports package portions living in zip files, and adding
new path components to already-imported namespaces, whereas
Similarly, to tell whether a given directory is a "namespace" or
"module" package, you must read its documentation or inspect its code
in detail, and be able to recognize the various API calls mentioned
The second -- and much larger -- issue is that whichever API is used
to declare the namespace, the declaration has to be invoked from a
namespace package's ``__init__`` module in order to work. (Otherwise,
only the first part of the package found on ``sys.path`` would be
This clashes with the goal of separately installing portions of a
namespace, because then each distributed piece must include a copy
of the same ``__init__.py``. (Otherwise, each piece would not be
importable on its own, as Python currently requires the existence
of an ``__init__`` module in order to import the package at all, let
alone set up the namespace!)
In addition to the developer inconvenience of creating, synchronizing,
and distributing these duplicated ``__init__`` modules, there is a
further problem created for operating system vendors.
Vendor packages typically must not provide overlapping files, and an
attempt to install a vendor package that has a file already on disk
will fail or cause unpredictable behavior. As vendors might choose to
package distributions such that they will end up all in a single
directory for the namespace package, all portions would contribute
conflicting ``__init__.py`` files.
This issue has lead to various fragile and complex workarounds in
practice, such as ``.pth`` file abuse by setuptools, and the shipping
of broken partial packages with distutils.
With the enhancement proposed here, however, all of the above problems
can be readily resolved.
Instead of an API call buried inside a series of duplicated and
potentially-clashing ``__init__`` modules (which mostly exist only
to make the package importable and declare its namespace-ness), this
PEP proposes that Python's import machinery be modified to include
direct support for namespace packages.
This support would work by adding a new way to desginate a directory
as containing a namespace package portion: by including one or more
``*.ns`` files in it.
This approach removes the need for an ``__init__`` module to be
duplicated across namespace package portions. Instead, each portion
can simply include a uniquely-named ``*.ns`` file, thereby avoiding
filename clashes in vendor packages.
And, since the import machinery knows that these directories are
portions of a namespace package, it can automatically initialize
the package's ``__path__`` to include portions located on different
parts of ``sys.path``. (Thus avoiding the need for special code
to be called in the ``__init__`` module.)
In addition to doing this path setup, the import machinery will also
add any imported namespace packages to ``sys.namespace_packages``
(initially an empty set), so that namespace packages can be identified
or iterated over.
PEP \302 Extension
The existing PEP 302 protocol is to be extended to handle namespace
package portion directories, by adding a new importer method,
``namespace_subpath(fullname)``. An implementation of this method
will be added to all applicable importer classes distributed with
Python, including those in ``pkgutil`` and ``zipimport``).
(Note: any other importer wishing to support namespace packages must
provide its own implementation of this method as well. If an importer
does not have a ``namespace_subpath()`` method, it will be treated as
if it *did* have the method, but it returned ``None`` when called.)
This new method is called just before the importer's ``find_module()``
is normally invoked. If the importer determines that `fullname` is
a namespace package portion under its jurisdiction, then the importer
returns an importer-specific path to that namespace portion.
For example, if a standard filesystem path importer for the path
``/usr/lib/site-packages`` is about to be asked to import ``zope``,
and there is a ``/usr/lib/site-packages/zope`` directory containing
any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
on that importer should return ``"/usr/lib/site-packages/zope"``.
However, if there is no such subdirectory, or it does *not* contain
any files whose names end with ``.ns``, that importer would return
The Python import machinery will call this method on each importer
corresponding to a path entry in ``sys.path`` (for top-level imports)
or in a parent package ``__path__`` (for subpackage imports).
If a normal package or module is found before a namespace package,
importing proceeds according to the normal PEP 302 protocol. (That
is, a loader object is simply asked to load the located module or
However, if a namespace package portion is found (i.e., an importer's
``namespace_subpath()`` returns a string), then the normal import
search stops, and a namespace package is created instead.
The import machinery continues iterating over importers and calling
``namespace_subpath()`` on them, but it does **not** continue calling
``find_module()`` on them. Instead, it accumulates any strings
returned by the subpath calls, in order to assemble a ``__path__``
for the package being imported.
(Note that this implies that any non-namespace packages with the same
name are skipped, and not included in the resulting package's
``__path__``. In other words, a namespace package's initial
``__path__`` only includes namespace portions, never non-namespace
Once this ``__path__`` has been assembled, a module is created, and
its ``__path__`` attribute is set. The package's name is then added
to ``sys.namespace_packages`` -- a set of package names.
Finally, the ``__init__`` module code for the package (if it exists)
is located and executed in the new module's namespace.
Each importer that returns a ``namespace_subpath()`` for the package
is asked to perform a standard ``find_module()`` for the package.
Since by the normal import rules, a directory containing an
``__init__`` module is a package, this call should succeed if the
namespace package portion contains an ``__init__`` module, and the
importing can proceed normally from that point.
There is one caveat, however. The importers currently distributed
with Python expect that *they* will be the ones to initialize the
``__path__`` attribute, which means that they must be changed to
either recognize that ``__path__`` has already been set and not
change it, or to handle namespace packages specially (e.g., via an
internal flag or checking ``sys.namespace_packages``).
Similarly, any third-party importers wishing to support namespace
packages must make similar changes.
(NOTE: in general, it goes against the design of PEP 302 for a loader
object to assume that it is always creating the module object or that
the module it is operating on is empty. Making this assumption can
result in code that breaks the normal operation of the ``reload()``
builtin and any specialized tools that rely on it, such as lazy
importers, automatic reloaders, and so on.)
Standard Library Changes/Additions
The ``pkgutil`` module should be updated to handle this
specification appropriately, including any necessary changes to
``extend_path()``, ``iter_modules()``, etc. A new generic API for
calling ``namespace_subpath()`` on importers should be added as well.
Specifically the proposed changes and additions are:
* A new ``namespace_subpath(importer, fullname)`` generic, allowing
implementations to be registered for existing importers.
* A new ``extend_namespaces(path_entry)`` function, to extend existing
and already-imported namespace packages' ``__path__`` attributes to
include any portions found in a new ``sys.path`` entry. This
function should be called by applications extending ``sys.path``
at runtime, e.g. to include a plugin directory or add an egg to the
The implementation of this function does a simple breadth-first walk
of ``sys.namespace_packages``, and performs any necessary
``namespace_subpath()`` calls to identify what path entries need to
be added to each package's ``__path__``, given that `path_entry`
has been added to ``sys.path``.
* A new ``iter_namespaces(parent='')`` function to allow breadth-first
traversal of namespaces in ``sys.namespace_packages``, by yielding
the child namespace packages of `parent`. For example, calling
``iter_namespaces("zope")`` might yield ``zope.app`` and
``zope.products`` (if they are namespace packages registered in
``sys.namespace_packagess``), but **not** ``zope.foo.bar``.
This function is needed to implement ``extend_namespaces()``, but
is potentially useful to others.
* ``ImpImporter.iter_modules()`` should be changed to also detect and
yield the names of namespace package portions.
In addition to the above changes, the ``zipimport`` importer should
have its ``iter_modules()`` implementation similarly changed. (Note:
current versions of Python implement this via a shim in ``pkgutil``,
so technically this is also a change to ``pkgutil``.)
For users, developers, and distributors of namespace packages:
* ``sys.namespace_packages`` is allowed to contain non-existent or
not-yet-imported package names; code that uses its contents should
not assume that every name in this set is also present in
sys.packages or that importing the name will necessarily succeed.
* ``*.ns`` files must be empty or contain only ASCII whitespace
characters. This leaves open the possibility for future extension
to the format.
* Files contained within a namespace package portion directory must
be *unique* to that portion, so that the portion can be distributed
as a vendor package without any filename overlap. This applies to
modules and data files as well as ``*.ns`` files.
(For ``*.ns`` files themselves, uniqueness can be achieved simply by
giving them a name based on the distribution that contains the file,
and it is recommended that packaging tools support doing this
* Although this PEP supports the use of non-empty ``__init__`` modules
in namespace packages, their usage is controversial. If more than
one package portion contains an ``__init__`` module, at most one of
them will be executed, possibly leading to silent errors.
Therefore, if you must include an ``__init__`` module in your
namespace package, make sure that it is provided by exactly **one**
distribution, and that all other distributions using that module's
contents are defined so as to have an installation dependency on
the distribution containing the ``__init__`` module. Otherwise,
it may not be present in some installations.
(Note: for historical reasons, existing namespace packages nearly
always include ``__init__`` modules, but they are usually empty
except for code to declare the package a namespace. Under this
proposal, these nearly-empty modules could and should be replaced
by an empty ``*.ns`` file in the package directory.)
For those implementing PEP 302 importer objects:
* Importers that support the ``iter_modules()`` method and want to add
namespace support should modify their ``iter_modules()``
method so that it discovers and list namespace packages as well as
standard modules and packages.
* For implementation efficiency, an importer is allowed to cache
information (such as whether a directory exists and whether an
``__init__`` module is present in it) between the invocation of a
``namespace_subpath()`` call and a subsequent ``find_module()`` call
for the same name.
It should, however, avoid retaining such cached information for any
longer than the next method call, and it should also verify that the
request is in fact for the same module/package name, as it is not
guaranteed that a ``namespace_subpath()`` call will always be
followed by a matching ``find_module()`` call. (After all, an
``__init__`` module may already have been supplied by an earlier
importer on the path.)
* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
not need to implement ``namespace_subpath()``, because the method
is only called on importers corresponding to ``sys.path`` entries.'
If a meta importer wishes to support namespace packages, it must
do so entirely within its ``find_module()`` implementation.
Unfortunately, it is unlikely that any such implementation will be
able to merge its namespace portions with those of other meta
importers or ``sys.path`` importers, so the meaning of "supporting
namespace packages" for a meta importer is currently undefined.
However, since the intended use case for meta importers is to
replace Python's normal import process entirely for some subset of
modules, and the number of such importers currently implemented is
quite small, this seems unlikely to be a big issue in practice.
* The original version of this PEP used ``.pkg`` or ``.pth`` files
that contained either explicit directories to be added to a
package's ``__path__``, or ``*`` to indicate that a package was
But this approach required a more complex change to the importer
protocol, the files had to actually be opened and read, and there
were no concrete use cases proposed for the additional flexibility
specifying explicit paths.
* On Python-Dev, M.A. Lemburg proposed _ that instead of using
extra files, namespace packages use a ``__pkg__.py`` file to
indicate their namespace-ness, in addition to a (required)
Unfortunately, this approach solves only one of the `problems with
the current approach`_: i.e., having a standard way of declaring and
identifying namespace packages. It does not address the necessity
of distributing duplicated files, or filename overlap between
distributions. Further, it does not allow truly-independent
namespace portions to exist, since it requires a "defining" portion
(the portion containing the single ``__init__`` module) to exist.
* Another approach considered during revisions to this PEP was to
simply rename package directories to add a suffix like ``.ns``
or ``-ns``, to indicate their namespaced nature. This would effect
a small performance improvement for the initial import of a
namespace package, avoid the need to create empty ``*.ns`` files,
and even make it clearer that the directory involved is a namespace
The downsides, however, are also plentiful. If a package starts
its life as a normal package, it must be renamed when it becomes
a namespace, with the implied consequences for revision control
Further, there is an immense body of existing code (including the
distutils and many other packaging tools) that expect a package
directory's name to be the same as the package name. And porting
existing Python 2.x namespace packages to Python 3 would require
widespread directory renaming as well.
In short, this approach would require a vastly larger number of
changes to both the standard library and third-party code, for
a tiny potential performance improvement and a small increase in
clarity. It was therefore rejected on "practicality vs. purity"
..  "namespace" vs "module" packages (mailing list thread)
..  "PEP \382: Namespace Packages" (mailing list thread)
This document has been placed in the public domain.
More information about the Import-SIG