[Import-SIG] New draft revision for PEP 382
Eric Snow
ericsnowcurrently at gmail.com
Fri Jul 8 23:52:49 CEST 2011
On Fri, Jul 8, 2011 at 1:51 PM, P.J. Eby <pje at telecommunity.com> wrote:
> The following is my attempt at an updated draft of PEP 382, based on the
> recently-discussed changes.
>
> To address the questions and criticisms raisd on Python-Dev when the PEP was
> introduced, I added an extended "Motivation" section that explains issues
> with the current approaches, and states the case for the PEP in more detail,
> including info about why anyone should care about namespace packages in the
> first place. ;-)
>
> I've also added a "Rejected Alternatives" section to document the other
> proposed approaches and the rationale for rejecting them in favor of the
> current proposal.
>
> In addition, I've specified in a bit more detail the necessary changes to
> e.g. the pkgutil module. (At least one open issue remains, however, and
> that is the question of what, if anything, should happen to the existing
> extend_path() function. A second possible open question regards the API of
> the path fixup functions I propose in pkgutil.)
>
> Anyway, your questions and comments, please! The draft follows below:
>
>
> PEP: 382
> Title: Namespace Package Declarations
> Version: $Revision$
> Last-Modified: $Date$
> Author: Martin v. Löwis <martin at v.loewis.de>, PJ Eby
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 02-Apr-2009
> Python-Version: 3.2
> Post-History:
>
> Abstract
> ========
>
> This PEP proposes an enhancement to Python's import machinery to
> replace existing uses of the standard library's
> ``pkgutil.extend_path()`` API, and similar third-party APIs such as
> ``pkg_resources.declare_namespace()``.
>
> The proposed enhancement will improve the reliability of existing
> namespace package implementations, while providing "One Obvious Way"
> to produce and consume namespace packages.
>
>
> Terminology
> ===========
>
> Within this PEP, the following terms are used as follows:
>
> Package
> Python packages as defined by Python's import statement.
>
> Distribution
> A separately installable set of Python modules, as registered in
> the Python package index, and installed by distutils, setuptools,
> etc.
>
> Vendor Package
> A group of files installed by an operating system's packaging
> mechanism (e.g. Debian or Redhat packages installed on Linux
> systems).
>
> Portion
> A set of files in a single directory (possibly inside a zip file
> or other storage mechanism) that contribute modules or subpackages
> to a namespace package. The contents of each portion ``sys.path``
>
> Namespace Package
> A package whose subpackages and modules can be split into portions
> that can be distributed or installed separately (via separate
> distributions and/or vendor packages), in shared or separate
> installation locations.
>
> Unlike a regular package, however, which only allows submodule
> and subpackage imports from a single location, a namespace
> package's ``__path__`` is configured so that submodules and
> subpackages can be imported from each of its installed portions,
> regardless of their relative positions in ``sys.path``.
>
>
> Motivation
> ==========
>
> .. epigraph::
>
> "Most packages are like modules. Their contents are highly
> interdependent and can't be pulled apart. [However,] some
> packages exist to provide a separate namespace. ... It should
> be possible to distribute sub-packages or submodules of these
> [namespace packages] independently."
>
> -- Jim Fulton, shortly before the release of Python 2.3 [1]_
>
This is a really helpful addition.
>
> The Current Approach
> --------------------
>
> First introduced in Python 2.3, namespace packages are a mechanism
> for splitting a single Python package across multiple directories
> on disk. This splitting has two main benefits:
>
> 1. It allows different parts of a large package or framework to be
> distributed and installed independently. For example, installing
> the ``zope.interface`` package without having to install every
> package in the ``zope.*`` namespace.
>
> (This is somewhat similar to the way Perl's package system allows
> authors to separately distribute subpackages of ``File::`` or
> ``Email::``.)
>
> 2. As a side-effect of benefit 1, it reduces package naming collisions
> across multiple authors or organizations, by encouraging them to
> use distinguishing prefixes. Instead of say, Zope and Twisted both
> offering a top-level ``interface`` package (in which case, both
> could not be installed to the same directory), they can use
> ``zope.interface`` and ``twisted.interface``, while still being
> able to distribute these subpackages separately from other ``zope``
> or ``twisted`` subpackages.
>
> (This is somewhat similar to the way Java uses names like
> ``org.apache.foobar`` or ``com.sun.thingy`` to prevent collisions,
> only flatter.)
>
> In current Python versions, however, a registration function (such as
> ``pkgutil.extend_path()`` or ``pkg_resources.declare_namespace()``)
> must be explicitly invoked in order to set up the package's
> ``__path__``.
>
> There are two problems with this approach, however.
>
>
> Problems With The Current Approach
> ----------------------------------
>
> The first (and lesser) problem is that there is no One Obvious Way to
> either declare that a package is a "namespace" or "module" package,
> or to tell which kind of package a given directory on disk is.
>
> Instead, you must choose one of the various APIs to use, each of
> which is slightly-incompatible with the others. (For example,
> ``pkgutil`` supports ``*.pkg`` files; setuptools doesn't. Likewise,
> setuptools supports package portions living in zip files, and adding
> new path components to already-imported namespaces, whereas
> ``pkgutil`` doesn't.)
>
> Similarly, to tell whether a given directory is a "namespace" or
> "module" package, you must read its documentation or inspect its code
> in detail, and be able to recognize the various API calls mentioned
> above.
>
> The second -- and much larger -- issue is that whichever API is used
> to declare the namespace, the declaration has to be invoked from a
> namespace package's ``__init__`` module in order to work. (Otherwise,
> only the first part of the package found on ``sys.path`` would be
> importable.)
>
> This clashes with the goal of separately installing portions of a
> namespace, because then each distributed piece must include a copy
> of the same ``__init__.py``. (Otherwise, each piece would not be
> importable on its own, as Python currently requires the existence
> of an ``__init__`` module in order to import the package at all, let
> alone set up the namespace!)
>
> In addition to the developer inconvenience of creating, synchronizing,
> and distributing these duplicated ``__init__`` modules, there is a
> further problem created for operating system vendors.
>
> Vendor packages typically must not provide overlapping files, and an
> attempt to install a vendor package that has a file already on disk
> will fail or cause unpredictable behavior. As vendors might choose to
> package distributions such that they will end up all in a single
> directory for the namespace package, all portions would contribute
> conflicting ``__init__.py`` files.
>
> This issue has lead to various fragile and complex workarounds in
> practice, such as ``.pth`` file abuse by setuptools, and the shipping
> of broken partial packages with distutils.
>
> With the enhancement proposed here, however, all of the above problems
> can be readily resolved.
>
>
> Specification
> =============
>
> Instead of an API call buried inside a series of duplicated and
> potentially-clashing ``__init__`` modules (which mostly exist only
> to make the package importable and declare its namespace-ness), this
> PEP proposes that Python's import machinery be modified to include
> direct support for namespace packages.
>
> This support would work by adding a new way to desginate a directory
> as containing a namespace package portion: by including one or more
> ``*.ns`` files in it.
>
> This approach removes the need for an ``__init__`` module to be
> duplicated across namespace package portions. Instead, each portion
> can simply include a uniquely-named ``*.ns`` file, thereby avoiding
> filename clashes in vendor packages.
>
> And, since the import machinery knows that these directories are
> portions of a namespace package, it can automatically initialize
> the package's ``__path__`` to include portions located on different
> parts of ``sys.path``. (Thus avoiding the need for special code
> to be called in the ``__init__`` module.)
>
> In addition to doing this path setup, the import machinery will also
> add any imported namespace packages to ``sys.namespace_packages``
> (initially an empty set), so that namespace packages can be identified
> or iterated over.
>
>
> PEP \302 Extension
> ------------------
>
> The existing PEP 302 protocol is to be extended to handle namespace
> package portion directories, by adding a new importer method,
> ``namespace_subpath(fullname)``. An implementation of this method
> will be added to all applicable importer classes distributed with
> Python, including those in ``pkgutil`` and ``zipimport``).
>
> (Note: any other importer wishing to support namespace packages must
> provide its own implementation of this method as well. If an importer
> does not have a ``namespace_subpath()`` method, it will be treated as
> if it *did* have the method, but it returned ``None`` when called.)
>
> This new method is called just before the importer's ``find_module()``
> is normally invoked. If the importer determines that `fullname` is
> a namespace package portion under its jurisdiction, then the importer
> returns an importer-specific path to that namespace portion.
>
> For example, if a standard filesystem path importer for the path
> ``/usr/lib/site-packages`` is about to be asked to import ``zope``,
> and there is a ``/usr/lib/site-packages/zope`` directory containing
> any files ending with ``.ns``, a call to ``namespace_subpath("zope")``
> on that importer should return ``"/usr/lib/site-packages/zope"``.
>
> However, if there is no such subdirectory, or it does *not* contain
> any files whose names end with ``.ns``, that importer would return
> ``None`` instead.
>
> The Python import machinery will call this method on each importer
> corresponding to a path entry in ``sys.path`` (for top-level imports)
> or in a parent package ``__path__`` (for subpackage imports).
>
> If a normal package or module is found before a namespace package,
> importing proceeds according to the normal PEP 302 protocol. (That
> is, a loader object is simply asked to load the located module or
> package.)
>
> However, if a namespace package portion is found (i.e., an importer's
> ``namespace_subpath()`` returns a string), then the normal import
> search stops, and a namespace package is created instead.
>
> The import machinery continues iterating over importers and calling
> ``namespace_subpath()`` on them, but it does **not** continue calling
> ``find_module()`` on them. Instead, it accumulates any strings
> returned by the subpath calls, in order to assemble a ``__path__``
> for the package being imported.
>
> (Note that this implies that any non-namespace packages with the same
> name are skipped, and not included in the resulting package's
> ``__path__``. In other words, a namespace package's initial
> ``__path__`` only includes namespace portions, never non-namespace
> package directories.)
>
> Once this ``__path__`` has been assembled, a module is created, and
> its ``__path__`` attribute is set. The package's name is then added
> to ``sys.namespace_packages`` -- a set of package names.
>
> Finally, the ``__init__`` module code for the package (if it exists)
> is located and executed in the new module's namespace.
>
> Each importer that returns a ``namespace_subpath()`` for the package
> is asked to perform a standard ``find_module()`` for the package.
> Since by the normal import rules, a directory containing an
> ``__init__`` module is a package, this call should succeed if the
> namespace package portion contains an ``__init__`` module, and the
> importing can proceed normally from that point.
>
> There is one caveat, however. The importers currently distributed
> with Python expect that *they* will be the ones to initialize the
> ``__path__`` attribute, which means that they must be changed to
> either recognize that ``__path__`` has already been set and not
> change it, or to handle namespace packages specially (e.g., via an
> internal flag or checking ``sys.namespace_packages``).
>
> Similarly, any third-party importers wishing to support namespace
> packages must make similar changes.
>
> (NOTE: in general, it goes against the design of PEP 302 for a loader
> object to assume that it is always creating the module object or that
> the module it is operating on is empty. Making this assumption can
> result in code that breaks the normal operation of the ``reload()``
> builtin and any specialized tools that rely on it, such as lazy
> importers, automatic reloaders, and so on.)
>
>
> Standard Library Changes/Additions
> ----------------------------------
>
> The ``pkgutil`` module should be updated to handle this
> specification appropriately, including any necessary changes to
> ``extend_path()``, ``iter_modules()``, etc. A new generic API for
> calling ``namespace_subpath()`` on importers should be added as well.
>
> Specifically the proposed changes and additions are:
>
> * A new ``namespace_subpath(importer, fullname)`` generic, allowing
> implementations to be registered for existing importers.
>
> * A new ``extend_namespaces(path_entry)`` function, to extend existing
> and already-imported namespace packages' ``__path__`` attributes to
> include any portions found in a new ``sys.path`` entry. This
> function should be called by applications extending ``sys.path``
> at runtime, e.g. to include a plugin directory or add an egg to the
> path.
>
> The implementation of this function does a simple breadth-first walk
> of ``sys.namespace_packages``, and performs any necessary
> ``namespace_subpath()`` calls to identify what path entries need to
> be added to each package's ``__path__``, given that `path_entry`
> has been added to ``sys.path``.
>
> * A new ``iter_namespaces(parent='')`` function to allow breadth-first
> traversal of namespaces in ``sys.namespace_packages``, by yielding
> the child namespace packages of `parent`. For example, calling
> ``iter_namespaces("zope")`` might yield ``zope.app`` and
> ``zope.products`` (if they are namespace packages registered in
> ``sys.namespace_packagess``), but **not** ``zope.foo.bar``.
> This function is needed to implement ``extend_namespaces()``, but
> is potentially useful to others.
>
> * ``ImpImporter.iter_modules()`` should be changed to also detect and
> yield the names of namespace package portions.
>
> In addition to the above changes, the ``zipimport`` importer should
> have its ``iter_modules()`` implementation similarly changed. (Note:
> current versions of Python implement this via a shim in ``pkgutil``,
> so technically this is also a change to ``pkgutil``.)
>
>
> Implementation Notes
> --------------------
>
> For users, developers, and distributors of namespace packages:
>
> * ``sys.namespace_packages`` is allowed to contain non-existent or
> not-yet-imported package names; code that uses its contents should
> not assume that every name in this set is also present in
> sys.packages or that importing the name will necessarily succeed.
>
> * ``*.ns`` files must be empty or contain only ASCII whitespace
> characters. This leaves open the possibility for future extension
> to the format.
>
> * Files contained within a namespace package portion directory must
> be *unique* to that portion, so that the portion can be distributed
> as a vendor package without any filename overlap. This applies to
> modules and data files as well as ``*.ns`` files.
>
> (For ``*.ns`` files themselves, uniqueness can be achieved simply by
> giving them a name based on the distribution that contains the file,
> and it is recommended that packaging tools support doing this
> automatically.)
>
> * Although this PEP supports the use of non-empty ``__init__`` modules
> in namespace packages, their usage is controversial. If more than
> one package portion contains an ``__init__`` module, at most one of
> them will be executed, possibly leading to silent errors.
>
> Therefore, if you must include an ``__init__`` module in your
> namespace package, make sure that it is provided by exactly **one**
> distribution, and that all other distributions using that module's
> contents are defined so as to have an installation dependency on
> the distribution containing the ``__init__`` module. Otherwise,
> it may not be present in some installations.
>
> (Note: for historical reasons, existing namespace packages nearly
> always include ``__init__`` modules, but they are usually empty
> except for code to declare the package a namespace. Under this
> proposal, these nearly-empty modules could and should be replaced
> by an empty ``*.ns`` file in the package directory.)
>
> For those implementing PEP 302 importer objects:
>
> * Importers that support the ``iter_modules()`` method and want to add
> namespace support should modify their ``iter_modules()``
> method so that it discovers and list namespace packages as well as
> standard modules and packages.
>
> * For implementation efficiency, an importer is allowed to cache
> information (such as whether a directory exists and whether an
> ``__init__`` module is present in it) between the invocation of a
> ``namespace_subpath()`` call and a subsequent ``find_module()`` call
> for the same name.
>
> It should, however, avoid retaining such cached information for any
> longer than the next method call, and it should also verify that the
> request is in fact for the same module/package name, as it is not
> guaranteed that a ``namespace_subpath()`` call will always be
> followed by a matching ``find_module()`` call. (After all, an
> ``__init__`` module may already have been supplied by an earlier
> importer on the path.)
>
> * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
> not need to implement ``namespace_subpath()``, because the method
> is only called on importers corresponding to ``sys.path`` entries.'
> If a meta importer wishes to support namespace packages, it must
> do so entirely within its ``find_module()`` implementation.
>
> Unfortunately, it is unlikely that any such implementation will be
> able to merge its namespace portions with those of other meta
> importers or ``sys.path`` importers, so the meaning of "supporting
> namespace packages" for a meta importer is currently undefined.
>
> However, since the intended use case for meta importers is to
> replace Python's normal import process entirely for some subset of
> modules, and the number of such importers currently implemented is
> quite small, this seems unlikely to be a big issue in practice.
>
>
> Rejected Alternatives
> =====================
>
> * The original version of this PEP used ``.pkg`` or ``.pth`` files
> that contained either explicit directories to be added to a
> package's ``__path__``, or ``*`` to indicate that a package was
> a namespace.
>
> But this approach required a more complex change to the importer
> protocol, the files had to actually be opened and read, and there
> were no concrete use cases proposed for the additional flexibility
> specifying explicit paths.
>
> * On Python-Dev, M.A. Lemburg proposed [2]_ that instead of using
> extra files, namespace packages use a ``__pkg__.py`` file to
> indicate their namespace-ness, in addition to a (required)
> ``__init__.py``.
>
> Unfortunately, this approach solves only one of the `problems with
> the current approach`_: i.e., having a standard way of declaring and
> identifying namespace packages. It does not address the necessity
> of distributing duplicated files, or filename overlap between
> distributions. Further, it does not allow truly-independent
> namespace portions to exist, since it requires a "defining" portion
> (the portion containing the single ``__init__`` module) to exist.
>
> * Another approach considered during revisions to this PEP was to
> simply rename package directories to add a suffix like ``.ns``
> or ``-ns``, to indicate their namespaced nature. This would effect
> a small performance improvement for the initial import of a
> namespace package, avoid the need to create empty ``*.ns`` files,
> and even make it clearer that the directory involved is a namespace
> portion.
>
> The downsides, however, are also plentiful. If a package starts
> its life as a normal package, it must be renamed when it becomes
> a namespace, with the implied consequences for revision control
> tools.
>
> Further, there is an immense body of existing code (including the
> distutils and many other packaging tools) that expect a package
> directory's name to be the same as the package name. And porting
> existing Python 2.x namespace packages to Python 3 would require
> widespread directory renaming as well.
>
> In short, this approach would require a vastly larger number of
> changes to both the standard library and third-party code, for
> a tiny potential performance improvement and a small increase in
> clarity. It was therefore rejected on "practicality vs. purity"
> grounds.
>
>
>
> References
> ==========
>
> .. [1] "namespace" vs "module" packages (mailing list thread)
> (http://mail.zope.org/pipermail/zope3-dev/2002-December/004251.html)
>
> .. [2] "PEP \382: Namespace Packages" (mailing list thread)
> (http://mail.python.org/pipermail/python-dev/2009-April/088087.html)
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
>
> ..
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End:
>
I have some separate comments on this draft that I'll have to
postpone. In the meantime I have a couple of questions:
1. Should this PEP wait until importlib.__import__ replaces the
builtin __import__? That will have bearing on where the
implementation takes place. I'm not sure of the status of that
effort, other than what Brett has reported in the tracker issue
(http://bugs.python.org/issue2377), nor of the timeframe.
2. Should it wait for the work on the import engine (a GSOC project).
It sounds like a PEP is in the works right now. It may also impact
the implementation of this PEP.
-eric
> _______________________________________________
> Import-SIG mailing list
> Import-SIG at python.org
> http://mail.python.org/mailman/listinfo/import-sig
>
More information about the Import-SIG
mailing list