[Python-checkins] peps: Fix backward-compatibility hole described by Jeff Hardy in:
phillip.eby
python-checkins at python.org
Wed Jul 20 21:51:33 CEST 2011
http://hg.python.org/peps/rev/a6f02035c66c
changeset: 3904:a6f02035c66c
user: pje
date: Wed Jul 20 14:48:00 2011 -0400
summary:
Fix backward-compatibility hole described by Jeff Hardy in:
http://mail.python.org/pipermail/python-dev/2011-July/112370.html
Using the approach described here:
http://mail.python.org/pipermail/python-dev/2011-July/112374.html
This should now restrict backward-compatibility concerns to tool-support
questions, unless somebody comes up with another way to break it. ;-)
files:
pep-0402.txt | 190 +++++++++++++++++++++++++++-----------
1 files changed, 135 insertions(+), 55 deletions(-)
diff --git a/pep-0402.txt b/pep-0402.txt
--- a/pep-0402.txt
+++ b/pep-0402.txt
@@ -339,17 +339,57 @@
checking for a ``__version__`` or some other attribute), *and* there
is a directory of the same name as the sought-for package on
``sys.path`` somewhere, *and* the package is not actually installed,
-then such code could *perhaps* be fooled into thinking a package is
-installed that really isn't.
+then such code could be fooled into thinking a package is installed
+that really isn't.
-However, even in the rare case where all these conditions line up to
-happen at once, the failure is more likely to be annoying than
-damaging. In most cases, after all, the code will simply fail a
-little later on, when it actually tries to DO something with the
-imported (but empty) module. (And code that checks ``__version__``
-attributes or for the presence of some desired function, class, or
-module in the package will not see a false positive result in the
-first place.)
+For example, suppose someone writes a script (``datagen.py``)
+containing the following code::
+
+ try:
+ import json
+ except ImportError:
+ import simplejson as json
+
+And runs it in a directory laid out like this::
+
+ datagen.py
+ json/
+ foo.js
+ bar.js
+
+If ``import json`` succeeded due to the mere presence of the ``json/``
+subdirectory, the code would incorrectly believe that the ``json``
+module was available, and proceed to fail with an error.
+
+However, we can prevent corner cases like these from arising, simply
+by making one small change to the algorithm presented so far. Instead
+of allowing you to import a "pure virtual" package (like ``zc``),
+we allow only importing of the *contents* of virtual packages.
+
+That is, a statement like ``import zc`` should raise ``ImportError``
+if there is no ``zc.py`` or ``zc/__init__.py`` on ``sys.path``. But,
+doing ``import zc.buildout`` should still succeed, as long as there's
+a ``zc/buildout.py`` or ``zc/buildout/__init__.py`` on ``sys.path``.
+
+In other words, we don't allow pure virtual packages to be imported
+directly, only modules and self-contained packages. (This is an
+acceptable limitation, because there is no *functional* value to
+importing such a package by itself. After all, the module object
+will have no *contents* until you import at least one of its
+subpackages or submodules!)
+
+Once ``zc.buildout`` has been successfully imported, though, there
+*will* be a ``zc`` module in ``sys.modules``, and trying to import it
+will of course succeed. We are only preventing an *initial* import
+from succeeding, in order to prevent false-positive import successes
+when clashing subdirectories are present on ``sys.path``.
+
+So, with this slight change, the ``datagen.py`` example above will
+work correctly. When it does ``import json``, the mere presence of a
+``json/`` directory will simply not affect the import process at all,
+even if it contains ``.py`` files. The ``json/`` directory will still
+only be searched in the case where an import like ``import
+json.converter`` is attempted.
Meanwhile, tools that expect to locate packages and modules by
walking a directory tree can be updated to use the existing
@@ -361,41 +401,54 @@
Specification
=============
-Two changes are made to the existing import process.
+A change is made to the existing import process, when importing
+names containing at least one ``.`` -- that is, imports of modules
+that have a parent package.
-First, the built-in ``__import__`` function must not raise an
-``ImportError`` when importing a submodule of a module with no
-``__path__``. Instead, it must attempt to *create* a ``__path__``
-attribute for the parent module first, as described in `__path__
-creation`_, below.
+Specifically, if the parent package does not exist, or exists but
+lacks a ``__path__`` attribute, an attempt is first made to create a
+"virtual path" for the parent package (following the algorithm
+described in the section on `virtual paths`_, below).
-Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent
-package ``__path__``) fails to find a module being imported, the
-import process must attempt to create a ``__path__`` attribute for
-the missing module. If the attempt succeeds, an empty module is
-created and its ``__path__`` is set. Otherwise, importing fails.
+If the computed "virtual path" is empty, an ``ImportError`` results,
+just as it would today. However, if a non-empty virtual path is
+obtained, the normal import of the submodule or subpackage proceeds,
+using that virtual path to find the submodule or subpackage. (Just
+as it would have with the parent's ``__path__``, if the parent package
+had existed and had a ``__path__``.)
-In both of the above cases, if a non-empty ``__path__`` is created,
-the name of the module whose ``__path__`` was created is added to
-``sys.virtual_packages`` -- an initially-empty ``set()`` of package
-names.
+When a submodule or subpackage is found (but not yet loaded),
+the parent package is created and added to ``sys.modules`` (if it
+didn't exist before), and its ``__path__`` is set to the computed
+virtual path (if it wasn't already set).
-(This way, code that extends ``sys.path`` at runtime can find out
-what virtual packages are currently imported, and thereby add any
-new subdirectories to those packages' ``__path__`` attributes. See
-`Standard Library Changes/Additions`_ below for more details.)
+In this way, when the actual loading of the submodule or subpackage
+occurs, it will see a parent package existing, and any relative
+imports will work correctly. However, if no submodule or subpackage
+exists, then the parent package will *not* be created, nor will a
+standalone module be converted into a package (by the addition of a
+spurious ``__path__`` attribute).
-Conversely, if an empty ``__path__`` results, an ``ImportError``
-is immediately raised, and the module is not created or changed, nor
-is its name added to ``sys.virtual_packages``.
+Note, by the way, that this change must be applied *recursively*: that
+is, if ``foo`` and ``foo.bar`` are pure virtual packages, then
+``import foo.bar.baz`` must wait until ``foo.bar.baz`` is found before
+creating module objects for *both* ``foo`` and ``foo.bar``, and then
+create both of them together, properly setting the ``foo`` module's
+``.bar`` attribute to point to the ``foo.bar`` module.
+In this way, pure virtual packages are never directly importable:
+an ``import foo`` or ``import foo.bar`` by itself will fail, and the
+corresponding modules will not appear in ``sys.modules`` until they
+are needed to point to a *successfully* imported submodule or
+self-contained subpackage.
-``__path__`` Creation
----------------------
-A virtual ``__path__`` is created by obtaining a PEP 302 "importer"
-object for each of the path entries found in ``sys.path`` (for a
-top-level module) or the parent ``__path__`` (for a submodule).
+Virtual Paths
+-------------
+
+A virtual path is created by obtaining a PEP 302 "importer" object for
+each of the path entries found in ``sys.path`` (for a top-level
+module) or the parent ``__path__`` (for a submodule).
(Note: because ``sys.meta_path`` importers are not associated with
``sys.path`` or ``__path__`` entry strings, such importers do *not*
@@ -403,18 +456,34 @@
Each importer is checked for a ``get_subpath()`` method, and if
present, the method is called with the full name of the module/package
-the ``__path__`` is being constructed for. The return value is either
-a string representing a subdirectory for the requested package, or
+the path is being constructed for. The return value is either a
+string representing a subdirectory for the requested package, or
``None`` if no such subdirectory exists.
-The strings returned by the importers are added to the ``__path__``
+The strings returned by the importers are added to the path list
being built, in the same order as they are found. (``None`` values
and missing ``get_subpath()`` methods are simply skipped.)
-In Python code, the algorithm would look something like this::
+The resulting list (whether empty or not) is then stored in a
+``sys.virtual_package_paths`` dictionary, keyed by module name.
+
+This dictionary has two purposes. First, it serves as a cache, in
+the event that more than one attempt is made to import a submodule
+of a virtual package.
+
+Second, and more importantly, the dictionary can be used by code that
+extends ``sys.path`` at runtime to *update* imported packages'
+``__path__`` attributes accordingly. (See `Standard Library
+Changes/Additions`_ below for more details.)
+
+In Python code, the virtual path construction algorithm would look
+something like this::
def get_virtual_path(modulename, parent_path=None):
+ if modulename in sys.virtual_package_paths:
+ return sys.virtual_package_paths[modulename]
+
if parent_path is None:
parent_path = sys.path
@@ -429,6 +498,7 @@
if subpath is not None:
path.append(subpath)
+ sys.virtual_package_paths[modulename] = path
return path
And a function like this one should be exposed in the standard
@@ -453,19 +523,25 @@
path.
The implementation of this function does a simple top-down traversal
- of ``sys.virtual_packages``, and performs any necessary
- ``get_subpath()`` calls to identify what path entries need to
- be added to each package's ``__path__``, given that `path_entry`
+ of ``sys.virtual_package_paths``, and performs any necessary
+ ``get_subpath()`` calls to identify what path entries need to be
+ added to the virtual path for that package, given that `path_entry`
has been added to ``sys.path``. (Or, in the case of sub-packages,
- adding a derived subpath entry, based on their parent namespace's
- ``__path__``.)
+ adding a derived subpath entry, based on their parent package's
+ virtual path.)
+
+ (Note: this function must update both the path values in
+ ``sys.virtual_package_paths`` as well as the ``__path__`` attributes
+ of any corresponding modules in ``sys.modules``, even though in the
+ common case they will both be the same ``list`` object.)
* A new ``iter_virtual_packages(parent='')`` function to allow
- top-down traversal of virtual packages in ``sys.virtual_packages``,
- by yielding the child virtual packages of `parent`. For example,
- calling ``iter_virtual_packages("zope")`` might yield ``zope.app``
- and ``zope.products`` (if they are imported virtual packages listed
- in ``sys.virtual_packages``), but **not** ``zope.foo.bar``.
+ top-down traversal of virtual packages from
+ ``sys.virtual_package_paths``, by yielding the child virtual
+ packages of `parent`. For example, calling
+ ``iter_virtual_packages("zope")`` might yield ``zope.app``
+ and ``zope.products`` (if they are virtual packages listed in
+ ``sys.virtual_package_paths``), but **not** ``zope.foo.bar``.
(This function is needed to implement ``extend_virtual_paths()``,
but is also potentially useful for other code that needs to inspect
imported virtual packages.)
@@ -500,10 +576,11 @@
and do other things that make more sense for a self-contained
project than for a mere "namespace" package.
-* ``sys.virtual_packages`` is allowed to contain non-existent or
- not-yet-imported package names; code that uses its contents should
- not assume that every name in this set is also present in
- ``sys.modules`` or that importing the name will necessarily succeed.
+* ``sys.virtual_package_paths`` is allowed to contain entries for
+ non-existent or not-yet-imported package names; code that uses its
+ contents should not assume that every key in this dictionary is also
+ present in ``sys.modules`` or that importing the name will
+ necessarily succeed.
* If you are changing a currently self-contained package into a
virtual one, it's important to note that you can no longer use its
@@ -539,7 +616,9 @@
XXX This might list a lot of not-really-packages. Should we
require importable contents to exist? If so, how deep do we
search, and how do we prevent e.g. link loops, or traversing onto
- different filesystems, etc.? Ick.
+ different filesystems, etc.? Ick. Also, if virtual packages are
+ listed, they still can't be *imported*, which is a problem for the
+ way that ``pkgutil.walk_packages()`` is currently implemented.
* "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
not need to implement ``get_subpath()``, because the method
--
Repository URL: http://hg.python.org/peps
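
Illustration (not part of the changeset): the revised rule can be
sketched as a filesystem-only simulation. Everything below is an
assumption made for illustration. ``would_import()`` and
``is_module_or_package()`` are hypothetical helpers, the module-level
``virtual_package_paths`` dict stands in for the proposed
``sys.virtual_package_paths``, and ``get_subpath()`` is a plain function
standing in for the proposed importer method. The sketch only checks
whether an import *would* succeed; it does not create module objects or
touch ``sys.modules``.

```python
import os
import tempfile

# Stand-in for the proposed ``sys.virtual_package_paths`` (assumption).
virtual_package_paths = {}


def get_subpath(path_entry, fullname):
    """Filesystem stand-in for the proposed importer ``get_subpath()``."""
    candidate = os.path.join(path_entry, fullname.rpartition(".")[2])
    return candidate if os.path.isdir(candidate) else None


def get_virtual_path(modulename, parent_path):
    """Compute and cache a virtual path, as in the PEP's pseudocode."""
    if modulename in virtual_package_paths:
        return virtual_package_paths[modulename]
    path = []
    for entry in parent_path:
        subpath = get_subpath(entry, modulename)
        if subpath is not None:
            path.append(subpath)
    virtual_package_paths[modulename] = path
    return path


def is_module_or_package(path_entry, name):
    """True if ``name.py`` or ``name/__init__.py`` exists in `path_entry`."""
    return (os.path.isfile(os.path.join(path_entry, name + ".py"))
            or os.path.isfile(os.path.join(path_entry, name, "__init__.py")))


def would_import(fullname, search_path):
    """Hypothetical check: would `fullname` import under the revised rule?"""
    *parents, leaf = fullname.split(".")
    path = list(search_path)
    prefix = ""
    for parent in parents:
        prefix = parent if not prefix else prefix + "." + parent
        real = [os.path.join(e, parent) for e in path
                if os.path.isfile(os.path.join(e, parent, "__init__.py"))]
        if real:
            path = real[:1]          # self-contained package: use its directory
        else:
            path = get_virtual_path(prefix, path)
            if not path:             # empty virtual path: ImportError
                return False
    # The leaf itself must be a real module or self-contained package;
    # a bare directory is never enough.
    return any(is_module_or_package(e, leaf) for e in path)


# Build a layout like the one in the ``datagen.py`` example.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "json"))
open(os.path.join(root, "json", "foo.js"), "w").close()
os.makedirs(os.path.join(root, "zc"))
open(os.path.join(root, "zc", "buildout.py"), "w").close()

print(would_import("json", [root]))         # False
print(would_import("zc", [root]))           # False
print(would_import("zc.buildout", [root]))  # True
```

The ``json/`` data directory no longer satisfies ``import json``, while
``zc/buildout.py`` is still reachable through the virtual path computed
for ``zc``. Note that the cache here is keyed by name only, so the sketch
assumes a single, unchanging search path.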