[Python-checkins] peps: Fix backward-compatibility hole described by Jeff Hardy in:

phillip.eby python-checkins at python.org
Wed Jul 20 21:51:33 CEST 2011


http://hg.python.org/peps/rev/a6f02035c66c
changeset:   3904:a6f02035c66c
user:        pje
date:        Wed Jul 20 14:48:00 2011 -0400
summary:
  Fix backward-compatibility hole described by Jeff Hardy in:

  http://mail.python.org/pipermail/python-dev/2011-July/112370.html

Using the approach described here:

  http://mail.python.org/pipermail/python-dev/2011-July/112374.html

This should now restrict backward-compatibility concerns to tool-support
questions, unless somebody comes up with another way to break it.  ;-)

files:
  pep-0402.txt |  190 +++++++++++++++++++++++++++-----------
  1 files changed, 135 insertions(+), 55 deletions(-)


diff --git a/pep-0402.txt b/pep-0402.txt
--- a/pep-0402.txt
+++ b/pep-0402.txt
@@ -339,17 +339,57 @@
 checking for a ``__version__`` or some other attribute), *and* there
 is a directory of the same name as the sought-for package on
 ``sys.path`` somewhere, *and* the package is not actually installed,
-then such code could *perhaps* be fooled into thinking a package is
-installed that really isn't.
+then such code could be fooled into thinking a package is installed
+that really isn't.
 
-However, even in the rare case where all these conditions line up to
-happen at once, the failure is more likely to be annoying than
-damaging.  In most cases, after all, the code will simply fail a
-little later on, when it actually tries to DO something with the
-imported (but empty) module.  (And code that checks ``__version__``
-attributes or for the presence of some desired function, class, or
-module in the package will not see a false positive result in the
-first place.)
+For example, suppose someone writes a script (``datagen.py``)
+containing the following code::
+
+    try:
+        import json
+    except ImportError:
+        import simplejson as json
+
+And runs it in a directory laid out like this::
+
+    datagen.py
+    json/
+        foo.js
+        bar.js
+
+If ``import json`` succeeded due to the mere presence of the ``json/``
+subdirectory, the code would incorrectly believe that the ``json``
+module was available, and proceed to fail with an error.
+
+However, we can prevent corner cases like these from arising, simply
+by making one small change to the algorithm presented so far.  Instead
+of allowing you to import a "pure virtual" package (like ``zc``),
+we allow only importing of the *contents* of virtual packages.
+
+That is, a statement like ``import zc`` should raise ``ImportError``
+if there is no ``zc.py`` or ``zc/__init__.py`` on ``sys.path``.  But,
+doing ``import zc.buildout`` should still succeed, as long as there's
+a ``zc/buildout.py`` or ``zc/buildout/__init__.py`` on ``sys.path``.
+
+In other words, we don't allow pure virtual packages to be imported
+directly, only modules and self-contained packages.  (This is an
+acceptable limitation, because there is no *functional* value to
+importing such a package by itself.  After all, the module object
+will have no *contents* until you import at least one of its
+subpackages or submodules!)
+
+Once ``zc.buildout`` has been successfully imported, though, there
+*will* be a ``zc`` module in ``sys.modules``, and trying to import it
+will of course succeed.  We are only preventing an *initial* import
+from succeeding, in order to prevent false-positive import successes
+when clashing subdirectories are present on ``sys.path``.
+
+So, with this slight change, the ``datagen.py`` example above will
+work correctly.  When it does ``import json``, the mere presence of a
+``json/`` directory will simply not affect the import process at all,
+even if it contains ``.py`` files.  The ``json/`` directory will still
+only be searched in the case where an import like ``import
+json.converter`` is attempted.
 
 Meanwhile, tools that expect to locate packages and modules by
 walking a directory tree can be updated to use the existing
@@ -361,41 +401,54 @@
 Specification
 =============
 
-Two changes are made to the existing import process.
+A change is made to the existing import process, when importing
+names containing at least one ``.`` -- that is, imports of modules
+that have a parent package.
 
-First, the built-in ``__import__`` function must not raise an
-``ImportError`` when importing a submodule of a module with no
-``__path__``.  Instead, it must attempt to *create* a ``__path__``
-attribute for the parent module first, as described in `__path__
-creation`_, below.
+Specifically, if the parent package does not exist, or exists but
+lacks a ``__path__`` attribute, an attempt is first made to create a
+"virtual path" for the parent package (following the algorithm
+described in the section on `virtual paths`_, below).
 
-Second, if searching ``sys.meta_path`` and ``sys.path`` (or a parent
-package ``__path__``) fails to find a module being imported, the
-import process must attempt to create a ``__path__`` attribute for
-the missing module.  If the attempt succeeds, an empty module is
-created and its ``__path__`` is set.  Otherwise, importing fails.
+If the computed "virtual path" is empty, an ``ImportError`` results,
+just as it would today.  However, if a non-empty virtual path is
+obtained, the normal import of the submodule or subpackage proceeds,
+using that virtual path to find the submodule or subpackage.  (Just
+as it would have with the parent's ``__path__``, if the parent package
+had existed and had a ``__path__``.)
 
-In both of the above cases, if a non-empty ``__path__`` is created,
-the name of the module whose ``__path__`` was created is added to
-``sys.virtual_packages`` -- an initially-empty ``set()`` of package
-names.
+When a submodule or subpackage is found (but not yet loaded),
+the parent package is created and added to ``sys.modules`` (if it
+didn't exist before), and its ``__path__`` is set to the computed
+virtual path (if it wasn't already set).
 
-(This way, code that extends ``sys.path`` at runtime can find out
-what virtual packages are currently imported, and thereby add any
-new subdirectories to those packages' ``__path__`` attributes.  See
-`Standard Library Changes/Additions`_ below for more details.)
+In this way, when the actual loading of the submodule or subpackage
+occurs, it will see a parent package existing, and any relative
+imports will work correctly.  However, if no submodule or subpackage
+exists, then the parent package will *not* be created, nor will a
+standalone module be converted into a package (by the addition of a
+spurious ``__path__`` attribute).
 
-Conversely, if an empty ``__path__`` results, an ``ImportError``
-is immediately raised, and the module is not created or changed, nor
-is its name added to ``sys.virtual_packages``.
+Note, by the way, that this change must be applied *recursively*: that
+is, if ``foo`` and ``foo.bar`` are pure virtual packages, then
+``import foo.bar.baz`` must wait until ``foo.bar.baz`` is found before
+creating module objects for *both* ``foo`` and ``foo.bar``, and then
+create both of them together, properly setting the ``foo`` module's
+``.bar`` attribute to point to the ``foo.bar`` module.
 
+In this way, pure virtual packages are never directly importable:
+an ``import foo`` or ``import foo.bar`` by itself will fail, and the
+corresponding modules will not appear in ``sys.modules`` until they
+are needed to point to a *successfully* imported submodule or
+self-contained subpackage.
 
-``__path__`` Creation
----------------------
 
-A virtual ``__path__`` is created by obtaining a PEP 302 "importer"
-object for each of the path entries found in ``sys.path`` (for a
-top-level module) or the parent ``__path__`` (for a submodule).
+Virtual Paths
+-------------
+
+A virtual path is created by obtaining a PEP 302 "importer" object for
+each of the path entries found in ``sys.path`` (for a top-level
+module) or the parent ``__path__`` (for a submodule).
 
 (Note: because ``sys.meta_path`` importers are not associated with
 ``sys.path`` or ``__path__`` entry strings, such importers do *not*
@@ -403,18 +456,34 @@
 
 Each importer is checked for a ``get_subpath()`` method, and if
 present, the method is called with the full name of the module/package
-the ``__path__`` is being constructed for.  The return value is either
-a string representing a subdirectory for the requested package, or
+the path is being constructed for.  The return value is either a
+string representing a subdirectory for the requested package, or
 ``None`` if no such subdirectory exists.
 
-The strings returned by the importers are added to the ``__path__``
+The strings returned by the importers are added to the path list
 being built, in the same order as they are found.  (``None`` values
 and missing ``get_subpath()`` methods are simply skipped.)
 
-In Python code, the algorithm would look something like this::
+The resulting list (whether empty or not) is then stored in a
+``sys.virtual_package_paths`` dictionary, keyed by module name.
+
+This dictionary has two purposes.  First, it serves as a cache, in
+the event that more than one attempt is made to import a submodule
+of a virtual package.
+
+Second, and more importantly, the dictionary can be used by code that
+extends ``sys.path`` at runtime to *update* imported packages'
+``__path__`` attributes accordingly.  (See `Standard Library
+Changes/Additions`_ below for more details.)
+
+In Python code, the virtual path construction algorithm would look
+something like this::
 
     def get_virtual_path(modulename, parent_path=None):
 
+        if modulename in sys.virtual_package_paths:
+            return sys.virtual_package_paths[modulename]
+
         if parent_path is None:
             parent_path = sys.path
 
@@ -429,6 +498,7 @@
                 if subpath is not None:
                     path.append(subpath)
 
+        sys.virtual_package_paths[modulename] = path
         return path
 
 And a function like this one should be exposed in the standard
@@ -453,19 +523,25 @@
   path.
 
   The implementation of this function does a simple top-down traversal
-  of ``sys.virtual_packages``, and performs any necessary
-  ``get_subpath()`` calls to identify what path entries need to
-  be added to each package's ``__path__``, given that `path_entry`
+  of ``sys.virtual_package_paths``, and performs any necessary
+  ``get_subpath()`` calls to identify what path entries need to be
+  added to the virtual path for that package, given that `path_entry`
   has been added to ``sys.path``.  (Or, in the case of sub-packages,
-  adding a derived subpath entry, based on their parent namespace's
-  ``__path__``.)
+  adding a derived subpath entry, based on their parent package's
+  virtual path.)
+
+  (Note: this function must update both the path values in
+  ``sys.virtual_package_paths`` and the ``__path__`` attributes
+  of any corresponding modules in ``sys.modules``, even though in the
+  common case they will both be the same ``list`` object.)
 
 * A new ``iter_virtual_packages(parent='')`` function to allow
-  top-down traversal of virtual packages in ``sys.virtual_packages``,
-  by yielding the child virtual packages of `parent`.  For example,
-  calling ``iter_virtual_packages("zope")`` might yield ``zope.app``
-  and ``zope.products`` (if they are imported virtual packages listed
-  in ``sys.virtual_packages``), but **not** ``zope.foo.bar``.
+  top-down traversal of virtual packages from
+  ``sys.virtual_package_paths``, by yielding the child virtual
+  packages of `parent`.  For example, calling
+  ``iter_virtual_packages("zope")`` might yield ``zope.app``
+  and ``zope.products`` (if they are virtual packages listed in
+  ``sys.virtual_package_paths``), but **not** ``zope.foo.bar``.
   (This function is needed to implement ``extend_virtual_paths()``,
   but is also potentially useful for other code that needs to inspect
   imported virtual packages.)
@@ -500,10 +576,11 @@
   and do other things that make more sense for a self-contained
   project than for a mere "namespace" package.
 
-* ``sys.virtual_packages`` is allowed to contain non-existent or
-  not-yet-imported package names; code that uses its contents should
-  not assume that every name in this set is also present in
-  ``sys.modules`` or that importing the name will necessarily succeed.
+* ``sys.virtual_package_paths`` is allowed to contain entries for
+  non-existent or not-yet-imported package names; code that uses its
+  contents should not assume that every key in this dictionary is also
+  present in ``sys.modules`` or that importing the name will
+  necessarily succeed.
 
 * If you are changing a currently self-contained package into a
   virtual one, it's important to note that you can no longer use its
@@ -539,7 +616,9 @@
   XXX This might list a lot of not-really-packages.  Should we
   require importable contents to exist?  If so, how deep do we
   search, and how do we prevent e.g. link loops, or traversing onto
-  different filesystems, etc.?  Ick.
+  different filesystems, etc.?  Ick.  Also, if virtual packages are
+  listed, they still can't be *imported*, which is a problem for the
+  way that ``pkgutil.walk_packages()`` is currently implemented.
 
 * "Meta" importers (i.e., importers placed on ``sys.meta_path``) do
   not need to implement ``get_subpath()``, because the method
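
The revised rule in this changeset (a pure virtual package cannot be
imported by itself, but its contents can) can be illustrated with a small
simulation.  This is only a sketch under stated assumptions: it does not
touch the real import machinery, ``get_subpath()`` is the method proposed
by the PEP rather than an existing importer API, and ``DirImporter``,
``is_real_module``, ``would_import``, and the module-level
``virtual_package_paths`` dict (standing in for the proposed
``sys.virtual_package_paths``) are hypothetical names chosen for the
example.  For simplicity it also treats every parent as virtual when
walking down a dotted name.

```python
import os
import tempfile

# Stand-in for the proposed sys.virtual_package_paths cache (hypothetical).
virtual_package_paths = {}


class DirImporter:
    """Toy stand-in for a PEP 302 importer bound to one path entry."""

    def __init__(self, path_entry):
        self.path_entry = path_entry

    def get_subpath(self, fullname):
        # Return the subdirectory for `fullname`, or None if it is absent.
        candidate = os.path.join(self.path_entry, fullname.rpartition(".")[2])
        return candidate if os.path.isdir(candidate) else None


def get_virtual_path(modulename, parent_path):
    """Collect every subdirectory that could hold part of `modulename`."""
    if modulename in virtual_package_paths:
        return virtual_package_paths[modulename]
    path = []
    for entry in parent_path:
        subpath = DirImporter(entry).get_subpath(modulename)
        if subpath is not None:
            path.append(subpath)
    virtual_package_paths[modulename] = path
    return path


def is_real_module(name, search_path):
    """True if `name` exists as name.py or name/__init__.py on search_path."""
    for entry in search_path:
        if os.path.isfile(os.path.join(entry, name + ".py")):
            return True
        if os.path.isfile(os.path.join(entry, name, "__init__.py")):
            return True
    return False


def would_import(fullname, search_path):
    """Apply the revised rule: a bare directory never satisfies an import."""
    parts = fullname.split(".")
    path = search_path
    # Walk down through the parents, computing a virtual path for each one.
    for i in range(len(parts) - 1):
        path = get_virtual_path(".".join(parts[: i + 1]), path)
        if not path:
            return False  # empty virtual path: ImportError, as today
    # Only the final name must be a real module or self-contained package.
    return is_real_module(parts[-1], path)


with tempfile.TemporaryDirectory() as root:
    # Recreate the layouts discussed in the PEP text.
    os.makedirs(os.path.join(root, "json"))
    open(os.path.join(root, "json", "foo.js"), "w").close()
    os.makedirs(os.path.join(root, "zc"))
    open(os.path.join(root, "zc", "buildout.py"), "w").close()

    names = ("json", "zc", "zc.buildout", "json.converter")
    results = {name: would_import(name, [root]) for name in names}
```

In this simulation only ``zc.buildout`` is reported as importable: the
bare ``json/`` and ``zc/`` directories do not satisfy ``import json`` or
``import zc``, which is the false-positive case the changeset closes.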

-- 
Repository URL: http://hg.python.org/peps

