[Python-Dev] Draft PEP: "Simplified Package Layout and Partitioning"

P.J. Eby pje at telecommunity.com
Wed Jul 20 22:44:15 CEST 2011

At 01:35 PM 7/20/2011 -0600, Eric Snow wrote:
>This is a really nice solution.  So a virtual package is not imported
>until a submodule of the virtual package is successfully imported
>(except for direct import of pure virtual packages).

Not correct.  ;-)  What we do is avoid creating a parent module or 
altering its __path__ until a submodule/subpackage import is just 
about to be successfully completed.

See the change I just pushed to the PEP:


Or read the revised Specification section here (which is a bit easier 
to read than the diff):


The change is basically that we wait until a successful find_module() 
happens before creating or tweaking any parent modules.  This way, 
the load_module() part will still see an initialized parent package 
in sys.modules, and if it does any relative imports, they'll still work.

(It *does* mean that if an error happens during load_module(), then 
future imports of the virtual package will succeed, but I'm okay with 
that corner case.)
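To make that ordering and its corner case concrete, here is a toy model. It is not importlib code: ``modules`` is a plain dict standing in for sys.modules, and ``toy_import``, ``find`` and ``load`` are hypothetical names made up for the illustration.

```python
import types

modules = {}  # stand-in for sys.modules, so real import state is untouched


def toy_import(name, find, load):
    """Import 'pkg.sub', creating the parent only after find() succeeds."""
    parent, _, _ = name.rpartition('.')
    if not find(name):
        # Nothing found: the parent module is never created.
        raise ImportError(name)
    if parent and parent not in modules:
        # Created just before loading, so the loader sees an initialized parent.
        modules[parent] = types.ModuleType(parent)
    modules[name] = load(name)  # may raise; the parent is left behind
    return modules[name]


def failing_load(name):
    raise ImportError("error while executing " + name)


# find succeeds, load fails: the parent "virt" stays in `modules`
try:
    toy_import("virt.sub", find=lambda n: True, load=failing_load)
except ImportError:
    pass

# find fails: no parent "other" is ever created
try:
    toy_import("other.sub", find=lambda n: False, load=failing_load)
except ImportError:
    pass
```

After both attempts, ``virt`` is present in ``modules`` and ``other`` is not, which is the corner case described above.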

>It seems like
>sys.virtual_packages should be populated even during a failed
>submodule import.  Is that right?

Yes.  In the actual draft, btw, I dubbed it 
``sys.virtual_package_paths`` and made it a dictionary.  This 
actually makes the pkgutil.extend_path() code more general: it'll be 
able to fix the paths of things you haven't actually imported yet.  ;-)

>Also, it makes sense that the above applies to all virtual packages,
>not just pure ones.

Well, if the package isn't "pure" then what you've imported is really 
just an ordinary module, not a package at all.  ;-)

>When a pure virtual package is directly imported, a new [empty] module
>is created and its __path__ is set to the matching value in
>sys.virtual_packages.  However, an "impure" virtual package is not
>created upon direct import, and its __path__ is not updated until a
>submodule import is attempted.  Even the sys.virtual_packages entry is
>not generated until the submodule attempt, since the virtual package
>mechanism doesn't kick in until the point that an ImportError is
>currently raised.
>This isn't that big a deal, but it would be the one behavioral
>difference between the two kinds of virtual packages.  So either leave
>that one difference, disallow direct import of pure virtual packages,
>or attempt to make virtual packages for all non-package imports.  That
>last one would impose the virtual package overhead on many more
>imports so it is probably too impractical.  I'm fine with leaving the
>one difference.

At this point, I've updated the PEP to disallow direct imports of 
pure virtual packages.  AFAICT it's the only approach that ensures 
you can't get false positive imports by having 
unrelated-but-similarly-named directories floating around.

So, really, there's not a difference, except that you can't import a 
useless empty module that you have no real business importing in the 
first place...  and I'm fine with that.  ;-)

>FYI, last night I started on an importlib-based implementation for the
>PEP and the above solution would be really easy to incorporate.

Well, you might want to double-check that now that I've updated the 
spec.  ;-)  In the new approach, you cannot rely on parent modules 
existing before proceeding to the submodule import.

However, I've just glanced at the importlib trunk, and I think I see 
what you mean.  It's already using a recursive approach, rather than 
an iterative one, so the change should be a lot simpler there than in import.c.

There probably just needs to be a pair of functions like:

     def _get_parent_path(parent):
         pmod = sys.modules.get(parent)
         if pmod is None:
             try:
                 pmod = _gcd_import(parent)
             except ImportError:
                 # Can't import parent, is it a virtual package?
                 path = imp.get_virtual_path(parent)
                 if not path:
                     # no, allow the parent's import error to propagate
                     raise
                 return path
         if hasattr(pmod, '__path__'):
             return pmod.__path__
         else:
             return imp.get_virtual_path(parent)

     def _get_parent_module(parent):
         pmod = sys.modules.get(parent)
         if pmod is None:
             pmod = sys.modules[parent] = imp.new_module(parent)
             if '.' in parent:
                 head, _, tail = parent.rpartition('.')
                 setattr(_get_parent_module(head), tail, pmod)
         if not hasattr(pmod, '__path__'):
             pmod.__path__ = imp.get_virtual_path(parent)
         return pmod

And then instead of hanging on to parent_module during the import 
process, you'd just grab a path from _get_parent_path(), and 
initialize parent_module a little later, i.e.:

         if parent:
             path = _get_parent_path(parent)
             if not path:
                 msg = (_ERR_MSG + '; {} is not a package').format(name, parent)
                 raise ImportError(msg)

         meta_path = sys.meta_path + _IMPLICIT_META_PATH
         for finder in meta_path:
             loader = finder.find_module(name, path)
             if loader is not None:
                 # ensure parent module exists and is a package before loading
                 parent_module = _get_parent_module(parent)
                 loader.load_module(name)
                 break
         else:
             raise ImportError(_ERR_MSG.format(name))

So, yeah, actually, that's looking pretty sweet.  Basically, we just 
have to throw a virtual_package_paths dict into the sys module, and 
do the above along with the get_virtual_path() function and add 
get_subpath() to the importer objects, in order to get the PEP's core 
functionality working.
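
For a feel of what the path lookup might do, here is a rough, simplified sketch. It is an assumption-laden stand-in, not the PEP's implementation: it checks directories with os.path.isdir() instead of asking importer objects via get_subpath(), the module-level ``virtual_package_paths`` dict stands in for the proposed sys attribute, and the two-argument signature is chosen only for the illustration.

```python
import os
import tempfile

virtual_package_paths = {}  # hypothetical stand-in for sys.virtual_package_paths


def get_virtual_path(name, parent_path):
    """Collect directories under parent_path named after the last part of `name`."""
    tail = name.rpartition('.')[2]
    path = [os.path.join(entry, tail) for entry in parent_path
            if os.path.isdir(os.path.join(entry, tail))]
    # Assumption: the result is cached by name, even when it is empty.
    virtual_package_paths[name] = path
    return path


with tempfile.TemporaryDirectory() as root:
    a = os.path.join(root, "site_a")
    b = os.path.join(root, "site_b")
    os.makedirs(os.path.join(a, "zope"))
    os.makedirs(os.path.join(b, "zope"))
    os.makedirs(os.path.join(b, "unrelated"))

    found = get_virtual_path("zope", [a, b])       # both zope/ directories
    missing = get_virtual_path("nothere", [a, b])  # no matching directory
    expected = [os.path.join(a, "zope"), os.path.join(b, "zope")]
```

An empty result is what the ``if not path:`` checks in the functions above would treat as "not a virtual package".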
