[Python-Dev] Draft PEP: "Simplified Package Layout and Partitioning"
P.J. Eby
pje at telecommunity.com
Wed Jul 20 22:44:15 CEST 2011
At 01:35 PM 7/20/2011 -0600, Eric Snow wrote:
>This is a really nice solution. So a virtual package is not imported
>until a submodule of the virtual package is successfully imported
Correct...
>(except for direct import of pure virtual packages).
Not correct. ;-) What we do is avoid creating a parent module or
altering its __path__ until a submodule/subpackage import is just
about to be successfully completed.
See the change I just pushed to the PEP:
http://hg.python.org/peps/rev/a6f02035c66c
Or read the revised Specification section here (which is a bit easier
to read than the diff):
http://www.python.org/dev/peps/pep-0402/#specification
The change is basically that we wait until a successful find_module()
happens before creating or tweaking any parent modules. This way,
the load_module() part will still see an initialized parent package
in sys.modules, and if it does any relative imports, they'll still work.
(It *does* mean that if an error happens during load_module(), then
future imports of the virtual package will succeed, but I'm okay with
that corner case.)
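The ordering can be illustrated with a toy model. This is only a sketch, not code from the PEP or from importlib: `modules` stands in for sys.modules, and `ToyLoader`, `toy_import`, and the `found`/`fail` flags are hypothetical names invented for the illustration.

```python
import types

modules = {}  # stand-in for sys.modules


class ToyLoader:
    """Hypothetical loader; fail=True simulates an error inside load_module()."""

    def __init__(self, fail=False):
        self.fail = fail

    def load_module(self, name):
        parent = name.rpartition('.')[0]
        # By the time loading starts, the parent is already initialized,
        # so relative imports inside the submodule could resolve.
        assert parent in modules
        if self.fail:
            raise ImportError('error while loading ' + name)
        modules[name] = types.ModuleType(name)
        return modules[name]


def toy_import(name, found, virtual_path, fail=False):
    """Import 'parent.child', creating the parent only after the find step succeeds."""
    parent = name.rpartition('.')[0]
    loader = ToyLoader(fail) if found else None  # stand-in for find_module()
    if loader is None:
        # Nothing was found, so no parent module is created or altered.
        raise ImportError('No module named ' + name)
    if parent not in modules:
        pmod = modules[parent] = types.ModuleType(parent)
        pmod.__path__ = list(virtual_path)
    return loader.load_module(name)
```

In this model a failed find leaves `modules` untouched, while a failure inside load_module() leaves the parent behind, which is the corner case mentioned above.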
>It seems like
>sys.virtual_packages should be populated even during a failed
>submodule import. Is that right?
Yes. In the actual draft, btw, I dubbed it
``sys.virtual_package_paths`` and made it a dictionary. This
actually makes the pkgutil.extend_path() code more general: it'll be
able to fix the paths of things you haven't actually imported yet. ;-)
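A minimal sketch of why a dictionary helps here, assuming a module-level dict as a stand-in for the proposed ``sys.virtual_package_paths`` (no such attribute exists in released Python). The helper `add_path_entry` is hypothetical and is not the real pkgutil.extend_path(); it only shows that a recorded path can be extended without the package ever having been imported.

```python
import os
import tempfile

# Stand-in for the proposed sys.virtual_package_paths (not a real attribute).
virtual_package_paths = {}


def add_path_entry(entry):
    """Extend every recorded virtual path with entry/<name>, if that directory exists."""
    for name, path in virtual_package_paths.items():
        subdir = os.path.join(entry, *name.split('.'))
        if os.path.isdir(subdir) and subdir not in path:
            path.append(subdir)


# 'plugins' was looked up (say, by a failed submodule import) but never imported.
virtual_package_paths['plugins'] = []

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'plugins'))
    add_path_entry(root)
    fixed = virtual_package_paths['plugins'] == [os.path.join(root, 'plugins')]
```

Because the path lives in the dictionary rather than only on a module object, there is something to update even when no module for 'plugins' exists yet.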
>Also, it makes sense that the above applies to all virtual packages,
>not just pure ones.
Well, if the package isn't "pure" then what you've imported is really
just an ordinary module, not a package at all. ;-)
>When a pure virtual package is directly imported, a new [empty] module
>is created and its __path__ is set to the matching value in
>sys.virtual_packages. However, an "impure" virtual package is not
>created upon direct import, and its __path__ is not updated until a
>submodule import is attempted. Even the sys.virtual_packages entry is
>not generated until the submodule attempt, since the virtual package
>mechanism doesn't kick in until the point that an ImportError is
>currently raised.
>
>This isn't that big a deal, but it would be the one behavioral
>difference between the two kinds of virtual packages. So either leave
>that one difference, disallow direct import of pure virtual packages,
>or attempt to make virtual packages for all non-package imports. That
>last one would impose the virtual package overhead on many more
>imports so it is probably too impractical. I'm fine with leaving the
>one difference.
At this point, I've updated the PEP to disallow direct imports of
pure virtual packages. AFAICT it's the only approach that ensures
you can't get false positive imports by having
unrelated-but-similarly-named directories floating around.
So, really, there's not a difference, except that you can't import a
useless empty module that you have no real business importing in the
first place... and I'm fine with that. ;-)
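The false-positive problem can be shown with a small filesystem sketch. It assumes, purely for illustration, that a virtual path is just the list of same-named directories under each search entry; `toy_virtual_path` and `submodule_found` are hypothetical helpers, not functions from the PEP.

```python
import os
import tempfile


def toy_virtual_path(name, search_path):
    """Directories called `name` directly under any search_path entry."""
    candidates = (os.path.join(entry, name) for entry in search_path)
    return [c for c in candidates if os.path.isdir(c)]


def submodule_found(name, child, search_path):
    """True if child.py exists in some directory of name's toy virtual path."""
    return any(os.path.isfile(os.path.join(d, child + '.py'))
               for d in toy_virtual_path(name, search_path))


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, 'docs'))      # unrelated directory, no code in it
    os.mkdir(os.path.join(root, 'plugins'))
    open(os.path.join(root, 'plugins', 'spam.py'), 'w').close()

    docs_has_path = bool(toy_virtual_path('docs', [root]))
    docs_child = submodule_found('docs', 'spam', [root])
    plugins_child = submodule_found('plugins', 'spam', [root])
```

Here 'docs' has a non-empty path although it contains nothing importable, so allowing a direct import on the strength of the directory alone would succeed spuriously. Requiring a submodule to be found first rules that out.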
>FYI, last night I started on an importlib-based implementation for the
>PEP and the above solution would be really easy to incorporate.
Well, you might want to double-check that now that I've updated the
spec. ;-) In the new approach, you cannot rely on parent modules
existing before proceeding to the submodule import.
However, I've just glanced at the importlib trunk, and I think I see
what you mean. It's already using a recursive approach, rather than
an iterative one, so the change should be a lot simpler there than in import.c.
There probably just needs to be a pair of functions like:
    def _get_parent_path(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            try:
                pmod = _gcd_import(parent)
            except ImportError:
                # Can't import parent, is it a virtual package?
                path = imp.get_virtual_path(parent)
                if not path:
                    # no, allow the parent's import error to propagate
                    raise
                return path
        if hasattr(pmod, '__path__'):
            return pmod.__path__
        else:
            return imp.get_virtual_path(parent)

    def _get_parent_module(parent):
        pmod = sys.modules.get(parent)
        if pmod is None:
            pmod = sys.modules[parent] = imp.new_module(parent)
            if '.' in parent:
                head, _, tail = parent.rpartition('.')
                setattr(_get_parent_module(head), tail, pmod)
        if not hasattr(pmod, '__path__'):
            pmod.__path__ = imp.get_virtual_path(parent)
        return pmod
And then instead of hanging on to parent_module during the import
process, you'd just grab a path from _get_parent_path(), and
initialize parent_module a little later, i.e.:
    if parent:
        path = _get_parent_path(parent)
        if not path:
            msg = (_ERR_MSG + '; {} is not a package').format(name, parent)
            raise ImportError(msg)

    meta_path = sys.meta_path + _IMPLICIT_META_PATH
    for finder in meta_path:
        loader = finder.find_module(name, path)
        if loader is not None:
            if parent:
                # ensure parent module exists and is a package before loading
                parent_module = _get_parent_module(parent)
            loader.load_module(name)
            break
    else:
        raise ImportError(_ERR_MSG.format(name))
So, yeah, actually, that's looking pretty sweet. Basically, we just
have to throw a virtual_package_paths dict into the sys module, and
do the above along with the get_virtual_path() function and add
get_subpath() to the importer objects, in order to get the PEP's core
functionality working.
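For illustration only, here is a rough sketch of how those two pieces might fit together. The names mirror the ones above, but the bodies are assumptions: `ToyDirImporter` is a hypothetical directory importer, the module-level dict stands in for the proposed sys.virtual_package_paths, and the required `parent_path` argument is a simplification of whatever signature the PEP finally specifies.

```python
import os
import tempfile

virtual_package_paths = {}  # stand-in for the proposed sys.virtual_package_paths


class ToyDirImporter:
    """Hypothetical importer for a single directory entry."""

    def __init__(self, entry):
        self.entry = entry

    def get_subpath(self, fullname):
        # Only the last name component matters: the entry already
        # belongs to the parent package (or to the top-level search path).
        candidate = os.path.join(self.entry, fullname.rpartition('.')[2])
        return candidate if os.path.isdir(candidate) else None


def get_virtual_path(fullname, parent_path):
    """Collect subpaths for fullname along parent_path and cache the result."""
    if fullname not in virtual_package_paths:
        subpaths = (ToyDirImporter(e).get_subpath(fullname) for e in parent_path)
        virtual_package_paths[fullname] = [p for p in subpaths if p is not None]
    return virtual_package_paths[fullname]


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    os.mkdir(os.path.join(a, 'zope'))
    zope_ok = get_virtual_path('zope', [a, b]) == [os.path.join(a, 'zope')]
    missing_path = get_virtual_path('nothere', [a, b])
```

An empty list plays the role of "not a virtual package", which is what lets `_get_parent_path()` above re-raise the parent's original ImportError.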