[Import-SIG] One last try: "virtual packages"
pje at telecommunity.com
Tue Jul 12 03:01:52 CEST 2011
Ok, so based on the last round of discussions about terminology, and
how other languages process their path, I got to doing some thinking,
and here is one last try at a high-performing, markerless,
ultra-backwards-compatible approach to this thing. I call it
"virtual packages".
The implementation consists of two small *additions* to today's
import semantics. These additions don't affect the performance or
behavior of "standard" imports (i.e., the ones we have today), but
enable certain imports that would currently fail, to succeed instead.
Overall, the goal is to make package imports work more like a user
coming over from languages like Perl, Java, and PHP would expect with
respect to subpath searching, and creation/expansion of
packages. (For instance, this proposal does away with the need to
move 'foo.py' to 'foo/__init__.py' when turning a module into a package.)
Anyway, this'll be my last attempt at coming up with a markerless
approach, but I do hope you'll take the time to read it carefully, as
it has very different semantics and performance impacts from my
previous proposals, even though it may sound quite similar on the surface.
In particular, this proposal is the ONLY implementation ever proposed
for this PEP that has *zero* filesystem-level performance overhead
for normal imports.
That's right. Zip. Zero. Nada. None. Nil.
The *only* cases where this proposal adds filesystem access
overhead are those where, without this proposal, an
ImportError would've happened under present-day import semantics.
So, read it and weep... or smile, or whatever. ;-)
The First Addition - "Virtual" Packages
The first addition to existing import semantics is that if you try to
import a submodule of a module with no __path__, then instead of
treating it as a missing module, a __path__ is dynamically
constructed, using namespace_subpath() calls on the parent path or sys.path.
If the resulting __path__ is empty, it's an import error. Otherwise,
the module's __path__ attribute is set, and the import goes ahead as
if the module had been a package all along.
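As a rough illustration of the rule just described, here is a minimal Python sketch. It is not a reference implementation: the signature given to namespace_subpath() here (one path entry plus a module name, returning a directory or None) is an assumption, and virtual_path() and ensure_virtual_package() are hypothetical helper names made up for this example.

```python
import os
import sys
import types


def namespace_subpath(path_entry, fullname):
    """Assumed signature: return the directory under `path_entry` that could
    hold submodules of `fullname`, or None if no such directory exists."""
    candidate = os.path.join(path_entry, fullname.rpartition(".")[2])
    return candidate if os.path.isdir(candidate) else None


def virtual_path(fullname, parent_path=None):
    """Collect subpaths from the parent's __path__, or from sys.path
    when the module is top-level (parent_path is None)."""
    search = sys.path if parent_path is None else parent_path
    result = []
    for entry in search:
        subpath = namespace_subpath(entry, fullname)
        if subpath is not None:
            result.append(subpath)
    return result


def ensure_virtual_package(module, parent_path=None):
    """Called only when a submodule import is attempted on `module`."""
    if hasattr(module, "__path__"):
        return module  # already a package: nothing to do
    path = virtual_path(module.__name__, parent_path)
    if not path:
        # Empty virtual __path__: the submodule import is an error.
        raise ImportError("%r has no submodule directories" % module.__name__)
    module.__path__ = path  # from here on the module acts like a package
    return module
```

Note that nothing in this sketch runs during a plain 'import foo'; the directory scan only happens once something asks for 'foo.bar'.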
In other words, every module is a "virtual package". If you treat it
as a package, it'll become/act like one. Otherwise, it's still a module.
This means that if, say, you have a bunch of directories named
'random' on sys.path (without any __init__ modules in them),
importing 'random' still imports the stdlib random.py.
However, if you try to import 'random.myspecialrandom', a __path__
will be constructed and used -- and if the submodule exists, it'll be
imported. (And if you later add a random/myspecialrandom/ directory
somewhere on sys.path, you'll be able to import
random.myspecialrandom.whatever out of it, by recursive application
of this "virtual package" rule.)
Notice that this is very different from my previous attempt at a
similar scheme. First, it doesn't introduce any performance overhead
on 'import random', as the extra lookups aren't done until and unless
you try to 'import random.foo'... which existing code of course will
not be doing.
(Second, but also important, it doesn't distort the __path__ of
packages with an __init__ module, because such packages are *not*
virtual packages; they retain their present day semantics.)
Anyway, with this one addition, imports will now behave in a way
that's friendly to users of e.g. Perl and Java, who expect the code
for a module 'foo' to lie *outside* the foo/ directory, and for
lookups of foo.bar or foo::bar to be searched for in foo/
subdirectories all along the respective equivalents of sys.path.
You now can simply ship a single zope.py to create a virtual "zope"
package -- a namespace shared by code from multiple distributions.
But wait... how does that fix the filename collision
problem? Aren't we still going to collide on the zope.py
file? Well, that's where the second addition comes in.
The Second Addition - "Pure Virtual" Packages
The second addition is that, if an import fails to find a module
entirely, then once again, a virtual __path__ is assembled using
namespace_subpath() calls. If the path is empty, the import
fails. But if it's non-empty, an empty module is created and its
__path__ is set.
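A minimal Python sketch of this fallback, under stated assumptions: find_subdirs() is a hypothetical stand-in for the namespace_subpath() calls, make_pure_virtual_package() is a made-up name, and registering the new module in sys.modules is an assumption about how the empty module would be made visible to later imports.

```python
import os
import sys
import types


def find_subdirs(name, search_path):
    """Hypothetical stand-in for namespace_subpath(): every directory
    named after `name` found along `search_path`."""
    leaf = name.rpartition(".")[2]
    found = []
    for entry in search_path:
        candidate = os.path.join(entry, leaf)
        if os.path.isdir(candidate):
            found.append(candidate)
    return found


def make_pure_virtual_package(name, search_path=None):
    """Fallback for the case where no module called `name` was found."""
    path = find_subdirs(name, sys.path if search_path is None else search_path)
    if not path:
        # Empty virtual __path__: the import fails, same as today.
        raise ImportError("No module named %s" % name)
    module = types.ModuleType(name)  # empty module, no defining code
    module.__path__ = path
    sys.modules[name] = module  # assumption: register it like any import
    return module
```

Once such a module is registered, submodule lookups should go through the regular import machinery, searching the assembled __path__ like that of any other package.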
Voila... we now have a "pure" virtual package. (i.e. a package with
no corresponding "defining" module). So, if you have a bunch of
__init__-free zope/ directories on sys.path, you can freely import from them.
But what happens if you DO have an __init__ module somewhere? Well,
because we haven't changed the normal import semantics, the first
__init__ module ends up being a defining module, and by default, its
__path__ is set in the normal way, just like today. So, it's not a
virtual package, it's a standard package. If you must have a
defining module, you'll have to move it from zope/__init__.py to zope.py.
(Either that, or use some sort of API call to explicitly request a
virtual package __path__ to be set up. But the recommended way to do
it would be just to move the file up a level.)
This proposal doesn't affect performance of imports that don't ever
*use* a virtual package __path__, because the setup is delayed until then.
It doesn't break installation tools: distutils and setuptools both
handle this without blinking. You just list your defining module (if
you have one) in 'py_modules', along with any individual submodules,
and you list the subpackages in 'packages'.
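For instance, a setup script following that recipe might look roughly like the sketch below. The project name, version, and module names are all hypothetical, and this has not been checked against any particular distutils or setuptools release; the setup() call is left commented out so the fragment is safe to run as-is.

```python
# setup.py sketch; every name below is hypothetical.
SETUP_KWARGS = dict(
    name="zope.example",
    version="0.1",
    # The defining module (zope.py) plus one individual submodule
    # (zope/helpers.py) that lives in an __init__-free zope/ directory.
    py_modules=["zope", "zope.helpers"],
    # An ordinary subpackage (zope/example/) with its own __init__.py.
    packages=["zope.example"],
)

# In a real setup.py the keywords would then be handed to setup():
# from setuptools import setup
# setup(**SETUP_KWARGS)
```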
It doesn't break code-finding tools in any way that other
implementation proposals don't. (That is, ALL our proposals allow
__init__ to go away, so tools are definitely going to require
updating; all that differs between proposals is precisely what sort
of updating is required.)
Really, about the only downside I can see is that this proposal can't
be implemented purely via a PEP 302 meta-importer in Python 2.x. The
builtin __import__ function bails out when __path__ is missing on a
parent, so it would actually require replacing __import__ in order to
implement true virtual package support.
(For my own personal use case for this PEP in 2.x (i.e., replacing
setuptools' current mechanism with a PEP-compliant one), it's not too
big a deal, though, because I was still going to need explicit
registration in .pth files: no matter the mechanism used, it isn't
built into the interpreter, so it still has to be bootstrapped somehow!)