[Import-SIG] One last try: "virtual packages"

P.J. Eby pje at telecommunity.com
Tue Jul 12 03:01:52 CEST 2011


Ok, so based on the last round of discussions about terminology and 
about how other languages process their path, I got to doing some 
thinking, and here is one last try at a high-performing, markerless, 
ultra-backwards-compatible approach to this thing.  I call it 
"virtual packages".

The implementation consists of two small *additions* to today's 
import semantics.  These additions don't affect the performance or 
behavior of "standard" imports (i.e., the ones we have today), but 
enable certain imports that would currently fail, to succeed instead.

Overall, the goal is to make package imports work the way a user 
coming over from languages like Perl, Java, and PHP would expect with 
respect to subpath searching and the creation/expansion of 
packages.  (For instance, this proposal does away with the need to 
move 'foo.py' to 'foo/__init__.py' when turning a module into a package.)

Anyway, this'll be my last attempt at coming up with a markerless 
approach, but I do hope you'll take the time to read it carefully, as 
it has very different semantics and performance impacts from my 
previous proposals, even though it may sound quite similar on the surface.

In particular, this proposal is the ONLY implementation ever proposed 
for this PEP that has *zero* filesystem-level performance overhead 
for normal imports.

That's right.  Zip.  Zero.  Nada.  None.  Nil.

The *only* cases where this proposal adds filesystem access overhead 
are those where, without this proposal, an ImportError would've 
happened under present-day import semantics.

So, read it and weep... or smile, or whatever.  ;-)


The First Addition - "Virtual" Packages
---------------------------------------

The first addition to existing import semantics is that if you try to 
import a submodule of a module with no __path__, then instead of the 
submodule being treated as missing, a __path__ is dynamically 
constructed for the parent module, using namespace_subpath() calls on 
the parent path or sys.path.

If the resulting __path__ is empty, it's an import error.  Otherwise, 
the module's __path__ attribute is set, and the import goes ahead as 
if the module had been a package all along.

In other words, every module is a "virtual package".  If you treat it 
as a package, it'll become/act like one.  Otherwise, it's still a module.
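
To make the rule concrete, here is a rough sketch of the lazy __path__ 
construction.  It is only an illustration under stated assumptions: 
namespace_subpath() is modelled as a plain function that handles 
filesystem directories only, and build_virtual_path() and 
ensure_virtual_package() are hypothetical names invented for this 
example, not part of any proposal text.

```python
import os
import sys
import types


def namespace_subpath(path_entry, fullname):
    """Hypothetical stand-in for the namespace_subpath() calls named above.

    Assumption: for a plain directory entry, return the subdirectory named
    after the last component of `fullname` if it exists, else None.
    """
    candidate = os.path.join(path_entry, fullname.rpartition(".")[2])
    return candidate if os.path.isdir(candidate) else None


def build_virtual_path(fullname, parent_path=None):
    """Collect matching subdirectories along the parent path or sys.path."""
    search_path = sys.path if parent_path is None else parent_path
    result = []
    for entry in search_path:
        subpath = namespace_subpath(entry, fullname)
        if subpath is not None:
            result.append(subpath)
    return result


def ensure_virtual_package(module, parent_path=None):
    """Give `module` a __path__ the first time a submodule is requested."""
    if hasattr(module, "__path__"):
        return module  # already a package; leave its __path__ alone
    path = build_virtual_path(module.__name__, parent_path)
    if not path:
        # An empty virtual __path__ means the submodule import fails.
        raise ImportError("%s is not a package" % module.__name__)
    module.__path__ = path
    return module
```

In this sketch, ensure_virtual_package() would only be called at the 
point where 'import foo.bar' finds that 'foo' has no __path__; a plain 
'import foo' never reaches it, so it does no extra filesystem work.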

This means that if, say, you have a bunch of directories named 
'random' on sys.path (without any __init__ modules in them), 
importing 'random' still imports the stdlib random.py.

However, if you try to import 'random.myspecialrandom', a __path__ 
will be constructed and used -- and if the submodule exists, it'll be 
imported.  (And if you later add a random/myspecialrandom/ directory 
somewhere on sys.path, you'll be able to import 
random.myspecialrandom.whatever out of it, by recursive application 
of this "virtual package" rule.)

Notice that this is very different from my previous attempt at a 
similar scheme.  First, it doesn't introduce any performance overhead 
on 'import random', as the extra lookups aren't done until and unless 
you try to 'import random.foo'...  which existing code of course will 
not be doing.

(Second, but also important, it doesn't distort the __path__ of 
packages with an __init__ module, because such packages are *not* 
virtual packages; they retain their present day semantics.)

Anyway, with this one addition, imports will now behave in a way 
that's friendly to users of e.g. Perl and Java, who expect the code 
for a module 'foo' to lie *outside* the foo/ directory, and who expect 
foo.bar or foo::bar to be searched for in foo/ subdirectories all 
along the respective equivalents of sys.path.

You can now simply ship a single zope.py to create a virtual "zope" 
package -- a namespace shared by code from multiple distributions.

But wait...  how does that fix the filename collision 
problem?  Aren't we still going to collide on the zope.py 
file?  Well, that's where the second addition comes in.


The Second Addition - "Pure Virtual" Packages
---------------------------------------------

The second addition is that, if an import fails to find a module 
entirely, then once again, a virtual __path__ is assembled using 
namespace_subpath() calls.  If the path is empty, the import 
fails.  But if it's non-empty, an empty module is created and its 
__path__ is set.

Voila...  we now have a "pure" virtual package.  (i.e. a package with 
no corresponding "defining" module).  So, if you have a bunch of 
__init__-free zope/ directories on sys.path, you can freely import from them.
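
As a rough sketch of this fallback (again, not a real implementation): 
the code below assumes top-level names and plain filesystem 
directories only, and find_namespace_dirs() and 
make_pure_virtual_package() are hypothetical names standing in for 
the namespace_subpath() calls and the import machinery's "module not 
found" branch.

```python
import os
import sys
import types


def find_namespace_dirs(name, search_path):
    """Hypothetical helper: subdirectories called `name` on `search_path`.

    Stands in for the namespace_subpath() calls; only plain directories
    are handled in this sketch.
    """
    dirs = []
    for entry in search_path:
        candidate = os.path.join(entry, name)
        if os.path.isdir(candidate):
            dirs.append(candidate)
    return dirs


def make_pure_virtual_package(name, search_path=None):
    """Fallback for when the normal import found no module called `name`."""
    if search_path is None:
        search_path = sys.path
    path = find_namespace_dirs(name, search_path)
    if not path:
        # Empty virtual __path__: the import fails, just as it does today.
        raise ImportError("No module named %s" % name)
    module = types.ModuleType(name)  # empty module; no defining code runs
    module.__path__ = path
    sys.modules[name] = module
    return module
```

Note that this branch is only reached after the ordinary lookup has 
already failed, which is why it costs nothing for imports that 
succeed today.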

But what happens if you DO have an __init__ module somewhere?  Well, 
because we haven't changed the normal import semantics, the first 
__init__ module ends up being a defining module, and by default, its 
__path__ is set in the normal way, just like today.  So, it's not a 
virtual package, it's a standard package.  If you must have a 
defining module, you'll have to move it from zope/__init__.py to zope.py.

(Either that, or use some sort of API call to explicitly request a 
virtual package __path__ to be set up.  But the recommended way to do 
it would be just to move the file up a level.)


Impact
------

This proposal doesn't affect performance of imports that don't ever 
*use* a virtual package __path__, because the setup is delayed until then.

It doesn't break installation tools: distutils and setuptools both 
handle this without blinking.  You just list your defining module (if 
you have one) in 'py_modules', along with any individual submodules, 
and you list the subpackages in 'packages'.
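
For illustration, the setup arguments for such a project might look 
roughly like this.  The project layout (a defining module foo.py, an 
individual submodule foo/bar.py, and a subpackage foo/baz/) and all 
the names are made up for the example; build_setup_kwargs() is just a 
hypothetical wrapper so the arguments can be inspected without 
running an actual build.

```python
def build_setup_kwargs():
    """Keyword arguments for setup() in a hypothetical 'foo' project."""
    return dict(
        name="foo-example",   # hypothetical distribution name
        version="0.1",
        # The defining module (foo.py) plus individual submodules
        # (foo/bar.py) are listed as plain modules...
        py_modules=["foo", "foo.bar"],
        # ...while subpackages (foo/baz/) are listed as packages.
        packages=["foo.baz"],
    )


# In a real setup.py, these would be passed straight through, e.g.:
#     from setuptools import setup
#     setup(**build_setup_kwargs())
```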

It doesn't break code-finding tools in any way that other 
implementation proposals don't.  (That is, ALL our proposals allow 
__init__ to go away, so tools are definitely going to require 
updating; all that differs between proposals is precisely what sort 
of updating is required.)

Really, about the only downside I can see is that this proposal can't 
be implemented purely via a PEP 302 meta-importer in Python 2.x.  The 
builtin __import__ function bails out when __path__ is missing on a 
parent, so it would actually require replacing __import__ in order to 
implement true virtual package support.

(For my own personal use case for this PEP in 2.x (i.e., replacing 
setuptools' current mechanism with a PEP-compliant one), it's not too 
big a deal, though, because I was still going to need explicit 
registration in .pth files: no matter the mechanism used, it isn't 
built into the interpreter, so it still has to be bootstrapped somehow!)

Anyway.  Thoughts?


