[Import-SIG] One last try: "virtual packages"
ericsnowcurrently at gmail.com
Tue Jul 12 04:19:46 CEST 2011
On Mon, Jul 11, 2011 at 7:01 PM, P.J. Eby <pje at telecommunity.com> wrote:
> Ok, so based on the last round of discussions about terminology, and how
> other languages process their path, I got to doing some thinking, and here
> is one last try at a high-performing, markerless,
> ultra-backwards-compatible, approach to this thing. I call it, "virtual
> packages".
> The implementation consists of two small *additions* to today's import
> semantics. These additions don't affect the performance or behavior of
> "standard" imports (i.e., the ones we have today), but enable certain
> imports that would currently fail, to succeed instead.
> Overall, the goal is to make package imports work more like a user coming
> over from languages like Perl, Java, and PHP would expect with respect to
> subpath searching, and creation/expansion of packages. (For instance, this
> proposal does away with the need to move 'foo.py' to 'foo/__init__.py' when
> turning a module into a package.)
> Anyway, this'll be my last attempt at coming up with a markerless approach,
> but I do hope you'll take the time to read it carefully, as it has very
> different semantics and performance impacts from my previous proposals, even
> though it may sound quite similar on the surface.
> In particular, this proposal is the ONLY implementation ever proposed for
> this PEP that has *zero* filesystem-level performance overhead for normal
> imports.
> That's right. Zip. Zero. Nada. None. Nil.
> The *only* cases where this proposal adds additional filesystem access
> overhead is in cases where, without this proposal, an ImportError would've
> happened under present-day import semantics.
> So, read it and weep... or smile, or whatever. ;-)
> The First Addition - "Virtual" Packages
> The first addition to existing import semantics is that if you try to import
> a submodule of a module with no __path__, then instead of treating it as a
> missing module, a __path__ is dynamically constructed, using
> namespace_subpath() calls on the parent path or sys.path.
> If the resulting __path__ is empty, it's an import error. Otherwise, the
> module's __path__ attribute is set, and the import goes ahead as if the
> module had been a package all along.
> In other words, every module is a "virtual package". If you treat it as a
> package, it'll become/act like one. Otherwise, it's still a module.
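As a rough sketch of the rule described above (not the actual implementation): the signature of namespace_subpath() isn't spelled out in the proposal, so the version below is a hypothetical stand-in that simply checks for a subdirectory, and build_virtual_path() and ensure_virtual_path() are illustrative names only.

```python
import os
import sys


def namespace_subpath(path_entry, name):
    """Hypothetical stand-in: return path_entry/name if it is a directory."""
    candidate = os.path.join(path_entry, name)
    return candidate if os.path.isdir(candidate) else None


def build_virtual_path(name, parent_path=None):
    """Collect every subdirectory called `name` along the parent path.

    Falls back to sys.path for top-level modules.
    """
    search = sys.path if parent_path is None else parent_path
    subpaths = (namespace_subpath(entry, name) for entry in search)
    return [p for p in subpaths if p is not None]


def ensure_virtual_path(module, parent_path=None):
    """Give a plain module a __path__ the first time a submodule is requested."""
    if hasattr(module, "__path__"):
        # Already a package (standard or virtual): leave it alone.
        return module.__path__
    short_name = module.__name__.rpartition(".")[2]
    path = build_virtual_path(short_name, parent_path)
    if not path:
        # An empty virtual __path__ means the submodule import fails.
        raise ImportError(f"{module.__name__!r} has no submodules")
    module.__path__ = path
    return path
```

Nothing here runs on a plain 'import random'; ensure_virtual_path() would only be called once something asks for 'random.foo', which is where the "no overhead for existing imports" claim comes from.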
> This means that if, say, you have a bunch of directories named 'random' on
> sys.path (without any __init__ modules in them), importing 'random' still
> imports the stdlib random.py.
> However, if you try to import 'random.myspecialrandom', a __path__ will be
> constructed and used -- and if the submodule exists, it'll be imported.
> (And if you later add a random/myspecialrandom/ directory somewhere on
> sys.path, you'll be able to import random.myspecialrandom.whatever out of
> it, by recursive application of this "virtual package" rule.)
> Notice that this is very different from my previous attempt at a similar
> scheme. First, it doesn't introduce any performance overhead on 'import
> random', as the extra lookups aren't done until and unless you try to
> 'import random.foo'... which existing code of course will not be doing.
> (Second, but also important, it doesn't distort the __path__ of packages
> with an __init__ module, because such packages are *not* virtual packages;
> they retain their present day semantics.)
> Anyway, with this one addition, imports will now behave in a way that's
> friendly to users of e.g. Perl and Java, who expect the code for a module
> 'foo' to lie *outside* the foo/ directory, and for lookups of foo.bar or
> foo::bar to be searched for in foo/ subdirectories all along the respective
> equivalents of sys.path.
> You now can simply ship a single zope.py to create a virtual "zope" package
> -- a namespace shared by code from multiple distributions.
> But wait... how does that fix the filename collision problem? Aren't we
> still going to collide on the zope.py file? Well, that's where the second
> addition comes in.
> The Second Addition - "Pure Virtual" Packages
> The second addition is that, if an import fails to find a module entirely,
> then once again, a virtual __path__ is assembled using namespace_subpath()
> calls. If the path is empty, the import fails. But if it's non-empty, an
> empty module is created and its __path__ is set.
> Voila... we now have a "pure" virtual package. (i.e. a package with no
> corresponding "defining" module). So, if you have a bunch of __init__-free
> zope/ directories on sys.path, you can freely import from them.
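The "pure virtual" case could be sketched along the same lines. This is only an illustration under assumptions: make_pure_virtual_package() is a made-up name, the subdirectory check stands in for the namespace_subpath() calls, and registering the result in sys.modules is left to the caller.

```python
import os
import types


def make_pure_virtual_package(name, search_path):
    """Build an empty module whose __path__ spans every `name`/ directory.

    `search_path` plays the role of sys.path (or the parent's __path__).
    """
    short_name = name.rpartition(".")[2]
    path = [
        os.path.join(entry, short_name)
        for entry in search_path
        if os.path.isdir(os.path.join(entry, short_name))
    ]
    if not path:
        # No matching directories anywhere: the import fails as it does today.
        raise ImportError(f"No module named {name!r}")
    module = types.ModuleType(name)  # empty module, no defining code
    module.__path__ = path
    return module
```

This would only be reached after the normal lookup has already failed, so a zope.py or zope/__init__.py found earlier still wins.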
> But what happens if you DO have an __init__ module somewhere? Well, because
> we haven't changed the normal import semantics, the first __init__ module
> ends up being a defining module, and by default, its __path__ is set in the
> normal way, just like today. So, it's not a virtual package, it's a
> standard package. If you must have a defining module, you'll have to move
> it from zope/__init__.py to zope.py.
> (Either that, or use some sort of API call to explicitly request a virtual
> package __path__ to be set up. But the recommended way to do it would be
> just to move the file up a level.)
> This proposal doesn't affect performance of imports that don't ever *use* a
> virtual package __path__, because the setup is delayed until then.
> It doesn't break installation tools: distutils and setuptools both handle
> this without blinking. You just list your defining module (if you have one)
> in 'py_modules', along with any individual submodules, and you list the
> subpackages in 'packages'.
> It doesn't break code-finding tools in any way that other implementation
> proposals don't. (That is, ALL our proposals allow __init__ to go away, so
> tools are definitely going to require updating; all that differs between
> proposals is precisely what sort of updating is required.)
> Really, about the only downside I can see is that this proposal can't be
> implemented purely via a PEP 302 meta-importer in Python 2.x. The builtin
> __import__ function bails out when __path__ is missing on a parent, so it
> would actually require replacing __import__ in order to implement true
> virtual package support.
I have been considering porting the 3.3 importlib to 2.x, for a
variety of reasons. If the implementation for "virtual namespace
package portions" is done there then this shouldn't be a big deal.
> (For my own personal use case for this PEP in 2.x (i.e., replacing
> setuptools' current mechanism with a PEP-compliant one), it's not too big a
> deal, though, because I was still going to need explicit registration in
> .pth files: no matter the mechanism used, it isn't built into the
> interpreter, so it still has to be bootstrapped somehow!)
> Anyway. Thoughts?
Cool idea. So for users the only difference is that suddenly foo.py
and a foo directory (without __init__.py) can coexist/cooperate, and
__init__.py becomes optional?