Packages & __path__ (was: Re: [Python-Dev] Python 2.3a1 release -- Dec 31)
Just van Rossum
just@letterror.com
Wed, 25 Dec 2002 22:42:45 +0100
Samuele Pedroni wrote:
> From: "Just van Rossum" <just@letterror.com>
>
> > But, as I wrote, it *can't* work with importers in general. It can
> > be made to work with specific importers, such as zipimporter, but I
> > don't see the point (I'll rant about that in a second ;-).
>
> but such importers would better go in meta_path.
Yes, but that doesn't make modifying __path__ less fragile ;-)
> Btw, reading PEP 302 it seems that when dealing with a package, first
> *all* importers in sys.meta_path are called with fullname and __path__
> of the package.
Correct.
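
A minimal sketch of how this can be observed, under stated assumptions:
it runs on a current Python 3, where meta path hooks implement
find_spec(fullname, path, target) rather than the find_module(fullname,
path) method described in PEP 302. RecordingFinder, record_import and
the package name demo_pkg_a are hypothetical names made up for the
illustration.

```python
import importlib
import os
import sys
import tempfile


class RecordingFinder:
    """Meta path hook that only records its arguments (hypothetical helper)."""

    def __init__(self):
        self.calls = []

    def find_spec(self, fullname, path=None, target=None):
        # Keep a copy of the path argument. Returning None means
        # "not handled here", so the next finder gets its turn.
        self.calls.append((fullname, None if path is None else list(path)))
        return None


def record_import(pkg_name="demo_pkg_a", sub_name="sub"):
    """Import a throwaway package.submodule and return the recorded calls."""
    root = tempfile.mkdtemp()
    pkg_dir = os.path.join(root, pkg_name)
    os.mkdir(pkg_dir)
    for filename in ("__init__.py", sub_name + ".py"):
        with open(os.path.join(pkg_dir, filename), "w"):
            pass  # empty source files are enough

    finder = RecordingFinder()
    sys.path.insert(0, root)
    sys.meta_path.insert(0, finder)  # first in line, so it sees every import
    try:
        importlib.invalidate_caches()
        importlib.import_module(pkg_name + "." + sub_name)
    finally:
        sys.meta_path.remove(finder)
        sys.path.remove(root)
    return finder.calls, pkg_dir
```

The hook should see the top-level package with no path argument, and
the submodule with the parent package's __path__.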
> For symmetry (as long as we keep __path__) I think the import
> mechanism would have to look for a list in __meta__path__ (or maybe
> just a single value in __meta_importer__):
Hm, you may have a point there.
However, this makes life yet a bit harder for tools that only *find*
modules and don't load them (e.g. modulefinder.py). This is already a
problem (even without the new hooks, but less so) with __path__:
currently a modulefinder (let's use that term for a generic non-loading
module finding tool) has to calculate what __path__ *would* be, were it
actually loaded, and pass that as the second argument to
imp.find_module(). For regular file system imports this is not such a
big deal, but for importers that have no notion of a file system, yet
use pkg.__path__, it is. The PEP proposes an optional method for loader
objects named get_path(), which would return what __path__ would be set
to were the package actually loaded. I'm not all that excited about the
necessity for this... pkg.__meta_path__ would make things yet more
complicated (does it need to be passed to imp.find_module as well?).
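
To make the burden on a modulefinder concrete, here is a rough sketch
of the file system case only. It does not use imp.find_module();
find_without_loading is a hypothetical helper, and it assumes that a
package is a directory containing __init__.py whose would-be __path__
is simply that directory.

```python
import os
import tempfile


def find_without_loading(fullname, search_path):
    """Locate a dotted module name on disk without importing anything.

    Returns (filename, would_be_path). would_be_path is None for a plain
    module. Only .py files and __init__.py packages are handled.
    """
    filename = would_be_path = None
    for part in fullname.split("."):
        if search_path is None:
            # The previous component was a plain module, not a package.
            raise ImportError("%r has a non-package parent" % fullname)
        for entry in search_path:
            pkg_dir = os.path.join(entry, part)
            init = os.path.join(pkg_dir, "__init__.py")
            if os.path.isfile(init):
                # Calculate what __path__ *would* be, were it loaded.
                filename, would_be_path = init, [pkg_dir]
                break
            candidate = os.path.join(entry, part + ".py")
            if os.path.isfile(candidate):
                filename, would_be_path = candidate, None
                break
        else:
            raise ImportError("cannot find %r" % fullname)
        # The next component is searched on the calculated __path__.
        search_path = would_be_path
    return filename, would_be_path


def make_demo_tree():
    """Create root/spam/__init__.py and root/spam/ham.py; return root."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "spam"))
    for name in ("__init__.py", "ham.py"):
        with open(os.path.join(root, "spam", name), "w"):
            pass
    return root
```

An importer with no notion of a file system has no directory to
compute here, which is the gap the proposed get_path() method is meant
to fill.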
What you call __meta_importer__ might fit into the scheme, perhaps just
as __importer__ (and rename what the PEP now defines as __importer__ to
__loader__, which seems a good idea anyway). I think iu.py does
something similar: it adds a __importsub__ symbol to packages, which is
a function to import submodules.
However, this all leads to a narrower search path for submodules. I want
it to be *wider*.
Back to __path__: about the only reason I can think of for having it in
the first place (besides being the "I'm a package" flag) is
optimization: it avoids some stat calls by restricting the lookup to a
specific directory. Yet there's the desire to import submodules from
other places. E.g.: loading .pyc modules from a zip archive, yet having
dynamically loaded extensions come from the file system. Extension
modules as submodules of frozen packages are currently not possible
without deeper trickery: __path__ is a string in that case, so modifying
it wouldn't be possible, let alone solve the problem.
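
For the ordinary file system case, the restricting effect of __path__
can be sketched as follows. This assumes a current Python 3 and a
regular package whose __path__ is a plain list; the function name and
the package name demo_pkg_b are hypothetical. It shows the existing
workaround of modifying __path__, not the frozen-package case.

```python
import importlib
import os
import sys
import tempfile


def import_submodule_from_elsewhere(pkg_name="demo_pkg_b"):
    """Return (found_before, value): lookup before and after extending __path__."""
    root = tempfile.mkdtemp()
    dir1_pkg = os.path.join(root, "dir1", pkg_name)
    dir2_pkg = os.path.join(root, "dir2", pkg_name)
    os.makedirs(dir1_pkg)
    os.makedirs(dir2_pkg)
    with open(os.path.join(dir1_pkg, "__init__.py"), "w"):
        pass  # the package itself lives in dir1
    with open(os.path.join(dir2_pkg, "eggs.py"), "w") as f:
        f.write("VALUE = 42\n")  # the submodule lives in dir2

    entries = [os.path.join(root, "dir1"), os.path.join(root, "dir2")]
    sys.path[:0] = entries
    try:
        importlib.invalidate_caches()
        pkg = importlib.import_module(pkg_name)
        try:
            importlib.import_module(pkg_name + ".eggs")
            found_before = True
        except ImportError:
            # __path__ only names dir1/<pkg>, so dir2 is never consulted.
            found_before = False
        pkg.__path__.append(dir2_pkg)  # widen the submodule search by hand
        importlib.invalidate_caches()
        eggs = importlib.import_module(pkg_name + ".eggs")
        return found_before, eggs.VALUE
    finally:
        for entry in entries:
            sys.path.remove(entry)
```

Both directories are on sys.path throughout, yet the submodule is only
found once the package's own __path__ has been changed.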
Let's assume for a moment we want to deprecate __path__. (Hm, I tried --
in my mind <wink> -- to also get rid of the "I'm a package" flag, but
this is not realistic as long as we have implicit relative imports.)
Submodules would be found on sys.path by constructing a relative path
like "package/submodule.py".
Here's a sample directory structure:

    dir1/spam/__init__.py
    dir1/spam/ham.py
    dir2/spam/eggs.pyd

dir1 and dir2 are items on sys.path. "import spam.eggs" would first load
"spam" from dir1/spam/__init__.py and then load "spam.eggs" from
dir2/spam/eggs.pyd. It appears as if this is more or less what
pkgutil.py tries to accomplish, yet this scheme brings the control of
the feature back to the domain of the application, where it belongs.
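
The lookup described above might be sketched like this. It is only an
illustration of the idea, not an implementation of any existing import
machinery: find_on_sys_path, build_sample_tree and the SUFFIXES list
are assumptions made up for the example.

```python
import os
import tempfile

# Assumed suffix list, for illustration only; a real interpreter
# has its own platform-specific list.
SUFFIXES = (".py", ".pyc", ".pyd", ".so")


def find_on_sys_path(fullname, sys_path):
    """Find a (sub)module by joining each sys.path item with a relative path."""
    # "spam.eggs" becomes the relative path "spam/eggs".
    relative = os.path.join(*fullname.split("."))
    for entry in sys_path:
        base = os.path.join(entry, relative)
        init = os.path.join(base, "__init__.py")
        if os.path.isfile(init):
            return init  # a package
        for suffix in SUFFIXES:
            if os.path.isfile(base + suffix):
                return base + suffix  # a plain or extension module
    return None


def build_sample_tree():
    """Create the sample layout from the text and return [dir1, dir2]."""
    root = tempfile.mkdtemp()
    files = ("dir1/spam/__init__.py", "dir1/spam/ham.py", "dir2/spam/eggs.pyd")
    for rel in files:
        target = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        with open(target, "w"):
            pass  # empty placeholder files
    return [os.path.join(root, "dir1"), os.path.join(root, "dir2")]
```

With the sample tree, "spam" and "spam.ham" resolve under dir1 while
"spam.eggs" resolves under dir2, without consulting spam.__path__ at
all.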
It also makes life easier for modulefinders: imp.find_module() could now
be made to accept fully qualified module names (only when no 'path'
argument is given). It would *also* allow me to get rid of the second
argument of the find_module() method of the importer protocol as well as
of the proposed imp.find_module2() function. It makes a lot of things
simpler...
Yet one more thing this scheme enables: extension modules as submodules
in a multiplatform library setup (i.e. different platforms using the
same library on a server).
Finding and loading of submodules would be completely decoupled from the
parent module; parent and child could be imported by completely
unrelated importers. The latter is already a *feature* of sys.meta_path
(which explains why I'm not thrilled about Samuele's proposal ;-) and I
would love for this to be extended to file system imports.
Just