[Python-Dev] New and Improved Import Hooks

Moore, Paul Paul.Moore@atosorigin.com
Tue, 3 Dec 2002 15:47:42 -0000


From: Just van Rossum [mailto:just@letterror.com]

> Neal Norwitz wrote:
>
> > On Tue, Dec 03, 2002 at 11:48:29AM -0000, Moore, Paul wrote:
> >
> > > My view is that import.c is, indeed, very complex and
> > > difficult code to follow. I agree with Just's comments that
> > > a cleanup in import.c would be useful.
> >
> > I agree with both of these statements. I have reviewed the
> > patch. While it's quite involved, I don't think it's worth
> > rejecting I would really like to see a cleanup, though
>
> I'm working hard on a zipimport module that uses
> sys.import_hooks. It's going to be SO much cleaner than the
> existing zip import patch.

That's great news, as I've said before. Can I ask that you
consider a few issues, though. I don't have particular use cases
in mind, but I believe that PEP 273 had a number of implicit
assumptions, which your proposal does not incorporate. I'd like
to make the differences explicit, just so that people get a
chance to comment on whether they care.

1. Your import hooks seem to act at a level "above" sys.path,
   in the sense that the normal sys.path search becomes just
   another hook. With that in mind, I don't see a way for a user
   to add a zipfile in the *middle* of sys.path.
2. The syntax needed to add a zipfile search is different from
   the familiar one for sys.path additions. This isn't an issue
   as such, but may make the feature a little less accessible.
3. By using a non-sys.path approach, you make it impossible for
   users to set PYTHONPATH to include zipfiles.

Can I again ask you to review Gordon McMillan's import manager.
I really feel that you're reinventing the wheel here, and Gordon's
code is well tested, as it's part of his Installer package.

Maybe Gordon's code (which is pure Python) isn't appropriate to
use, but I'm sure the design is worth looking at.

> > Rejecting the patch would be worthwhile if the interface
> > would need to change after incorporating this patch. I don't
> > believe this is the case. Whether this patch is accepted
> > or not wouldn't change future patches that Just wants to
> > incorporate.
>
> I personally reject the patch because it modifies import.c
> beyond repair. Once this stuff is in, there's NO way we can go
> back without a huge effort, meaning it won't happen.

I think you're exaggerating. It makes things worse, certainly,
but I'd strongly dispute your claim of "beyond repair". I simply
don't think the change is that major. Either it's already beyond
repair, or the patched version is still fixable.

And nobody is proposing "going back". A refactoring is "going
forward". Whether it's a refactoring of code that *already*
supports zipfiles or a refactoring which then has to be changed
to add zipfile support shouldn't matter.

> It adds
> a whole directory listing caching scheme that is a) totally
> unneeded and b) has (IMO) _nothing_ to do with zip importing
> per se. That ExternalNames/External names stuff? I don't
> understand why it is needed just to enable zip imports. It's a
> lot of stuff!

I've explained this a couple of times now. Please read the PEP
for details, but Jim claims that zipfile imports need this if
they are not to be too slow. A directory lookup in a zipfile is
basically a linear search through the list of all files in the
zipfile. For a large zipfile (such as the Python library) this
is going to be *slow*.

Please benchmark your implementation on a zipped version of the
full Python library, replacing the directory version, and compare
with Jim's benchmarks. While speed isn't the be-all and end-all
of this patch, I'd hate to lose a significant speed gain...

And the cache has a nice side-effect in that it speeds up general
imports, particularly on networked drives.

> > > I think that the zipfile patch is close to being
> > > acceptable. I don't have a problem if it gets rejected, but
> > > can I make a plea - if it does, can it be "Rejected because
> > > patch XXX does the job better" rather than "Rejected
> > > because this can be implemented better using feature YYY".
> >
> > Agreed.
>
> And that's ecxactly what I intend to say in a couple of days if
> not sooner.

Great! Will your code address all of the above points, and the
issues raised on python-dev in Oct/Nov 2001 (see below for
references)?

> [ ... ]
> > If Just incorporates his import patch, perhaps the zipfile
> > code could be refactored to use the more generic framework. I
> > don't view these two patches as mutually exclusive.
>
> I *totally* do.
>
> > If all of these issues are addressed, the resulting code
> > could be clearer, even if intermediate versions are not.
>
> A rewrite now is MUCH easier than cleaning it up later. It's
> going to be much less code. Wait and see.

Here are some references that anyone looking at this stuff should
read. There was a *lot* of discussion on this patch in Oct/Nov 2001.

Start of the PEP 273 thread:
http://mail.python.org/pipermail/python-dev/2001-October/018158.html

Thread on caching of lookups
http://mail.python.org/pipermail/python-dev/2001-November/018346.html

Thomas Heller proposing Gordon McMillan's code for the standard
library:
http://mail.python.org/pipermail/python-dev/2001-October/017750.html

As I've said, I'm not against rejecting the current patch - but I'd
be concerned if all of the work that went into it (ie, Jim's design
and discussions on python-dev, *not* the specific code implementing
it) was just lost.

Paul.