RE: [Python-Dev] New and Improved Import Hooks
From: Just van Rossum [mailto:just@letterror.com]
Neal Norwitz wrote:
On Tue, Dec 03, 2002 at 11:48:29AM -0000, Moore, Paul wrote:
My view is that import.c is, indeed, very complex and difficult code to follow. I agree with Just's comments that a cleanup in import.c would be useful.
I agree with both of these statements. I have reviewed the patch. While it's quite involved, I don't think it's worth rejecting I would really like to see a cleanup, though
I'm working hard on a zipimport module that uses sys.import_hooks. It's going to be SO much cleaner than the existing zip import patch.
That's great news, as I've said before. Can I ask that you consider a few issues, though. I don't have particular use cases in mind, but I believe that PEP 273 had a number of implicit assumptions, which your proposal does not incorporate. I'd like to make the differences explicit, just so that people get a chance to comment on whether they care. 1. Your import hooks seem to act at a level "above" sys.path, in the sense that the normal sys.path search becomes just another hook. With that in mind, I don't see a way for a user to add a zipfile in the *middle* of sys.path. 2. The syntax needed to add a zipfile search is different from the familiar one for sys.path additions. This isn't an issue as such, but may make the feature a little less accessible. 3. By using a non-sys.path approach, you make it impossible for users to set PYTHONPATH to include zipfiles. Can I again ask you to review Gordon McMillan's import manager. I really feel that you're reinventing the wheel here, and Gordon's code is well tested, as it's part of his Installer package. Maybe Gordon's code (which is pure Python) isn't appropriate to use, but I'm sure the design is worth looking at.
Rejecting the patch would be worthwhile if the interface would need to change after incorporating this patch. I don't believe this is the case. Whether this patch is accepted or not wouldn't change future patches that Just wants to incorporate.
I personally reject the patch because it modifies import.c beyond repair. Once this stuff is in, there's NO way we can go back without a huge effort, meaning it won't happen.
I think you're exaggerating. It makes things worse, certainly, but I'd strongly dispute your claim of "beyond repair". I simply don't think the change is that major. Either it's already beyond repair, or the patched version is still fixable. And nobody is proposing "going back". A refactoring is "going forward". Whether it's a refactoring of code that *already* supports zipfiles or a refactoring which then has to be changed to add zipfile support shouldn't matter.
It adds a whole directory listing caching scheme that is a) totally unneeded and b) has (IMO) _nothing_ to do with zip importing per se. That ExternalNames/External names stuff? I don't understand why it is needed just to enable zip imports. It's a lot of stuff!
I've explained this a couple of times now. Please read the PEP for details, but Jim claims that zipfile imports need this if they are not to be too slow. A directory lookup in a zipfile is basically a linear search through the list of all files in the zipfile. For a large zipfile (such as the Python library) this is going to be *slow*. Please benchmark your implementation on a zipped version of the full Python library, replacing the directory version, and compare with Jim's benchmarks. While speed isn't the be-all and end-all of this patch, I'd hate to lose a significant speed gain... And the cache has a nice side-effect in that it speeds up general imports, particularly on networked drives.
I think that the zipfile patch is close to being acceptable. I don't have a problem if it gets rejected, but can I make a plea - if it does, can it be "Rejected because patch XXX does the job better" rather than "Rejected because this can be implemented better using feature YYY".
Agreed.
And that's ecxactly what I intend to say in a couple of days if not sooner.
Great! Will your code address all of the above points, and the issues raised on python-dev in Oct/Nov 2001 (see below for references)?
[ ... ]
If Just incorporates his import patch, perhaps the zipfile code could be refactored to use the more generic framework. I don't view these two patches as mutually exclusive.
I *totally* do.
If all of these issues are addressed, the resulting code could be clearer, even if intermediate versions are not.
A rewrite now is MUCH easier than cleaning it up later. It's going to be much less code. Wait and see.
Here are some references that anyone looking at this stuff should read. There was a *lot* of discussion on this patch in Oct/Nov 2001. Start of the PEP 273 thread: http://mail.python.org/pipermail/python-dev/2001-October/018158.html Thread on caching of lookups http://mail.python.org/pipermail/python-dev/2001-November/018346.html Thomas Heller proposing Gordon McMillan's code for the standard library: http://mail.python.org/pipermail/python-dev/2001-October/017750.html As I've said, I'm not against rejecting the current patch - but I'd be concerned if all of the work that went into it (ie, Jim's design and discussions on python-dev, *not* the specific code implementing it) was just lost. Paul.
From: "Moore, Paul" <Paul.Moore@atosorigin.com>
That's great news, as I've said before. Can I ask that you consider a few issues, though. I don't have particular use cases in mind, but I believe that PEP 273 had a number of implicit assumptions, which your proposal does not incorporate. I'd like to make the differences explicit, just so that people get a chance to comment on whether they care.
1. Your import hooks seem to act at a level "above" sys.path, in the sense that the normal sys.path search becomes just another hook. With that in mind, I don't see a way for a user to add a zipfile in the *middle* of sys.path.
It is worth to notice that Jython 2.1 already implement more or less PEP 273 approach, allowing to add zipfiles to sys.path. We can live with a different/new approach and then plan a migration path with warning etc, but if that could be avoided it would be nice. regards.
1. Your import hooks seem to act at a level "above" sys.path, in the sense that the normal sys.path search becomes just another hook. With that in mind, I don't see a way for a user to add a zipfile in the *middle* of sys.path.
It is worth to notice that Jython 2.1 already implement more or less PEP 273 approach, allowing to add zipfiles to sys.path.
I'd be disappointed if adding zipfiles to sys.path wasn't possible. After all that's what PEP 273 says. --Guido van Rossum (home page: http://www.python.org/~guido/)
From: "Guido van Rossum" <guido@python.org>
1. Your import hooks seem to act at a level "above" sys.path, in the sense that the normal sys.path search becomes just another hook. With that in mind, I don't see a way for a user to add a zipfile in the *middle* of sys.path.
It is worth to notice that Jython 2.1 already implement more or less PEP 273 approach, allowing to add zipfiles to sys.path.
I'd be disappointed if adding zipfiles to sys.path wasn't possible. After all that's what PEP 273 says.
but - if I understand correctly - given the approach taken by the import_hooks patch there's the issue: consider all directories then all zipfiles versus consider both kind of import sources according to the order in sys.path, which is the most natural extension. regards.
[Guido]
I'd be disappointed if adding zipfiles to sys.path wasn't possible. After all that's what PEP 273 says.
[Samuele Pedroni]
but - if I understand correctly - given the approach taken by the import_hooks patch there's the issue:
consider all directories then all zipfiles
(or vice versa ;-)
versus consider both kind of import sources according to the order in sys.path, which is the most natural extension.
Right. I admit I dismissed sys.path a wee bit too quickly when proposing an alternative zipimport mechanism. Regarding sys.import_hooks, I see three possibilities: #1 for p in sys.path: for hook in sys.import_hooks: hook(p, ...) #2 for hook in sys.import_hooks: for p in sys.path: hook(p, ...) #3 for hook in sys.import_hooks: # the hook can traversing sys.path if needed. hook(...) My current sys.import_hooks patch implements #3 and I think that's the most logical thing to do; after all sys.path may not be meaningful for a hook. #2 makes little sense I guess. #1 could be an option, but, given the current state of import.c, would be a lot harder to implement. I also think that #3 is the most efficient scheme, as the hook can save some local state while traversing sys.path. I will change my zipimport hook (which _still_ isn't finished ;-) to also traverse sys.path, and check for *.zip path endings. Just
From: "Just van Rossum" <just@letterror.com>
I will change my zipimport hook (which _still_ isn't finished ;-) to also traverse sys.path, and check for *.zip path endings.
my point was that, given dir1;zip0;dir2 dir1 should take over zip0 that should take over dir2. [This is what Jython 2.1 does btw] If the resulting import precedence order is that both dir1 and dir2 take over zip0, then maybe allowing zip0 in sys.path is just confusing, and zipfiles should be specified elsewhere. In general I clearly see that there can be import hooks that can leverage sys.path and its order semantics and some that can't. regards.
Samuele Pedroni wrote:
From: "Just van Rossum" <just@letterror.com>
I will change my zipimport hook (which _still_ isn't finished ;-) to also traverse sys.path, and check for *.zip path endings.
my point was that, given
dir1;zip0;dir2
dir1 should take over zip0 that should take over dir2.
[This is what Jython 2.1 does btw]
If the resulting import precedence order is that both dir1 and dir2 take over zip0, then maybe allowing zip0 in sys.path is just confusing, and zipfiles should be specified elsewhere.
True, true. Does Jython implement the (old) idea that items on sys.path can be import handler *objects*? (If yes, can you briefly describe the interface of these objects?) After seeing what you mean I think that may be the only sensible solution. Say: for p in sys.path: if hasattr(p, "find_module"): x = p.find_module(...) elif not isinstance(p, basestring): continue else: x = builtin_find_module(...) ...load module... Just
From: "Just van Rossum" <just@letterror.com>
Samuele Pedroni wrote:
From: "Just van Rossum" <just@letterror.com>
I will change my zipimport hook (which _still_ isn't finished ;-) to also traverse sys.path, and check for *.zip path endings.
my point was that, given
dir1;zip0;dir2
dir1 should take over zip0 that should take over dir2.
[This is what Jython 2.1 does btw]
If the resulting import precedence order is that both dir1 and dir2 take over zip0, then maybe allowing zip0 in sys.path is just confusing, and zipfiles should be specified elsewhere.
True, true.
Does Jython implement the (old) idea that items on sys.path can be import handler *objects*? (If yes, can you briefly describe the interface of these objects?)
No jython accepts strings of the form *.zip in sys.path, that are converted upon scanning on import to string subclasses' instances with the same value as string, through which import happens using an internal interface implemented by these string subclasses. We also set __path__s to contain things of the form *.zip/rel/path/to/package. regards.
Samuele Pedroni wrote:
No jython accepts strings of the form *.zip in sys.path, that are converted upon scanning on import to string subclasses' instances with the same value as string, through which import happens using an internal interface implemented by these string subclasses.
We also set __path__s to contain things of the form *.zip/rel/path/to/package.
And these package __path__s are _also_ "import hook" string subclasses, right? That makes an awful lot of sense. Just
From: "Just van Rossum" <just@letterror.com>
Samuele Pedroni wrote:
No jython accepts strings of the form *.zip in sys.path, that are converted upon scanning on import to string subclasses' instances with the same value as string, through which import happens using an internal interface implemented by these string subclasses.
We also set __path__s to contain things of the form *.zip/rel/path/to/package.
And these package __path__s are _also_ "import hook" string subclasses, right?
Yes The interface is internal because here was no CPython prior art, and because we also support importing java classes from sys.path (both from dirs and zipfiles, although there are some bugs wrt to the latter), and our import code also begs for some refactoring... so things are in flux. [In the future (2.2) we were planning to add support for Java ClassLoaders too, but it is still unclear whether to put them in sys.path makes sense or not. I'm reporting this not because Java ClassLoaders are relevant to CPython but to point out that probably not all import hooks can be sys.path centered, and then a question is whether they should be considered before or after sys.path ] regards.
Just van Rossum <just@letterror.com> writes:
Does Jython implement the (old) idea that items on sys.path can be import handler *objects*? (If yes, can you briefly describe the interface of these objects?)
After seeing what you mean I think that may be the only sensible solution.
I think one requirement is that you can put zipfiles in PYTHONPATH and have that work. Regards, Martin
On Tuesday, Dec 3, 2002, at 21:05 Europe/Amsterdam, Just van Rossum wrote:
Regarding sys.import_hooks, I see three possibilities:
#1 for p in sys.path: for hook in sys.import_hooks: hook(p, ...)
#2 for hook in sys.import_hooks: for p in sys.path: hook(p, ...)
#3 for hook in sys.import_hooks: # the hook can traversing sys.path if needed. hook(...)
Note that there is "prior art" for case 1: the PYC resource importer for MacPython, which also looks at files (in stead of directories) on sys.path. It does have a bit of hairy code so it skips sys.path entries that don't apply to it (directories, basically). This was a major speedup, so maybe this should be generalized to something like for p in sys.path: for hook in sys.import_hooks: if not hook_can_handle.has_key((hook, p)): hook_can_handle[(hook, p)] = hook(p, None) # None requests "compatibility test" if hook_can_handle[(hook, p)]: hook(p, ....) -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -
Jack Jansen wrote:
Regarding sys.import_hooks, I see three possibilities:
#1 for p in sys.path: for hook in sys.import_hooks: hook(p, ...)
#2 for hook in sys.import_hooks: for p in sys.path: hook(p, ...)
#3 for hook in sys.import_hooks: # the hook can traversing sys.path if needed. hook(...)
what's wrong with: #4 for path in sys.path: if isinstance(path, types.StringTypes): do_standard_import(...) else: path.do_import(...) after all, everyone knows how the path works, and everyone knows how to manipulate sys.path inside a Python program. (for PIL and other plug-in architectures, a "path.get_list_of_modules" method would also be nice). </F>
On Wednesday, Dec 4, 2002, at 11:33 Europe/Amsterdam, Fredrik Lundh wrote:
what's wrong with:
#4 for path in sys.path: if isinstance(path, types.StringTypes): do_standard_import(...) else: path.do_import(...)
After reading Just's later message and the explanation about how Jython does it I think that my preference is now something like the "if hasattr(path, 'find_module')" fix. This will allow you to use a subtype of strings on sys.path. Make one pass over sys.path replacing the zipfiles and other funnies by a specialised subtype of string. -- - Jack Jansen <Jack.Jansen@oratrix.com> http://www.cwi.nl/~jack - - If I can't dance I don't want to be part of your revolution -- Emma Goldman -
Fredrik Lundh wrote:
what's wrong with:
#4 for path in sys.path: if isinstance(path, types.StringTypes): do_standard_import(...) else: path.do_import(...)
That, or what Jack said, which I think is even better: #5 for path in sys.path: if hasattr(path, "find_module"): loader = path.find_module(name) if isinstance(path, types.StringTypes): do_standard_import(...) else: pass (I originally thought it might be important that path elements be str subclasses, but I'm not so sure anymore. Especially after I wrote a str subclass in C: it's a HUGE pain ;-) Just
Moore, Paul wrote:
That's great news, as I've said before. Can I ask that you consider a few issues, though. I don't have particular use cases in mind, but I believe that PEP 273 had a number of implicit assumptions, which your proposal does not incorporate. I'd like to make the differences explicit, just so that people get a chance to comment on whether they care.
1. Your import hooks seem to act at a level "above" sys.path, in the sense that the normal sys.path search becomes just another hook. With that in mind, I don't see a way for a user to add a zipfile in the *middle* of sys.path.
True. There would be ways around that, but it's currently not easy. I don't feel this is a particularly important feature, though.
2. The syntax needed to add a zipfile search is different from the familiar one for sys.path additions. This isn't an issue as such, but may make the feature a little less accessible.
The way I see it: sys.path is for file system imports. It's a list of directories, nothing else.
3. By using a non-sys.path approach, you make it impossible for users to set PYTHONPATH to include zipfiles.
This could be fixed by site.py: it could remove any .zip files from sys.path and add hooks for them in sys.import_hooks. Or it could be fixed in the C code that adds the script's directory to sys.path. Or we could add a tiny change to import.c that invokes the zipimport module if a zip file is encountered. [ snip ]
It adds a whole directory listing caching scheme that is a) totally unneeded and b) has (IMO) _nothing_ to do with zip importing per se. That ExternalNames/External names stuff? I don't understand why it is needed just to enable zip imports. It's a lot of stuff!
I've explained this a couple of times now. Please read the PEP for details, but Jim claims that zipfile imports need this if they are not to be too slow.
Well, I am not going to need it: upon reading the zip contents table I build a dict, mapping (full) module names to toc info. One dict lookup will be enough to get at the right data. Let's discuss the rest when my implementation actually works. Just
participants (7)
-
Fredrik Lundh
-
Guido van Rossum
-
Jack Jansen
-
Just van Rossum
-
martin@v.loewis.de
-
Moore, Paul
-
Samuele Pedroni