[Patches] [ python-Patches-652586 ] New import hooks + Import from Zip files
noreply@sourceforge.net
noreply@sourceforge.net
Sat, 14 Dec 2002 13:52:43 -0800
Patches item #652586, was opened at 2002-12-12 10:34
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=652586&group_id=5470
Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Just van Rossum (jvr)
Assigned to: Nobody/Anonymous (nobody)
Summary: New import hooks + Import from Zip files
Initial Comment:
This patch implements two things:
- a new set of import hooks, modelled after iu.py
- builtin support for imports from Zip archives (a
competing implementation for PEP 273)
The new set of hooks probably need a better document
explaining them (perhaps a PEP). My motivations have
been posted to python-dev.
Here's a brief description.
Three new objects are added to the sys module:
- path_hooks
- path_importer_cache
- meta_path
sys.path_hooks is a list of callable objects that take
a string as their only argument. A hook will be called
with a sys.path or pkg.__path__ item. It should return
an "importer" object (see below), or raise ImportError
or return None if it can't deal with the path item. By
default, sys.path_hooks only contains the zipimporter
type, if the zipimport module is available.
sys.path_importer_cache is a dict that caches the
results of sys.path_hooks to avoid repeated hook lookups.
sys.meta_path is a list of importer objects that are
invoked *before* the builtin import mechanism kicks in.
This allows overriding of builtin module and frozen
module import, but the main feature is that it allows
importer objects *without* a corresponding sys.path
item (just like builtin and frozen modules).
Importer objects must conform to the following protocol:
i.find_module(fullname) -> None or an importer object
i.load_module(fullname) -> the imported module (or
raise ImportError)
The 'fullname' is always the fully qualified module
name, ie. a dotted name for a submodule.
This patch adds one more feature: a sys.path item may
*itself* be an importer object. This is convenient for
experimentation, but using it may break third-party
code that assumes sys.path contains only strings.
----------------------------------------------------------------------
Comment By: Paul Moore (pmoore)
Date: 2002-12-14 21:52
Message:
Logged In: YES
user_id=113328
Thanks for the explanation of find_module. That sounds very
useful.
On a separate note, why do you expose
sys.path_importer_cache? I can't imagine any reason why
the user would ever touch it manually (other than for
interactive experimentation).
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-14 20:46
Message:
Logged In: YES
user_id=92689
Btw. it would be nicer if I indeed provided an "importer"
wrapper for the builtin mechanism and simply placed it on
and invoke it through sys.meta_path. The reason I haven't
done that yet is that A) this would change import.c even
more (although it would be better factored then) and B) it
would have _some_ impact on the performance of the builtin
import mechanism, as *all* imports would then pass through
yet another Python call. But I hope to play with this a bit
later next week. It could also be a project for 2.4.
Note that this is exactly what Gordon does in iu.py, except
he goes even further: he has separate importer objects on
the metapath for builtin modules, frozen modules and file
system imports. I think that's fantastic, but it would be
more controversial as it might affect performance and might
be harder to integrate with the current imp module.
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-14 20:17
Message:
Logged In: YES
user_id=92689
Paul, this is the reason why the find_module() method
returns an object (and I guess now is a good time to explain
this <wink>; it also answers a question from Guido he asked
in private mail). zipimporter returns self, but you don't
have to do that: you can return anything with a
load_module() method. So all state you gather during
find_module() you can simply pack in a different object and
return that. Something like this
class MyHook:
...
def find_module(self, fullname):
info = self._find_module(fullname)
if info is None:
return None
return Loader(info)
class Loader:
def __init__(self, info):
self.info = info
def load_module(self, fullname):
...load module using self.info...
A good showcase for this pattern would be a wrapper for the
builtin import mechanism: it could stuff the (file,
filename, (suffix, mode, type)) tuple that imp.find_module()
returns in a loader object, whose load_module() method would
simply call imp.load_module() with that stuff. (I think such
a wrapper is needed in import.c if we want to properly
expose the new hooks in the imp module.)
You can see the importer protocol as a further abstraction
(and OO-ification) of the current
imp.find_module()/imp.load_module() calls.
----------------------------------------------------------------------
Comment By: Paul Moore (pmoore)
Date: 2002-12-14 19:39
Message:
Logged In: YES
user_id=113328
I have some concerns about a design issue.
I've been playing about writing my own hooks in Python, and I
have found that for most of my tests, the find_module method
is simply not useful. I understand that it can be cheaper
(sometimes a lot cheaper) to just check for the existence of a
module than actually loading up the code, etc, etc. So in
those cases, find_module() offers a shortcut if the module
isn't found.
But if the module *is* found, and find_module has to do
significant work, then all this work has to be repeated at
load_module time. There's no mechanism for the result of
find_module (say, a pointer to the module in a zip directory,
or whatever) to be passed to load_module. So load_module
has no alternative than to redo the work (short of the hook
implementing some form of internal cache mechanism, which
could be quite messy to do).
I don't know what the likely uses of hooks will be. So I've no
idea if find_module will turn out in practice to be a cheap
operation. But I can certainly imagine cases (the
hypthetical "load a module from a URL" hook) where
find_module could be far from cheap.
Is it possible to let find_module be optional? If the hook
implements it, let it be used as a quick check for existence. If
the hook doesn't implement it, just go for it - call load_module
and be prepared for a "can't find it" response.
This probably means that more extensive changes to import.c
will be needed. But once we expose the design, we're going
to be stuck with backward compatibility issues forever, so it's
important we get it right *now*, rather than later. (Although
having find_module mandatory initially, but made optional
later, is probably less problematic than most other forms of
design change...)
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-13 22:44
Message:
Logged In: YES
user_id=92689
The previous version requires it and failed if they weren't,
the present version still requires it, but missing entries
are added in the read_directory() function (around line
542). So it's guaranteed they're there if needed.
I could have solved it differently at import time, but I
thought it be better to do slightly more work at
zipdirectory-read time so we have to do less work later. I
doubt it matters much in reality, this was just the simplest
way to solve the problem...
(I'll update the test to reflect the solution of this problem.)
----------------------------------------------------------------------
Comment By: Paul Moore (pmoore)
Date: 2002-12-13 21:20
Message:
Logged In: YES
user_id=113328
I'd say it's pretty much guaranteed *not* to be a good idea. I
am fairly certain that at least some Windows zip utilities fail
to have a separate zip entry for the directory itself.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-12-13 20:02
Message:
Logged In: YES
user_id=6380
Just: it looks like your latest version *requires* that the
entries for package directories are present. Is that really
a good idea?
----------------------------------------------------------------------
Comment By: James C. Ahlstrom (ahlstromjc)
Date: 2002-12-13 19:55
Message:
Logged In: YES
user_id=64929
The PEP 273 addition of the standard zip archive name
/usr/local/lib/python23.zip (or Windows name) to sys.path to
support import of the Python standard lib (including
site.py) from a zip archive is missing. That was the point
of my patch to site.py, getpath.c, getpathp.c, main.c,
pythonrun.c, and sysmodule.c. I believe those patches can
be used with this import.c if you want the feature.
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-13 19:29
Message:
Logged In: YES
user_id=92689
I've attached a new version that fixes a problem that Thomas
Heller noticed: package imports failed when the zip archive
didn't contain a separate entry for the package directory.
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum)
Date: 2002-12-13 17:52
Message:
Logged In: YES
user_id=6380
Thanks -- I'm reviewing this now (if not too many
distracting email arrives :-).
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 23:19
Message:
Logged In: YES
user_id=92689
Yet another new version:
- zipimport is a builtin module now; includes patches for
PC\config.c and PBbuild\pythoncore.dsp, contributed and
tested by Paul Moore.
- tweaked doc strings in zipimport.
I'll quit for today...
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 21:59
Message:
Logged In: YES
user_id=92689
I've uploaded a slightly updated version: the potential zlib
recursion has been fixed (I was in fact able to trigger it);
I've added a test for it, although it's tricky to do. Also
cleaned up a few exception messages in zipimport.
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 19:34
Message:
Logged In: YES
user_id=92689
I've uploaded a fresh version, with GC support added. Tested
to the extent that it doesn't crash ;-)
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 16:34
Message:
Logged In: YES
user_id=92689
I didn't know, thanks! I actually did find -N but it "didn't
work" ;-)
I've attached a new patch (slightly updated as well) as one
file. Also included is Lib/test/output/test_zipimport.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2002-12-12 14:28
Message:
Logged In: YES
user_id=33168
Just, in case you didn't know, you can do a cvs diff -N to
include new/removed files in a patch. I think you need to
do a cvs add/remove before -N works though.
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 14:05
Message:
Logged In: YES
user_id=92689
I've attached test_zipimport.py, a simple test script, to be
placed in Lib/test/.
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-12-12 13:16
Message:
Logged In: YES
user_id=21627
recursive imports: having foo.zip as the first thing in sys.path,
and having a compressed zlib.py in there was indeed the
case I was thinking of (without actually trying).
GC: It is absolutely necessary. If this is not done now,
somebody will spend days of research five years from now,
because somebody else thought that invoking .update on this
files attribute was a good idea. This could be reduced to C-
level documentation if the dictionary was not exposed to
Python.
builtin: I think it ought to be builtin. It's a small module, it
does not rely on additional libraries, it is not maintained
externally, and it reduces bootstrap dependencies to have it
builtin. OTOH, zlib can't be builtin, as it relies on an
additional library which may not always be present.
get_long: you are right; the 0xff does make the values
positive, again. I somehow thought the result might still be
negative.
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 12:39
Message:
Logged In: YES
user_id=92689
Hm, regarding gc again: zipimporter objects can only
*theoretically* be involved in cycles, and only if people
muck with the "files" attribute. As it is, the "files" dict
only contains strings (keys) and tuples (values) which
contain strings and ints. So is it really neccesary?
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 12:35
Message:
Logged In: YES
user_id=92689
Martin:
- Have you actually seen a recursive import of zlib? I don't
see how it's possible to cause a stackoverflow. Hm, maybe if
someone stuffs a zlib.py in a zip archive? I'll add a guard.
- Are you really saying it *should* be a builtin module, or
was that comment supposed to start with "if"? See also
Paul's remark about zlib: if zlib remains a shared lib,
having zipimport as a builtin only helps for non-compressed
archives.
- gc: I'll give this a go (never done it before!)
- get_long() -> it's called with a signed char *, and the
buf[i] & 0xff should take care of the rest, so I'm not sure
I understand. But trust your judgement and changed it in my
working copy.
Thank you very much for the quick reply!
----------------------------------------------------------------------
Comment By: Paul Moore (pmoore)
Date: 2002-12-12 11:23
Message:
Logged In: YES
user_id=113328
I can provide details on how to patch the Windows build
process - it's not hard. The question of whether to have
zipimport as builtin or dynamic should be resolved first (the
processes are different in the 2 cases).
It should be pointed out that if zipimport is builtin, it still relies
on a dynamic module (zlib) for importing from compressed
zips. I don't know whether this affects the decision, though...
----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis)
Date: 2002-12-12 11:15
Message:
Logged In: YES
user_id=21627
Some comments:
- zipimport should normally be built as a builtin module, there
should also be a patch for the Windows build procedure
- zipimporters need to support garbage collection, as they
can occur in cycles
- there should be a mechanism to prevent a stack overflow in
case of recursive import of zlib.
- get_long needs to accept an unsigned char*
----------------------------------------------------------------------
Comment By: Just van Rossum (jvr)
Date: 2002-12-12 10:37
Message:
Logged In: YES
user_id=92689
btw. the attached file contains a patch for various files as
well as a new file: zipimporter.c. Place the latter in the
Modules/ directory.
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=652586&group_id=5470