[Patches] [ python-Patches-652586 ] New import hooks + Import from Zip files

noreply@sourceforge.net noreply@sourceforge.net
Fri, 13 Dec 2002 11:29:57 -0800


Patches item #652586, was opened at 2002-12-12 11:34
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=652586&group_id=5470

Category: Core (C code)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Just van Rossum (jvr)
Assigned to: Nobody/Anonymous (nobody)
Summary: New import hooks + Import from Zip files

Initial Comment:
This patch implements two things:
- a new set of import hooks, modelled after iu.py
- builtin support for imports from Zip archives (a
competing implementation for PEP 273)

The new set of hooks probably needs a better document
explaining them (perhaps a PEP). My motivations have
been posted to python-dev.

Here's a brief description.

Three new objects are added to the sys module:
- path_hooks
- path_importer_cache
- meta_path

sys.path_hooks is a list of callable objects that take
a string as their only argument. A hook will be called
with a sys.path or pkg.__path__ item. It should return
an "importer" object (see below), or raise ImportError
or return None if it can't deal with the path item. By
default, sys.path_hooks only contains the zipimporter
type, if the zipimport module is available.
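Below is a minimal sketch of a path hook that follows the description above. `DictImporter`, `memory_path_hook`, `_MEMORY_STORES` and the `mem:` prefix are all hypothetical. The hook is not registered in any real `sys.path_hooks`, and the importer is a bare stand-in that only holds its data.

```python
# Hypothetical registry mapping a path item to {fullname: source}.
_MEMORY_STORES = {
    "mem:demo": {"greeting": "MESSAGE = 'hello'\n"},
}


class DictImporter:
    """Hypothetical importer that only holds an in-memory source store."""

    def __init__(self, store):
        self.store = store


def memory_path_hook(path_item):
    """Path hook: return an importer for known 'mem:' items.

    Any other item (a directory name, a zip file, a non-string) is declined
    by raising ImportError, so the caller can move on to the next hook.
    """
    if not isinstance(path_item, str) or not path_item.startswith("mem:"):
        raise ImportError("not a memory path item: %r" % (path_item,))
    if path_item not in _MEMORY_STORES:
        raise ImportError("unknown memory store: %r" % (path_item,))
    return DictImporter(_MEMORY_STORES[path_item])


print(type(memory_path_hook("mem:demo")).__name__)  # DictImporter
try:
    memory_path_hook("/usr/lib/python2.3")
except ImportError as exc:
    print("declined:", exc)
```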

sys.path_importer_cache is a dict that caches the
results of sys.path_hooks to avoid repeated hook lookups.
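The caching behaviour can be sketched as a plain function that takes an explicit hook list and cache dict. `get_importer` and `zip_like_hook` are hypothetical names. This sketch also stores `None` for items that no hook accepts, so rejected items are not re-examined either; that choice is an assumption.

```python
def get_importer(path_item, path_hooks, cache):
    """Return the importer for path_item, consulting cache before the hooks.

    Hooks are tried in order; one that raises ImportError or returns None
    is skipped. The result (None if no hook accepts the item) is cached.
    """
    if path_item in cache:
        return cache[path_item]
    importer = None
    for hook in path_hooks:
        try:
            importer = hook(path_item)
        except ImportError:
            continue
        if importer is not None:
            break
    cache[path_item] = importer
    return importer


calls = []  # records every hook invocation


def zip_like_hook(path_item):
    """Hypothetical hook accepting only items that end in '.zip'."""
    calls.append(path_item)
    if path_item.endswith(".zip"):
        return ("importer-for", path_item)  # stand-in importer object
    raise ImportError(path_item)


cache = {}
first = get_importer("lib.zip", [zip_like_hook], cache)
second = get_importer("lib.zip", [zip_like_hook], cache)
print(first is second, len(calls))  # True 1
```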

sys.meta_path is a list of importer objects that are
invoked *before* the builtin import mechanism kicks in.
This allows overriding of builtin module and frozen
module import, but the main feature is that it allows
importer objects *without* a corresponding sys.path
item (just like builtin and frozen modules).
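The sketch below shows the ordering described above. It assumes the "builtin import mechanism" can be modelled as a fallback callable. `OverrideImporter` and `simulated_import` are hypothetical, and they work on a local list rather than the real `sys.meta_path`, so the running interpreter is not affected.

```python
import types


class OverrideImporter:
    """Hypothetical meta_path importer that shadows a single module name."""

    def __init__(self, fullname, attrs):
        self.fullname = fullname
        self.attrs = attrs

    def find_module(self, fullname):
        # Claim only the overridden name; None means "not mine, keep looking".
        return self if fullname == self.fullname else None

    def load_module(self, fullname):
        module = types.ModuleType(fullname)
        module.__dict__.update(self.attrs)
        return module


def simulated_import(fullname, meta_path, builtin_import):
    """Ask each meta_path importer first; fall back to builtin_import."""
    for importer in meta_path:
        loader = importer.find_module(fullname)
        if loader is not None:
            return loader.load_module(fullname)
    return builtin_import(fullname)


def fallback(name):
    # Stand-in for the builtin/frozen/sys.path machinery.
    return "builtin " + name


shadow = OverrideImporter("sys", {"platform": "overridden"})
print(simulated_import("sys", [shadow], fallback).platform)  # overridden
print(simulated_import("os", [shadow], fallback))            # builtin os
```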

Importer objects must conform to the following protocol:

i.find_module(fullname) -> None or an importer object
i.load_module(fullname) -> the imported module (or
raise ImportError)

The 'fullname' is always the fully qualified module
name, i.e. a dotted name for a submodule.
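The two-method protocol can be illustrated with a hypothetical importer that serves source code from a dict keyed by fully qualified names. This sketch covers only the protocol as stated here. `SourceDictImporter` is not part of the patch, and it skips details such as registering the module in `sys.modules` or setting `__path__` for packages.

```python
import types


class SourceDictImporter:
    """Hypothetical importer serving source keyed by fully qualified name."""

    def __init__(self, sources):
        # Keys are dotted full names, e.g. "pkg" and "pkg.sub".
        self.sources = sources

    def find_module(self, fullname):
        # Return an importer (here: self) if fullname is known, else None.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        if fullname not in self.sources:
            raise ImportError("no module named %s" % fullname)
        module = types.ModuleType(fullname)
        module.__file__ = "<memory>/" + fullname.replace(".", "/")
        exec(self.sources[fullname], module.__dict__)
        return module


importer = SourceDictImporter({
    "pkg": "",
    "pkg.sub": "ANSWER = 6 * 7",
})
loader = importer.find_module("pkg.sub")   # full dotted name, not "sub"
sub = loader.load_module("pkg.sub")
print(sub.__name__, sub.ANSWER)            # pkg.sub 42
print(importer.find_module("sub"))         # None
```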

This patch adds one more feature: a sys.path item may
*itself* be an importer object. This is convenient for
experimentation, but using it may break third-party
code that assumes sys.path contains only strings.
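Here is a short illustration of the compatibility risk. It assumes third-party code that treats every path item as a directory name. `InlineImporter` and `string_path_entries` are hypothetical. Filtering on `isinstance(item, str)` is one possible defensive pattern, not something the patch prescribes.

```python
import os


class InlineImporter:
    """Hypothetical importer object placed directly on a path list."""

    def find_module(self, fullname):
        return None


def string_path_entries(path):
    """Return only the str items of a sys.path-like list."""
    return [item for item in path if isinstance(item, str)]


path = ["/usr/lib/python2.3", InlineImporter(), "lib.zip"]

naive_failed = False
try:
    candidates = [os.path.join(item, "spam.py") for item in path]
except TypeError:
    naive_failed = True  # os.path.join rejects the importer object
print("naive join failed:", naive_failed)  # naive join failed: True
print(string_path_entries(path))           # ['/usr/lib/python2.3', 'lib.zip']
```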

----------------------------------------------------------------------

>Comment By: Just van Rossum (jvr)
Date: 2002-12-13 20:29

Message:
Logged In: YES 
user_id=92689

I've attached a new version that fixes a problem that Thomas
Heller noticed: package imports failed when the zip archive
didn't contain a separate entry for the package directory.
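The underlying issue can be reproduced with Python's `zipfile` module. Writing a member named `pkg/__init__.py` does not create a separate `pkg/` entry, so a lookup that relies on explicit directory entries misses the package. The `package_dirs` helper below is a hypothetical sketch that derives the implied directories from member names; it is not the C fix in the patch.

```python
import io
import zipfile


def package_dirs(names):
    """Infer every directory implied by the archive's member names."""
    dirs = set()
    for name in names:
        parts = name.split("/")[:-1]  # drop the file component
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]))
    return dirs


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # writestr with a plain name adds only the file entry, no 'pkg/' entry.
    zf.writestr("pkg/__init__.py", "")
    zf.writestr("pkg/sub/__init__.py", "")
    zf.writestr("pkg/sub/mod.py", "X = 1\n")

with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()

print("pkg/" in names)              # False: no explicit directory entry
print(sorted(package_dirs(names)))  # ['pkg', 'pkg/sub']
```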

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2002-12-13 18:52

Message:
Logged In: YES 
user_id=6380

Thanks -- I'm reviewing this now (if not too much
distracting email arrives :-).

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-13 00:19

Message:
Logged In: YES 
user_id=92689

Yet another new version:
- zipimport is a builtin module now; includes patches for
PC\config.c and PCbuild\pythoncore.dsp, contributed and
tested by Paul Moore.
- tweaked doc strings in zipimport.
I'll quit for today...

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 22:59

Message:
Logged In: YES 
user_id=92689

I've uploaded a slightly updated version: the potential zlib
recursion has been fixed (I was in fact able to trigger it);
I've added a test for it, although it's tricky to do. Also
cleaned up a few exception messages in zipimport.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 20:34

Message:
Logged In: YES 
user_id=92689

I've uploaded a fresh version, with GC support added. Tested
to the extent that it doesn't crash ;-)

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 17:34

Message:
Logged In: YES 
user_id=92689

I didn't know, thanks! I actually did find -N but it "didn't
work" ;-)

I've attached a new patch (slightly updated as well) as one
file. Also included is Lib/test/output/test_zipimport.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-12-12 15:28

Message:
Logged In: YES 
user_id=33168

Just, in case you didn't know, you can do a cvs diff -N to
include new/removed files in a patch.  I think you need to
do a cvs add/remove before -N works though.

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 15:05

Message:
Logged In: YES 
user_id=92689

I've attached test_zipimport.py, a simple test script, to be
placed in Lib/test/.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-12-12 14:16

Message:
Logged In: YES 
user_id=21627

recursive imports: having foo.zip as the first thing in sys.path, 
and having a compressed zlib.py in there was indeed the 
case I was thinking of (without actually trying).

GC: It is absolutely necessary. If this is not done now, 
somebody will spend days of research five years from now, 
because somebody else thought that invoking .update on the 
files attribute was a good idea. This could be reduced to 
C-level documentation if the dictionary were not exposed to 
Python.
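The cycle Martin describes can be demonstrated with a Python stand-in. Suppose user code updates a `files` dict so that it refers back to its owner. The object then stays alive after its last external reference is dropped, until the cycle collector runs. Python classes are tracked by the collector automatically. A C type like the zipimporter has to opt in through the `Py_TPFLAGS_HAVE_GC` flag plus the `tp_traverse` and `tp_clear` slots, which is what the review asks for. The `Importer` class and its tuple values are hypothetical.

```python
import gc
import weakref


class Importer:
    """Stand-in for a zipimporter with an exposed, mutable 'files' dict."""

    def __init__(self):
        # Hypothetical value layout: strings and ints only, so no cycle yet.
        self.files = {"pkg/__init__.py": ("pkg/__init__.py", 0, 0)}


was_enabled = gc.isenabled()
gc.disable()  # keep automatic collection out of the demonstration
try:
    imp = Importer()
    imp.files.update({"oops": imp})   # user code creates a reference cycle
    ref = weakref.ref(imp)
    del imp
    alive_before = ref() is not None  # refcounting alone cannot free it
    gc.collect()
    freed_after = ref() is None       # the cycle collector reclaims it
finally:
    if was_enabled:
        gc.enable()

print(alive_before, freed_after)  # True True
```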

builtin: I think it ought to be builtin. It's a small module, it 
does not rely on additional libraries, it is not maintained 
externally, and it reduces bootstrap dependencies to have it 
builtin. OTOH, zlib can't be builtin, as it relies on an 
additional library which may not always be present.

get_long: you are right; the 0xff does make the values 
positive, again. I somehow thought the result might still be 
negative.
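Martin's point can be checked numerically. Python has no fixed-width C types, so this sketch models a `signed char *` buffer by mapping bytes above 127 to negative values. `as_signed_chars` and this `get_long` are hypothetical Python stand-ins for the C function, and they do not model C integer widths or overflow.

```python
def as_signed_chars(data):
    """Reinterpret bytes as C 'signed char' values (-128..127)."""
    return [b - 256 if b > 127 else b for b in data]


def get_long(buf):
    """Decode a 4-byte little-endian value from signed-char input.

    Masking each element with 0xff undoes sign extension, so every byte
    is non-negative before it is shifted and OR-ed into the result.
    """
    return ((buf[0] & 0xff)
            | (buf[1] & 0xff) << 8
            | (buf[2] & 0xff) << 16
            | (buf[3] & 0xff) << 24)


raw = bytes([0xff, 0xfe, 0x80, 0x01])
signed = as_signed_chars(raw)              # [-1, -2, -128, 1]
naive = signed[0] | signed[1] << 8 | signed[2] << 16 | signed[3] << 24
print(naive)                               # -1: sign bits leak through
print(hex(get_long(signed)))               # 0x180feff
print(get_long(signed) == int.from_bytes(raw, "little"))  # True
```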



----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 13:39

Message:
Logged In: YES 
user_id=92689

Hm, regarding gc again: zipimporter objects can only
*theoretically* be involved in cycles, and only if people
muck with the "files" attribute. As it is, the "files" dict
only contains strings (keys) and tuples (values) which
contain strings and ints. So is it really necessary?

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 13:35

Message:
Logged In: YES 
user_id=92689

Martin:
- Have you actually seen a recursive import of zlib? I don't
see how it's possible to cause a stack overflow. Hm, maybe if
someone stuffs a zlib.py in a zip archive? I'll add a guard.
- Are you really saying it *should* be a builtin module, or
was that comment supposed to start with "if"? See also
Paul's remark about zlib: if zlib remains a shared lib,
having zipimport as a builtin only helps for non-compressed
archives.
- gc: I'll give this a go (never done it before!)
- get_long() -> it's called with a signed char *, and the
buf[i] & 0xff should take care of the rest, so I'm not sure
I understand. But I trust your judgement and have changed it
in my working copy.

Thank you very much for the quick reply!

----------------------------------------------------------------------

Comment By: Paul Moore (pmoore)
Date: 2002-12-12 12:23

Message:
Logged In: YES 
user_id=113328

I can provide details on how to patch the Windows build 
process - it's not hard. The question of whether to have 
zipimport as builtin or dynamic should be resolved first (the 
processes are different in the 2 cases).

It should be pointed out that if zipimport is builtin, it still relies 
on a dynamic module (zlib) for importing from compressed 
zips. I don't know whether this affects the decision, though...
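Paul's point about compressed archives can be seen with `zipfile`. Only members stored with the deflate method need zlib to be read; stored (uncompressed) members can be read without it. `entries_needing_zlib` is a hypothetical helper. Note that building the demo archive itself requires zlib in the running interpreter.

```python
import io
import zipfile


def entries_needing_zlib(archive):
    """Names of archive members that cannot be read without zlib."""
    with zipfile.ZipFile(archive) as zf:
        return [info.filename for info in zf.infolist()
                if info.compress_type == zipfile.ZIP_DEFLATED]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.py", "A = 1\n", compress_type=zipfile.ZIP_STORED)
    # Writing a deflated member itself requires zlib in this interpreter.
    zf.writestr("packed.py", "B = 2\n", compress_type=zipfile.ZIP_DEFLATED)

print(entries_needing_zlib(buf))  # ['packed.py']
```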

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-12-12 12:15

Message:
Logged In: YES 
user_id=21627

Some comments:
- zipimport should normally be built as a builtin module; there
should also be a patch for the Windows build procedure
- zipimporters need to support garbage collection, as they 
can occur in cycles
- there should be a mechanism to prevent a stack overflow in 
case of recursive import of zlib.
- get_long needs to accept an unsigned char*

----------------------------------------------------------------------

Comment By: Just van Rossum (jvr)
Date: 2002-12-12 11:37

Message:
Logged In: YES 
user_id=92689

btw. the attached file contains a patch for various files as
well as a new file: zipimporter.c. Place the latter in the
Modules/ directory.

----------------------------------------------------------------------
