[Python-Dev] PEP 273: Import Modules from Zip Archives

Gordon McMillan gmcm@hypernet.com
Mon, 29 Oct 2001 09:30:30 -0500


Jim Ahlstrom wrote:
 [From PEP 273]
> Currently, sys.path is a list of directory names as strings. 

A nit: that's true in practice, but not by definition (a couple of std 
modules have been patched to allow other things on
sys.path). After extensive use of imputil (which puts objects 
on sys.path), I think we might as well make it official that 
sys.path is a list of strings.

[Subdirectory Equivalence]

This is a bit misleading - it can be read as implying that the 
importable modules are in a flat namespace. They're not. To 
get R.Q.modfoo imported, R.__init__ and R.Q.__init__ must 
be imported. The __init__ modules have the opportunity to 
play with __path__ to break the equivalence of R.Q.modfoo to 
R/Q/modfoo.py. Or (more likely), play games with attributes 
so that Q is, say, an instance, or maybe a module imported 
from someplace else.

Question: if Archive.zip/Q/__init__.pyc does 
"__path__.append('some/real/directory')", will it work?
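For what it's worth, the question can be tried against the zipimport module that CPython later shipped. The sketch below illustrates modern Python only, not PEP 273's proposed import.c code. It builds a throwaway Archive.zip whose Q/__init__.py (source rather than .pyc, for brevity) appends a real directory to __path__. It then imports one submodule from each location. The module names `modfoo` and `modreal` and the temp-directory layout are invented for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Scratch area holding one real directory and one zip archive.
work = tempfile.mkdtemp()
real_dir = os.path.join(work, "real")
os.makedirs(real_dir)

# A plain-file module that lives OUTSIDE the archive.
with open(os.path.join(real_dir, "modreal.py"), "w") as f:
    f.write("WHERE = 'real directory'\n")

# Package Q lives inside the archive; its __init__ appends the real
# directory to __path__, as in the question above.
archive = os.path.join(work, "Archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("Q/__init__.py", f"__path__.append({real_dir!r})\n")
    zf.writestr("Q/modfoo.py", "WHERE = 'archive'\n")

sys.path.insert(0, archive)
importlib.invalidate_caches()

modfoo = importlib.import_module("Q.modfoo")    # found inside Archive.zip
modreal = importlib.import_module("Q.modreal")  # found via the appended dir
print(modfoo.WHERE, "/", modreal.WHERE)
```

In modern CPython each __path__ entry is handed to whatever path importer claims it. The archive entry goes to zipimport and the plain directory goes to the normal filesystem finder, so mixing the two works.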

[dynamic libs]
> It might be possible to extract the dynamic module from the
> zip file, write it to a plain file and load it. 

I think you can nail the door shut on this one. On many OSes, 
making dynamic libs available to a process requires settings
that can only (sanely) be made before the process starts.

OTOH, it's common practice these days to put dynamic libs 
inside packages. That needs to be dealt with at runtime, and 
at build time (since it breaks the expectation that you can just 
"zip up" the package).
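At build time, the packaging step at least has to notice when a package carries dynamic libs. This is a minimal sketch. It assumes that the running interpreter's `importlib.machinery.EXTENSION_SUFFIXES` is a good enough definition of "dynamic lib". `find_dynamic_libs` is a hypothetical helper and not part of any existing tool.

```python
import importlib.machinery
import io
import zipfile


def find_dynamic_libs(zip_source):
    """Hypothetical build-time check: return archive members whose names end
    with one of this interpreter's extension-module suffixes (.so, .pyd, ...)."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    with zipfile.ZipFile(zip_source) as zf:
        return [name for name in zf.namelist() if name.endswith(suffixes)]


# Usage: a package archive with one pure-Python module and one dynamic lib.
ext = importlib.machinery.EXTENSION_SUFFIXES[-1]  # e.g. ".so" or ".pyd"
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("pkg/__init__.py", "")
    out.writestr("pkg/core.py", "")
    out.writestr("pkg/_speedups" + ext, b"\0")

print(find_dynamic_libs(buf))
```

A real build step would then have to decide what to do with the hits: ship them beside the archive, or refuse to zip that package.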

[no *.py files]
> You also can't import source files *.py from a zip archive. 

Apparently some Linuxes / RPM distributions don't deliver 
.pyc's or .pyo's. Since they're installed as root and run as 
some-poor-user, I'm afraid there are quite a few installations 
running off .py's all the time. So while it's definitely
suboptimal, I'm not sure it should be outlawed.
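As a data point, the zipimport module in current CPython does accept source-only archives. It compiles the .py in memory on each import and never writes the .pyc back into the archive, so the cost is speed, not correctness. Here is a sketch with an invented module name, `srconly`:

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build an archive that holds ONLY source: no .pyc/.pyo anywhere.
archive = os.path.join(tempfile.mkdtemp(), "srconly.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("srconly.py", "def answer():\n    return 42\n")

sys.path.insert(0, archive)
importlib.invalidate_caches()

# Imported straight from source inside the zip; the archive is not modified.
srconly = importlib.import_module("srconly")
print(srconly.answer(), srconly.__file__)
```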

[Guido]
> But it would still be good if the .py files were also in the
> zip file, so the source can be used in tracebacks etc. 

As you probably remember me saying before, I think this is a 
very minor nice-to-have. For one thing, you've probably done 
enough testing before stuffing your code into an archive that 
you're past the point of looking at a traceback and going 
"doh!". Most of the time you'll need to look at the code 
anyway.

OTOH, this [more from Guido]:
> A C API to get a source line from a filename might be a good
> idea (plus a Python API). 

points the way towards something I'm very much in favor of: 
deferring things to that-which-did-the-importing.
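Python's linecache module ended up with roughly this shape. `linecache.getline(filename, lineno, module_globals)` falls back to the module's loader, through its `get_source` method, when the filename can't be opened as a plain file. That is the "defer to that-which-did-the-importing" idea. As far as I know, the traceback module consults linecache the same way. Here is a sketch against a throwaway archive. The module name `tbdemo` is invented.

```python
import importlib
import linecache
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "tbdemo.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("tbdemo.py",
                "def boom():\n    raise ValueError('from inside the zip')\n")

sys.path.insert(0, archive)
importlib.invalidate_caches()
tbdemo = importlib.import_module("tbdemo")

# __file__ points *into* the archive, so it can't be opened as a plain file...
assert not os.path.exists(tbdemo.__file__)
# ...but passing the module's globals lets linecache ask the importer
# (via the loader's get_source) for the text instead.
line = linecache.getline(tbdemo.__file__, 2, tbdemo.__dict__)
print(line.strip())
```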

[Efficiency]
> The key is the archive name from sys.path joined with the
> file name (including any subdirectories) within the archive. 

Different spellings of the same path are possible in a
filesystem, but not in a dictionary. A bit of "harmless" 
tweaking of sys.path could render an archive unreachable.
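One way around the spelling problem is to normalize the archive path before using it as a dictionary key. The sketch below uses a hypothetical `ArchiveIndex` class. It relies on `os.path.normpath` and `os.path.normcase`, which are purely textual. Symlinks and hard links can therefore still produce distinct keys. `os.path.realpath` would catch those, at the cost of touching the filesystem.

```python
import os.path


class ArchiveIndex:
    """Hypothetical cache mapping (archive path, member name) -> data."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(archive, member):
        # normpath collapses '.', '..' and doubled separators; normcase folds
        # case where the platform is case-insensitive. Both are purely textual:
        # they don't resolve symlinks (and '..' past a symlink may even be wrong).
        return os.path.normcase(os.path.normpath(archive)), member

    def add(self, archive, member, data):
        self._entries[self._key(archive, member)] = data

    def get(self, archive, member):
        return self._entries.get(self._key(archive, member))


raw = {("/C/D/E/Archive.zip", "Q/R/modfoo.pyc"): b"code"}
idx = ArchiveIndex()
idx.add("/C/D/E/Archive.zip", "Q/R/modfoo.pyc", b"code")

# A "harmless" respelling of the same sys.path entry:
respelled = "/C/D/./E/Archive.zip"
print(raw.get((respelled, "Q/R/modfoo.pyc")))   # None
print(idx.get(respelled, "Q/R/modfoo.pyc"))     # b'code'
```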

[zlib must be available at start]
I'll agree, and agree with Guido that the coolest thing would be 
to make zlib standard.
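Until zlib is standard, one workaround is to build archives whose members are stored uncompressed. This sketch assumes the behavior of current CPython's zipimport, which as far as I know only needs zlib when it meets a compressed member. The module name `storedmod` is invented.

```python
import importlib
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "stored.zip")
# ZIP_STORED: members are written uncompressed, so reading them back
# needs no decompressor at all.
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("storedmod.py", "VALUE = 'no zlib needed'\n")

# Confirm how each member was stored.
with zipfile.ZipFile(archive) as zf:
    kinds = {info.filename: info.compress_type for info in zf.infolist()}

sys.path.insert(0, archive)
importlib.invalidate_caches()
storedmod = importlib.import_module("storedmod")
print(kinds, storedmod.VALUE)
```

The trade-off is archive size: stored members are as large as the originals.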

[Booting - python.zip should be part of the generated sys.path]
Agree. Nice and straightforward.

Now, from the discussion:

[restate Jim]
sys.path contains /C/D/E/Archive.zip
and Archive.zip contains "Q/R/modfoo.pyc"
so "import Q.R.modfoo" is satisfied by
/C/D/E/Archive.zip/Q/R/modfoo.pyc

[restate Finn]
Jython has sys.path /C/D/E/Archive.zip!Lib
and Archive.zip has "Lib/Q/R/modfoo.pyc"
so "import Q.R.modfoo" is satisfied by
/C/D/E/Archive.zip/Lib/Q/R/modfoo.pyc

[restate Guido]
Why not use /C/D/E/Archive.zip/Lib on sys.path?
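Guido's spelling is one that current CPython's zipimport accepts: a sys.path entry may name a directory inside the archive. The sketch below rebuilds the Jython-style layout, with Lib/Q/R/modfoo.py inside Archive.zip, in a temp directory. It then imports the module through Archive.zip/Lib. The `__init__.py` files and module contents are invented for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile

archive = os.path.join(tempfile.mkdtemp(), "Archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("Lib/Q/__init__.py", "")
    zf.writestr("Lib/Q/R/__init__.py", "")
    zf.writestr("Lib/Q/R/modfoo.py", "NAME = 'modfoo'\n")

# The archive path plus a directory *inside* the archive.
sys.path.insert(0, os.path.join(archive, "Lib"))
importlib.invalidate_caches()

modfoo = importlib.import_module("Q.R.modfoo")
print(modfoo.__file__)  # .../Archive.zip/Lib/Q/R/modfoo.py
```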

I use embedded archives. So sys.path will have an entry like:
"/path/to/executable?84758" which says that the archive 
starts at position 84758 in the file "executable". 

Anything beyond the simple approach Jim takes gets into 
some very URL-ish territory. That's fine by me :-).
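The "executable?offset" convention is easy to resolve outside import.c. This is a minimal sketch. `open_embedded_archive` is a hypothetical helper. It assumes the zip was built on its own and then appended to the executable, so the offsets stored inside the archive are relative to its first byte. For brevity it reads the embedded bytes into memory.

```python
import io
import os
import tempfile
import zipfile


def open_embedded_archive(entry):
    """Hypothetical helper: open the zip named by an entry of the form
    '/path/to/executable?OFFSET', whose data starts at byte OFFSET."""
    path, sep, offset_text = entry.rpartition("?")
    if not sep:
        raise ValueError(f"no '?offset' suffix in {entry!r}")
    with open(path, "rb") as f:
        f.seek(int(offset_text))
        blob = f.read()  # everything from OFFSET to end of file
    # Offsets inside a separately built, then appended, zip are relative to
    # its own first byte, so the slice is a standalone archive.
    return zipfile.ZipFile(io.BytesIO(blob))


# Usage: fake an "executable" with an archive appended after a stub.
stub = b"#!not really an executable\n" * 100
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as out:
    out.writestr("Q/R/modfoo.py", "X = 1\n")

exe = os.path.join(tempfile.mkdtemp(), "executable")
with open(exe, "wb") as f:
    f.write(stub + buf.getvalue())

embedded = open_embedded_archive(f"{exe}?{len(stub)}")
print(embedded.namelist())  # ['Q/R/modfoo.py']
```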

I don't really like the idea of hacking special knowledge of zip 
files into import.c (which is already a specialist in 
filesystems). Like Finn said, if this is a deployment issue (we 
want zip files now, and are willing to live with strict limitations / 
rules to get it), then OK (as long as it supports __path__ and 
some way of dealing with dynamic libs in packages).

Personally, I think package support stretched import.c to its 
monolithic limits and it's high time the code was refactored to
make it sanely extensible.



- Gordon