[Python-Dev] PEP 273: Import Modules from Zip Archives

Guido van Rossum guido@python.org
Mon, 29 Oct 2001 10:06:59 -0500


> Jim Ahlstrom wrote:
>  [From PEP 273]
> > Currently, sys.path is a list of directory names as strings. 

[GMcM]
> A nit: that's in practice, but not by definition (a couple std 
> modules have been patched to allow other things on 
> sys.path). After extensive use of imputil (which puts objects 
> on sys.path), I think we might as well make it official that 
> sys.path is a list of strings.

Interesting.  So you think imputil is wrong to put objects there?
Why?  (Not arguing, just interested in your experience.)

> [Subdirectory Equivalence]
> 
> This is a bit misleading - it can be read as implying that the 
> importable modules are in a flat namespace. They're not. To 
> get R.Q.modfoo imported, R.__init__ and R.Q.__init__ must 
> be imported. The __init__ modules have the opportunity to 
> play with __path__ to break the equivalence of R.Q.modfoo to 
> R/Q/modfoo.py. Or (more likely), play games with attributes 
> so that Q is, say, an instance, or maybe a module imported 
> from someplace else.
> 
> Question: if Archive.zip/Q/__init__.pyc does 
> "__path__.append('some/real/directory')", will it work?

It should.
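
A minimal sketch of that case, for illustration only.  It assumes
present-day zipimport (which, unlike the PEP draft, also loads .py
source from an archive); the helper name, the "real_directory" folder
and the WHERE attribute are made up for the example.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def import_through_extended_path():
    """Import Q.modfoo where package Q lives in a zip and modfoo on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        # A real directory holding the submodule (stand-in for
        # 'some/real/directory' in the question above).
        real_dir = os.path.join(tmp, "real_directory")
        os.mkdir(real_dir)
        with open(os.path.join(real_dir, "modfoo.py"), "w") as f:
            f.write("WHERE = 'filesystem'\n")

        # The archive holds only the package __init__, which extends __path__.
        archive = os.path.join(tmp, "Archive.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("Q/__init__.py", "__path__.append(%r)\n" % real_dir)

        sys.path.insert(0, archive)
        try:
            importlib.invalidate_caches()
            modfoo = importlib.import_module("Q.modfoo")
            return modfoo.WHERE, modfoo.__file__
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(archive)
            sys.modules.pop("Q.modfoo", None)
            sys.modules.pop("Q", None)
```

Here Q.modfoo is not in the archive at all; it is found only because
Q/__init__ appended the real directory to the package's __path__.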

> [dynamic libs]
> > It might be possible to extract the dynamic module from the
> > zip file, write it to a plain file and load it. 
> 
> I think you can nail the door shut on this one. On many OSes, 
> making dynamic libs available to a process requires settings 
> that can only (sanely) be made before the process starts.

And I believe some systems require shared libraries to be owned by
root.

> OTOH, it's common practice these days to put dynamic libs 
> inside packages. That needs to be dealt with at runtime, and 
> at build time (since it breaks the expectation that you can just 
> "zip up" the package).

Yes.  This is important.

> [no *.py files]
> > You also can't import source files *.py from a zip archive. 
> 
> Apparently some Linuxes / RPM distributions don't deliver 
> .pyc's or .pyo's. Since they're installed as root and run as 
> some-poor-user, I'm afraid there are quite a few installations 
> running off .py's all the time. So while it's definitely sub-
> optimal, I'm not sure it should be outlawed.

Heh, this is an argument for Jim's position -- it would have come out
during testing that way, since the imports would fail. ;-)

> [Guido]
> > But it would still be good if the .py files were also in the
> > zip file, so the source can be used in tracebacks etc. 
> 
> As you probably remember me saying before, I think this is a 
> very minor nice-to-have. For one thing, you've probably done 
> enough testing before stuffing your code into an archive that 
> you're past the point of looking at a traceback and going 
> "doh!". Most of the time you'll need to look at the code 
> anyway.

Testing is for wimps. :-)

> OTOH, this [more from Guido]:
> > A C API to get a source line from a filename might be a good
> > idea (plus a Python API). 
> 
> points the way towards something I'm very much in favor of: 
> deferring things to that-which-did-the-importing.

Yup.
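
A sketch of what such a Python API can look like, using the linecache
module as it exists in current Python: when the filename is not a real
file, linecache asks the module's loader for the source.  The module
name zipped_demo and the helper function are hypothetical, and the
archive is assumed to contain .py source.

```python
import importlib
import linecache
import os
import sys
import tempfile
import zipfile

SOURCE = "def boom():\n    raise ValueError('from inside the archive')\n"


def source_line_from_archive(lineno):
    """Return one source line of a module that was imported from a zip."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "Archive.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("zipped_demo.py", SOURCE)

        sys.path.insert(0, archive)
        try:
            importlib.invalidate_caches()
            mod = importlib.import_module("zipped_demo")
            # mod.__file__ points inside the archive, so there is no plain
            # file to open; passing the module globals lets linecache defer
            # to the loader that did the importing.
            return linecache.getline(mod.__file__, lineno, mod.__dict__)
        finally:
            sys.path.remove(archive)
            sys.modules.pop("zipped_demo", None)
```

The lookup is deferred to that-which-did-the-importing: nothing here
knows that the source came out of a zip file.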

> [Efficiency]
> > The key is the archive name from sys.path joined with the
> > file name (including any subdirectories) within the archive. 
> 
> Different spellings of the same path are possible in a 
> filesystem, but not in a dictionary. A bit of "harmless" 
> tweaking of sys.path could render an archive unreachable.

Hm, wouldn't the archive just be opened a second time?  Or do I
misunderstand you?
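
To make the concern concrete, here is a small sketch of a cache keyed
the way the PEP describes.  The function names and the dictionary are
hypothetical, and posixpath.normpath is only one possible way to
canonicalize a key (it does not resolve symlinks, for instance).

```python
import posixpath


def cache_key(archive, member):
    """Join a sys.path archive entry with a member name, as in the PEP."""
    return archive + "/" + member


def normalized_key(archive, member):
    """Same key, but with the archive path normalized first."""
    return posixpath.normpath(archive) + "/" + member


# The cache was filled while sys.path held the plain spelling.
directory_cache = {cache_key("/C/D/E/Archive.zip", "Q/R/modfoo.pyc"): "entry"}

# A "harmless" respelling of the same archive path.
tweaked = "/C/D/E/../E/Archive.zip"

hit_raw = cache_key(tweaked, "Q/R/modfoo.pyc") in directory_cache
hit_normalized = normalized_key(tweaked, "Q/R/modfoo.pyc") in directory_cache
```

With the raw key the respelled entry misses the cache; with the
normalized key it hits.  Whether a miss means "unreachable" or merely
"opened a second time" depends on what the importer does on a miss.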

> [zlib must be available at start]
> I'll agree, and agree with Guido that the coolest thing would be 
> to make zlib standard.

But we'd have to make sure it's statically linked.  (Fortunately, we
already link it statically on Windows.)

> [Booting - python.zip should be part of the generated sys.path]
> Agree. Nice and straightforward.
> 
> Now, from the discussion:
> 
> [restate Jim]
> sys.path contains /C/D/E/Archive.zip
> and Archive.zip contains "Q/R/modfoo.pyc"
> so "import Q.R.modfoo" is satisfied by
> /C/D/E/Archive.zip/Q/R/modfoo.pyc
> 
> [restate Finn]
> Jython has sys.path /C/D/E/Archive.zip!Lib
> and Archive.zip has "Lib/Q/R/modfoo.pyc"
> so "import Q.R.modfoo" is satisfied by
> /C/D/E/Archive.zip/Lib/Q/R/modfoo.pyc
> 
> [restate Guido]
> Why not use /C/D/E/Archive.zip/Lib on sys.path?
> 
> I use embedded archives. So sys.path will have an entry like:
> "/path/to/executable?84758" which says that the archive 
> starts at position 84758 in the file "executable". 
> 
> Anything beyond the simple approach Jim takes gets into 
> some very URL-ish territory. That's fine by me :-).
> 
> I don't really like the idea of hacking special knowledge of zip 
> files into import.c (which is already a specialist in 
> filesystems). Like Finn said, if this is a deployment issue (we 
> want zip files now, and are willing to live with strict limitations / 
> rules to get it), then OK (as long as it supports __path__ and 
> some way of dealing with dynamic libs in packages).
> 
> Personally, I think package support stretched import.c to its 
> monolithic limits and it's high time the code was refactored to 
> make it sanely extensible.

Yes -- this has been on my TODO list for ages.  But who's gonna DO it?
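
For reference, the spelling under [restate Guido] above (an archive
path followed by a subdirectory on sys.path) can be sketched with
today's zipimport module rather than the import.c under discussion.
The helper name and archive contents are made up, and .py source is
used in place of .pyc for brevity.

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport


def import_from_archive_subdir():
    """Put Archive.zip/Lib on sys.path and import Q.R.modfoo from it."""
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "Archive.zip")
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("Lib/Q/__init__.py", "")
            zf.writestr("Lib/Q/R/__init__.py", "")
            zf.writestr("Lib/Q/R/modfoo.py", "NAME = 'modfoo'\n")

        # The sys.path entry names a subdirectory inside the archive.
        entry = os.path.join(archive, "Lib")
        # zipimporter splits the entry into the archive file and a prefix.
        importer = zipimport.zipimporter(entry)

        sys.path.insert(0, entry)
        try:
            importlib.invalidate_caches()
            modfoo = importlib.import_module("Q.R.modfoo")
            return importer.archive, importer.prefix, modfoo.NAME, archive
        finally:
            sys.path.remove(entry)
            for name in ("Q.R.modfoo", "Q.R", "Q"):
                sys.modules.pop(name, None)
```

The importer's archive attribute is the zip file itself and its prefix
is the "Lib" subdirectory, so no special separator such as "!" is
needed to tell the two apart.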

--Guido van Rossum (home page: http://www.python.org/~guido/)