
[James C. Ahlstrom]
The PEP for zip import is 273. Please take a look and comment.
Any files may be present in the zip archive, but only files *.pyc, *.pyo and __init__.py[co] are available for import. Zip import of *.py and dynamic modules (*.pyd, *.so) is disallowed.
Why can't .py be allowed? If a more recent .py[co] (or $py.class) exists, it is used. Otherwise the .py file is compiled in memory and the compiled code is discarded when the process ends. Sure, it is slower, but a zip file with only .py[co] entries would be of little use with jython.
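To make the suggestion concrete, here is a rough sketch of compiling a .py entry straight out of an archive without writing anything back. It is only an illustration in present-day Python, not what import.c or jython actually does; the helper name load_source_from_zip and the archive label are made up for the example.

```python
import io
import types
import zipfile


def load_source_from_zip(zf, archive_label, member, module_name):
    """Compile a .py archive member in memory and run it as a module.

    Hypothetical helper: nothing is written back to the archive, so the
    compiled code lives only as long as the process does.
    """
    source = zf.read(member)                      # raw bytes of the .py file
    filename = "%s/%s" % (archive_label, member)  # used in tracebacks
    code = compile(source, filename, "exec")
    module = types.ModuleType(module_name)
    module.__file__ = filename
    exec(code, module.__dict__)
    return module


# Build a tiny archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("modfoo.py", "ANSWER = 6 * 7\n")

with zipfile.ZipFile(buffer) as zf:
    modfoo = load_source_from_zip(zf, "Archive.zip", "modfoo.py", "modfoo")
```

The cost is one compile per process per module, which is the "slower" part mentioned above.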
Just as sys.path currently has default directory names, default zip archive names are added too. Otherwise there is no way to import all Python library files from an archive.
A standard for this would be really cool.
Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip", and we are trying to import modfoo from the Q package. Then import.c will generate a list of paths and extensions and will look for the file. The list of generated paths does not change for zip imports. Suppose import.c generates the path "/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.
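The path generation described above could be sketched roughly like this. It is plain Python standing in for the C code in import.c; the function names candidate_paths and split_archive_path are invented for the illustration, and the ".zip" suffix test is an assumption about how an archive entry would be recognized.

```python
import posixpath


def candidate_paths(sys_path, package_parts, module, extensions=(".pyc", ".pyo")):
    """Yield the paths a directory-style search would try, in order.

    The same list is produced whether a sys.path entry is a directory
    or an archive name.
    """
    for entry in sys_path:
        for ext in extensions:
            yield posixpath.join(entry, *package_parts, module + ext)


def split_archive_path(path, archive_suffix=".zip"):
    """Split '<archive>.zip/<member>' into (archive, member).

    Returns None when the path does not pass through an archive.
    """
    marker = archive_suffix + "/"
    index = path.find(marker)
    if index < 0:
        return None
    cut = index + len(archive_suffix)
    return path[:cut], path[cut + 1:]


paths = list(candidate_paths(["/A/B/SubDir", "/C/D/E/Archive.zip"],
                             ["Q", "R"], "modfoo"))
```

Here split_archive_path("/C/D/E/Archive.zip/Q/R/modfoo.pyc") gives the archive name and the member "Q/R/modfoo.pyc", which is the lookup the quoted paragraph describes.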
So python packages and modules can exist *only* at the top level of the archive? That would conflict somewhat with jython, where it would be common to put python modules into the same .zip file as java classes, and java classes also want to own the root of a zip file. In the current implementation in jython, we can put python modules under any path and add the zipfile!path to sys.path:

    sys.path.append("/path/to/zipfile.zip!Lib")

which will look for Lib/Q/R/modfoo.py (and Lib/Q/R/modfoo$py.class) in the archive.
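For illustration, the "!" convention can be modelled in a few lines of Python. This is only a sketch of the idea, not the jython source; split_zip_entry and member_name are hypothetical names.

```python
def split_zip_entry(entry):
    """Split 'zipfile.zip!Lib' into (archive, prefix); prefix may be ''."""
    archive, _sep, prefix = entry.partition("!")
    return archive, prefix.strip("/")


def member_name(entry, dotted_name, suffix=".py"):
    """Return (archive, member) to look up for a dotted module name."""
    archive, prefix = split_zip_entry(entry)
    relative = dotted_name.replace(".", "/") + suffix
    member = prefix + "/" + relative if prefix else relative
    return archive, member


source_lookup = member_name("/path/to/zipfile.zip!Lib", "Q.R.modfoo")
class_lookup = member_name("/path/to/zipfile.zip!Lib", "Q.R.modfoo", "$py.class")
```

An entry without "!" simply gets an empty prefix, so the top-level layout from the PEP falls out as a special case.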
Efficiency

The only way to find files in a zip archive is linear search. So for each zip file in sys.path, we search for its names once, and put the names plus other relevant data into a static Python dictionary. The key is the archive name from sys.path joined with the file name (including any subdirectories) within the archive. This is exactly the name generated by import.c, and makes lookup easy.
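A minimal sketch of such a dictionary, written with today's zipfile module rather than the C code the PEP has in mind. The name index_archive and the choice of stored data (archive name, member name, uncompressed size) are assumptions made for the example.

```python
import io
import zipfile


def index_archive(archive_name, fileobj, cache):
    """Read the archive's directory once and record every member.

    Keys are the sys.path entry joined with the member name, so a
    generated path can be looked up directly.
    """
    with zipfile.ZipFile(fileobj) as zf:
        for info in zf.infolist():
            key = archive_name + "/" + info.filename
            cache[key] = (archive_name, info.filename, info.file_size)
    return cache


# Build a small archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("Q/R/modfoo.pyc", b"fake bytecode")

cache = index_archive("/C/D/E/Archive.zip", buffer, {})
found = "/C/D/E/Archive.zip/Q/R/modfoo.pyc" in cache
```

After the one-time scan, each import attempt costs a single dictionary lookup per generated path.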
[I found efficiency hard to achieve in jython because opening a zipfile in java also causes the zip index to be read into a dictionary. So we did not want to reopen a zipfile if it could be avoided. Instead we hide a reference to the opened file in sys.path so that when a zipfile name is removed from sys.path, the file is eventually closed.]

Would entries in the static python dict be removed when a zipfile is removed from sys.path?

What is the __path__ variable set to in a module imported from a zipfile? Can the module make changes to __path__, and will the changes be used when importing submodules?

What value should __file__ have?

regards,
finn