[Python-Dev] PEP 273: Import Modules from Zip Archives
Guido van Rossum
guido@python.org
Fri, 26 Oct 2001 16:34:15 -0400
> The PEP for zip import is 273. Please take a look and comment.
>
> http://python.sourceforge.net/peps/pep-0273.html
OK, I'll shoot. But I expect Gordon McMillan and Greg Stein to
provide more useful feedback.
| Currently, sys.path is a list of directory names as strings. If
| this PEP is implemented, an item of sys.path can be a string
| naming a zip file archive. The zip archive can contain a
| subdirectory structure to support package imports. The zip
| archive satisfies imports exactly as a subdirectory would.
I like this.
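For concreteness, here is a minimal sketch of what that would look like from the user's side. It assumes the zipimport support that current CPython ships (not the C patch under review), and the module and archive names are made up for the demo.

```python
import importlib
import os
import py_compile
import sys
import tempfile
import zipfile

# Build a throwaway archive holding one compiled module (hypothetical names).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "zipdemo_mod.py")
with open(src, "w") as f:
    f.write("ANSWER = 42\n")

pyc = os.path.join(workdir, "zipdemo_mod.pyc")
py_compile.compile(src, cfile=pyc, doraise=True)

archive = os.path.join(workdir, "Archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.write(pyc, "zipdemo_mod.pyc")  # relative name inside the archive

# Remove the loose files so only the archive can satisfy the import.
os.remove(src)
os.remove(pyc)

sys.path.insert(0, archive)  # the archive sits where a directory would
importlib.invalidate_caches()
mod = importlib.import_module("zipdemo_mod")
print(mod.ANSWER, mod.__file__)
```

The module's `__file__` ends up pointing inside the archive, which is the "exactly as a subdirectory would" behaviour in miniature.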
| The implementation is in C code in the Python core and works on
| all supported Python platforms.
This is good too, as it provides a bootstrap. OTOH I also would like
to see a prototype in Python, using either ihooks or imputil.
| Any files may be present in the zip archive, but only files *.pyc,
| *.pyo and __init__.py[co] are available for import. Zip import of
| *.py and dynamic modules (*.pyd, *.so) is disallowed.
|
| Just as sys.path currently has default directory names, default
| zip archive names are added too. Otherwise there is no way to
| import all Python library files from an archive.
More bootstrap goodness.
| Reading compressed zip archives requires the zlib module. An
| import of zlib will be attempted prior to any other imports. If
| zlib is not available at that time, only uncompressed archives
| will be readable, even if zlib subsequently becomes available.
Hm, I wonder if we couldn't just link with the libz.a C library and
use the C interface, if you're implementing this in C anyway.
| Subdirectory Equivalence
|
| The zip archive must be treated exactly as a subdirectory tree so
| we can support package imports based on current and future rules.
| Zip archive files must be created with relative path names. That
| is, archive file names are of the form: file1, file2, dir1/file3,
| dir2/dir3/file4.
|
| Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
| and we are trying to import modfoo from the Q package. Then
| import.c will generate a list of paths and extensions and will
| look for the file. The list of generated paths does not change
| for zip imports.
(Very clever.)
| Suppose import.c generates the path
| "/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
| "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
| exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.
Nice.
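A rough sketch of the path equivalence, in Python rather than C. The helper names `candidate_paths` and `split_archive_path` are hypothetical, and the real import.c walks packages one level at a time, so treat this as a simplification.

```python
import posixpath

def candidate_paths(sys_path, dotted_name, suffixes=(".pyc", ".pyo")):
    """Yield every path that would be probed, in sys.path order."""
    relative = dotted_name.replace(".", "/")
    for entry in sys_path:
        for suffix in suffixes:
            yield posixpath.join(entry, relative + suffix)

def split_archive_path(path, archive_ext=".zip"):
    """Split a generated path into (archive, member), or (None, path)."""
    marker = archive_ext + "/"
    head, sep, tail = path.partition(marker)
    if not sep:
        return None, path  # an ordinary directory path
    return head + archive_ext, tail

paths = list(candidate_paths(["/A/B/SubDir", "/C/D/E/Archive.zip"], "Q.R.modfoo"))
for p in paths:
    print(p, "->", split_archive_path(p))
```

The list of generated paths is the same whether or not an entry is an archive; only the final lookup differs.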
| Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
| your zip file will satisfy imports just as your subdirectory did.
|
| Well, not quite. You can't satisfy dynamic modules from a zip
| file. Dynamic modules have extensions like .dll, .pyd, and .so.
| They are operating system dependent, and probably can't be loaded
| except from a file. It might be possible to extract the dynamic
| module from the zip file, write it to a plain file and load it.
| But that would mean creating temporary files, and dealing with all
| the dynload_*.c, and that's probably not a good idea.
Agreed.
| You also can't import source files *.py from a zip archive. The
| problem here is what to do with the compiled files. Python would
| normally write these to the same directory as *.py, but surely we
| don't want to write to the zip file. We could write to the
| directory of the zip archive, but that would clutter it up, not
| good if it is /usr/bin for example. We could just fail to write
| the compiled files, but that makes zip imports very slow, and the
| user would probably not figure out what is wrong. It is probably
| best for users to put *.pyc into zip archives in the first place,
| and this PEP enforces that rule.
I agree. But it would still be good if the .py files were also in the
zip file, so the source can be used in tracebacks etc. A C API to get
a source line from a filename might be a good idea (plus a Python API).
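The Python side of such an API might look roughly like this. `get_source_line` is a hypothetical name, not an existing function, and the sketch assumes the source is stored as UTF-8 text under the same relative name the traceback reports.

```python
import os
import tempfile
import zipfile

def get_source_line(filename, lineno, archive_ext=".zip"):
    """Return line `lineno` (1-based) of `filename`, or "" if unavailable.

    `filename` may point inside an archive, e.g. /C/D/E/Archive.zip/Q/modfoo.py.
    """
    marker = archive_ext + "/"
    head, sep, member = filename.replace(os.sep, "/").partition(marker)
    try:
        if sep:
            # The path runs through an archive: read the member from it.
            with zipfile.ZipFile(head + archive_ext) as zf:
                text = zf.read(member).decode("utf-8")
        else:
            with open(filename, encoding="utf-8") as f:
                text = f.read()
    except (OSError, KeyError, zipfile.BadZipFile):
        return ""
    lines = text.splitlines()
    return lines[lineno - 1] if 1 <= lineno <= len(lines) else ""

# Demo with a throwaway archive that ships the source next to the bytecode.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "Archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("Q/modfoo.py", "a = 1\nb = 2\n")

line = get_source_line(archive + "/Q/modfoo.py", 2)
print(line)
```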
| So the only imports zip archives support are *.pyc and *.pyo, plus
| the import of __init__.py[co] for packages, and the search of the
| subdirectory structure for the same.
I wonder if we need to make an additional rule that allows a .pyc file
to satisfy a module request even if we're in optimized mode (where
normally only .pyo files are searched). Otherwise, if someone ships a
zipfile with only .pyc files, their modules can't be imported at all
when python -O is used.
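The rule I have in mind could be sketched as follows. `pick_compiled` is a hypothetical helper, and `available` stands in for whatever set of member names the archive lookup provides.

```python
def pick_compiled(available, base, optimized):
    """Return the member name that should satisfy the import, or None.

    In optimized mode prefer .pyo but fall back to .pyc;
    in normal mode only .pyc is considered.
    """
    preferred = [".pyo", ".pyc"] if optimized else [".pyc"]
    for suffix in preferred:
        name = base + suffix
        if name in available:
            return name
    return None

members = {"Q/__init__.pyc", "Q/modfoo.pyc"}
print(pick_compiled(members, "Q/modfoo", optimized=True))
```

With only .pyc files shipped, the import still succeeds under python -O instead of failing outright.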
| Efficiency
|
| The only way to find files in a zip archive is linear search.
But there's an index record at the end that provides quick access.
| So
| for each zip file in sys.path, we search for its names once, and
| put the names plus other relevant data into a static Python
| dictionary. The key is the archive name from sys.path joined with
| the file name (including any subdirectories) within the archive.
| This is exactly the name generated by import.c, and makes lookup
| easy.
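A minimal sketch of that dictionary, using the standard zipfile module (which reads the index record at the end of the archive) rather than the C code. `build_zip_index` and the tuple layout are assumptions made for illustration.

```python
import os
import tempfile
import zipfile

def build_zip_index(archive_path):
    """Map 'archive/member' names to the data needed to read each member."""
    index = {}
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            # Key: archive name from sys.path joined with the member name.
            key = archive_path + "/" + info.filename
            index[key] = (info.header_offset, info.compress_type,
                          info.compress_size, info.file_size)
    return index

# Demo with a throwaway archive; the member contents are dummy bytes.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "Archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("Q/__init__.pyc", b"")
    zf.writestr("Q/R/modfoo.pyc", b"dummy")

index = build_zip_index(archive)
print(sorted(index))
```

After the one-time scan, each probe for a generated path is a single dictionary lookup.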
We could do this kind of pre-scanning for regular directories on
sys.path too. I found out very long ago (around '93 or '94) that this
saves a *lot* of startup time; I presume it still does. (And even
more if the info can be cached in a file.) The only problem is how to
detect when the cache becomes out of date. Of course, you could say
"if you want faster startup time, put all your files in a zip
archive", and I couldn't really argue with that. :-)
| zlib
|
| Compressed zip archives require zlib for decompression. Prior to
| any other imports, we attempt an import of zlib, and set a flag if
| it is available. All compressed files are invisible unless this
| flag is true.
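The flag logic described above might be sketched like this in Python. `partition_members` is a hypothetical helper; it also keeps track of which members were hidden, so that a caller would have something to report.

```python
import zipfile

try:
    import zlib  # attempted once, up front, as the PEP describes
    HAVE_ZLIB = True
except ImportError:
    HAVE_ZLIB = False

def partition_members(infos, have_zlib=HAVE_ZLIB):
    """Split archive members into (visible, hidden) name lists."""
    visible, hidden = [], []
    for info in infos:
        if info.compress_type == zipfile.ZIP_STORED or have_zlib:
            visible.append(info.filename)
        else:
            hidden.append(info.filename)  # compressed, and no zlib
    return visible, hidden

stored = zipfile.ZipInfo("Q/__init__.pyc")  # ZIP_STORED by default
deflated = zipfile.ZipInfo("Q/modfoo.pyc")
deflated.compress_type = zipfile.ZIP_DEFLATED

print(partition_members([stored, deflated], have_zlib=False))
```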
Do we get a "module not found" error or something better, like
"compressed module found as <filename> but zlib unavailable"?
| It could happen that zlib was available later. For example, the
| import of site.py might add the correct directory to sys.path so a
| dynamic load succeeds. But compressed files will still be
| invisible. It is unknown if it can happen that importing site.py
| can cause zlib to appear, so maybe we're worrying about nothing.
| On Windows and Linux, the early import of zlib succeeds without
| site.py.
Yes, site.py isn't needed to make standard library modules available;
it's intended to make non-standard library modules available. :-)
| The problem here is the confusion caused by the reverse. Either a
| zip file satisfies imports or it doesn't. It is silly to say that
| site.py needs to be uncompressed, and that maybe imports will
| succeed later. If you don't like this, create uncompressed zip
| archives or make sure zlib is available, for example, as a
| built-in module. Or we can write special search logic during zip
| initialization.
I don't think we need anything special here. site.py shouldn't be
needed.
| Booting
|
| Python imports site.py itself, and this imports os, nt, ntpath,
| stat, and UserDict. It also imports sitecustomize.py which may
| import more modules. Zip imports must be available before site.py
| is imported.
|
| Just as there are default directories in sys.path, there must be
| one or more default zip archives too.
|
| The problem is what the name should be. The name should be linked
| with the Python version, so the Python executable can correctly
| find its corresponding libraries even when there are multiple
| Python versions on the same machine.
|
| This PEP suggests a zip archive name equal to the Python
| interpreter path with extension ".zip" (eg, /usr/bin/python.zip)
| which is always prepended to sys.path. So a directory with python
| and python.zip is complete. This would work fine on Windows, as
| it is common to put supporting files in the directory of the
| executable. But it may offend Unix fans, who dislike bin
| directories being used for libraries. It might be fine to
| generate different defaults for Windows and Unix if necessary, but
| the code will be in C, and there is no sense getting complicated.
Well, this is the domain of getpath.c, and that's got a different
implementation for Unix and Windows anyway (Windows has PC/getpathp.c).
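The naming rule the PEP suggests is small enough to sketch in Python; the real thing would live in C in getpath.c and PC/getpathp.c. `default_zip_name` is a hypothetical helper, and stripping only a trailing ".exe" is my assumption about how the Windows case would be handled.

```python
import ntpath

def default_zip_name(executable):
    """Return the default archive name for an interpreter path."""
    root, ext = ntpath.splitext(executable)
    if ext.lower() != ".exe":
        # Keep names like python2.2 intact; only .exe is an extension here.
        root = executable
    return root + ".zip"

print(default_zip_name("/usr/bin/python"))
print(default_zip_name("C:\\Python22\\python.exe"))
```

A versioned interpreter name such as /usr/bin/python2.2 would then map to /usr/bin/python2.2.zip, which keeps archives for different Python versions apart.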
| Implementation
|
| A C implementation exists which works, but which can be made better.
Upload as a patch please?
--Guido van Rossum (home page: http://www.python.org/~guido/)