
The PEP for zip import is 273. Please take a look and comment.
OK, I'll shoot. But I expect Gordon McMillan and Greg Stein to provide
more useful feedback.

| Currently, sys.path is a list of directory names as strings. If
| this PEP is implemented, an item of sys.path can be a string
| naming a zip file archive. The zip archive can contain a
| subdirectory structure to support package imports. The zip
| archive satisfies imports exactly as a subdirectory would.

I like this.

| The implementation is in C code in the Python core and works on
| all supported Python platforms.

This is good too, as it provides a bootstrap. OTOH I also would like
to see a prototype in Python, using either ihooks or imputil.

| Any files may be present in the zip archive, but only files *.pyc,
| *.pyo and __init__.py[co] are available for import. Zip import of
| *.py and dynamic modules (*.pyd, *.so) is disallowed.
|
| Just as sys.path currently has default directory names, default
| zip archive names are added too. Otherwise there is no way to
| import all Python library files from an archive.

More bootstrap goodness.

| Reading compressed zip archives requires the zlib module. An
| import of zlib will be attempted prior to any other imports. If
| zlib is not available at that time, only uncompressed archives
| will be readable, even if zlib subsequently becomes available.

Hm, I wonder if we couldn't just link with the libz.a C library and
use the C interface, if you're implementing this in C anyway.

| Subdirectory Equivalence
|
| The zip archive must be treated exactly as a subdirectory tree so
| we can support package imports based on current and future rules.
| Zip archive files must be created with relative path names. That
| is, archive file names are of the form: file1, file2, dir1/file3,
| dir2/dir3/file4.
|
| Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
| and we are trying to import modfoo from the Q package. Then
| import.c will generate a list of paths and extensions and will
| look for the file.
| The list of generated paths does not change
| for zip imports.

(Very clever.)

| Suppose import.c generates the path
| "/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
| "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
| exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.

Nice.

| Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
| your zip file will satisfy imports just as your subdirectory did.
|
| Well, not quite. You can't satisfy dynamic modules from a zip
| file. Dynamic modules have extensions like .dll, .pyd, and .so.
| They are operating system dependent, and probably can't be loaded
| except from a file. It might be possible to extract the dynamic
| module from the zip file, write it to a plain file and load it.
| But that would mean creating temporary files, and dealing with all
| the dynload_*.c, and that's probably not a good idea.

Agreed.

| You also can't import source files *.py from a zip archive. The
| problem here is what to do with the compiled files. Python would
| normally write these to the same directory as *.py, but surely we
| don't want to write to the zip file. We could write to the
| directory of the zip archive, but that would clutter it up, not
| good if it is /usr/bin for example. We could just fail to write
| the compiled files, but that makes zip imports very slow, and the
| user would probably not figure out what is wrong. It is probably
| best for users to put *.pyc into zip archives in the first place,
| and this PEP enforces that rule.

I agree. But it would still be good if the .py files were also in the
zip file, so the source can be used in tracebacks etc. A C API to get
a source line from a filename might be a good idea (plus a Python
API).

| So the only imports zip archives support are *.pyc and *.pyo, plus
| the import of __init__.py[co] for packages, and the search of the
| subdirectory structure for the same.
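
[Illustration] The path equivalence quoted above can be sketched in
Python. This is only a minimal sketch, not the PEP's C implementation:
the helper names split_archive_path and is_importable_member are
hypothetical, and the sketch assumes an archive is recognized by a
path component ending in ".zip".

```python
# Hypothetical helpers for illustration; not part of the PEP or import.c.
IMPORTABLE_SUFFIXES = (".pyc", ".pyo")


def split_archive_path(path):
    """Split '/C/D/E/Archive.zip/Q/R/modfoo.pyc' into (archive, member).

    Returns None if no path component ends in '.zip' (assumption: that
    is how an archive is recognized) or if nothing follows the archive.
    """
    parts = path.split("/")
    for i, part in enumerate(parts):
        if part.lower().endswith(".zip"):
            archive = "/".join(parts[: i + 1])
            member = "/".join(parts[i + 1:])
            return (archive, member) if member else None
    return None


def is_importable_member(member):
    """Apply the quoted rule: only *.pyc and *.pyo satisfy imports."""
    return member.endswith(IMPORTABLE_SUFFIXES)
```

Under these assumptions, the generated path
"/C/D/E/Archive.zip/Q/R/modfoo.pyc" splits into the archive name and
the relative member name "Q/R/modfoo.pyc", while an ordinary directory
path yields None and is handled as before.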

I wonder if we need to make an additional rule that allows a .pyc file
to satisfy a module request even if we're in optimized mode (where
normally only .pyo files are searched). Otherwise, if someone ships a
zipfile with only .pyc files, their modules can't be imported at all
when python -O is used.

| Efficiency
|
| The only way to find files in a zip archive is linear search.

But there's an index record at the end that provides quick access.

| So for each zip file in sys.path, we search for its names once, and
| put the names plus other relevant data into a static Python
| dictionary. The key is the archive name from sys.path joined with
| the file name (including any subdirectories) within the archive.
| This is exactly the name generated by import.c, and makes lookup
| easy.

We could do this kind of pre-scanning for regular directories on
sys.path too. I found out very long ago (around '93 or '94) that this
saves a *lot* of startup time; I presume it still does. (And even
more if the info can be cached in a file.) The only problem is how to
detect when the cache becomes out of date. Of course, you could say
"if you want faster startup time, put all your files in a zip
archive", and I couldn't really argue with that. :-)

| zlib
|
| Compressed zip archives require zlib for decompression. Prior to
| any other imports, we attempt an import of zlib, and set a flag if
| it is available. All compressed files are invisible unless this
| flag is true.

Do we get a "module not found" error or something better, like
"compressed module found as <filename> but zlib unavailable"?

| It could happen that zlib was available later. For example, the
| import of site.py might add the correct directory to sys.path so a
| dynamic load succeeds. But compressed files will still be
| invisible. It is unknown if it can happen that importing site.py
| can cause zlib to appear, so maybe we're worrying about nothing.
| On Windows and Linux, the early import of zlib succeeds without
| site.py.

Yes, site.py isn't needed to make standard library modules available;
it's intended to make non-standard library modules available. :-)

| The problem here is the confusion caused by the reverse. Either a
| zip file satisfies imports or it doesn't. It is silly to say that
| site.py needs to be uncompressed, and that maybe imports will
| succeed later. If you don't like this, create uncompressed zip
| archives or make sure zlib is available, for example, as a
| built-in module. Or we can write special search logic during zip
| initialization.

I don't think we need anything special here. site.py shouldn't be
needed.

| Booting
|
| Python imports site.py itself, and this imports os, nt, ntpath,
| stat, and UserDict. It also imports sitecustomize.py which may
| import more modules. Zip imports must be available before site.py
| is imported.
|
| Just as there are default directories in sys.path, there must be
| one or more default zip archives too.
|
| The problem is what the name should be. The name should be linked
| with the Python version, so the Python executable can correctly
| find its corresponding libraries even when there are multiple
| Python versions on the same machine.
|
| This PEP suggests a zip archive name equal to the Python
| interpreter path with extension ".zip" (eg, /usr/bin/python.zip)
| which is always prepended to sys.path. So a directory with python
| and python.zip is complete. This would work fine on Windows, as
| it is common to put supporting files in the directory of the
| executable. But it may offend Unix fans, who dislike bin
| directories being used for libraries. It might be fine to
| generate different defaults for Windows and Unix if necessary, but
| the code will be in C, and there is no sense getting complicated.

Well, this is the domain of getpath.c, and that's got a different
implementation for Unix and Windows anyway (Windows has
PC/getpathp.c).
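
[Illustration] The default archive name suggested in the quoted text
can be sketched as follows. This is a rough sketch under stated
assumptions, not what getpath.c does: the function names are
hypothetical, and stripping only a trailing ".exe" before appending
".zip" is an assumption about how "interpreter path with extension
.zip" would be computed.

```python
# Hypothetical helpers for illustration; the real logic would live in C.
def default_zip_name(executable):
    """Return the interpreter path with '.zip' as its extension.

    Assumption: only a trailing '.exe' (any case) counts as an existing
    extension, so '/usr/bin/python' maps to '/usr/bin/python.zip'.
    """
    if executable.lower().endswith(".exe"):
        executable = executable[: -len(".exe")]
    return executable + ".zip"


def with_default_archive(path, executable):
    """Return a new search path with the default archive prepended."""
    return [default_zip_name(executable)] + list(path)
```

For example, with_default_archive(["/A/B/SubDir"], "/usr/bin/python")
puts "/usr/bin/python.zip" ahead of the directory entry, matching the
"always prepended to sys.path" wording in the quote.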

| Implementation
|
| A C implementation exists which works, but which can be made better.

Upload as a patch please?

--Guido van Rossum (home page: http://www.python.org/~guido/)