PEP 273: Import Modules from Zip Archives

The PEP for zip import is 273. Please take a look and comment. http://python.sourceforge.net/peps/pep-0273.html Jim Ahlstrom

The PEP for zip import is 273. Please take a look and comment.
OK, I'll shoot. But I expect Gordon McMillan and Greg Stein to provide more useful feedback.

| Currently, sys.path is a list of directory names as strings. If
| this PEP is implemented, an item of sys.path can be a string
| naming a zip file archive. The zip archive can contain a
| subdirectory structure to support package imports. The zip
| archive satisfies imports exactly as a subdirectory would.

I like this.

| The implementation is in C code in the Python core and works on
| all supported Python platforms.

This is good too, as it provides a bootstrap. OTOH I also would like to see a prototype in Python, using either ihooks or imputil.

| Any files may be present in the zip archive, but only files *.pyc,
| *.pyo and __init__.py[co] are available for import. Zip import of
| *.py and dynamic modules (*.pyd, *.so) is disallowed.
|
| Just as sys.path currently has default directory names, default
| zip archive names are added too. Otherwise there is no way to
| import all Python library files from an archive.

More bootstrap goodness.

| Reading compressed zip archives requires the zlib module. An
| import of zlib will be attempted prior to any other imports. If
| zlib is not available at that time, only uncompressed archives
| will be readable, even if zlib subsequently becomes available.

Hm, I wonder if we couldn't just link with the libz.a C library and use the C interface, if you're implementing this in C anyway.

| Subdirectory Equivalence
|
| The zip archive must be treated exactly as a subdirectory tree so
| we can support package imports based on current and future rules.
| Zip archive files must be created with relative path names. That
| is, archive file names are of the form: file1, file2, dir1/file3,
| dir2/dir3/file4.
|
| Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
| and we are trying to import modfoo from the Q package. Then
| import.c will generate a list of paths and extensions and will
| look for the file. The list of generated paths does not change
| for zip imports.

(Very clever.)

| Suppose import.c generates the path
| "/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path
| "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is
| exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.

Nice.

| Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then
| your zip file will satisfy imports just as your subdirectory did.
|
| Well, not quite. You can't satisfy dynamic modules from a zip
| file. Dynamic modules have extensions like .dll, .pyd, and .so.
| They are operating system dependent, and probably can't be loaded
| except from a file. It might be possible to extract the dynamic
| module from the zip file, write it to a plain file and load it.
| But that would mean creating temporary files, and dealing with all
| the dynload_*.c, and that's probably not a good idea.

Agreed.

| You also can't import source files *.py from a zip archive. The
| problem here is what to do with the compiled files. Python would
| normally write these to the same directory as *.py, but surely we
| don't want to write to the zip file. We could write to the
| directory of the zip archive, but that would clutter it up, not
| good if it is /usr/bin for example. We could just fail to write
| the compiled files, but that makes zip imports very slow, and the
| user would probably not figure out what is wrong. It is probably
| best for users to put *.pyc into zip archives in the first place,
| and this PEP enforces that rule.

I agree. But it would still be good if the .py files were also in the zip file, so the source can be used in tracebacks etc. A C API to get a source line from a filename might be a good idea (plus a Python API).

| So the only imports zip archives support are *.pyc and *.pyo, plus
| the import of __init__.py[co] for packages, and the search of the
| subdirectory structure for the same.

I wonder if we need to make an additional rule that allows a .pyc file to satisfy a module request even if we're in optimized mode (where normally only .pyo files are searched). Otherwise, if someone ships a zipfile with only .pyc files, their modules can't be imported at all when python -O is used.

| Efficiency
|
| The only way to find files in a zip archive is linear search.

But there's an index record at the end that provides quick access.

| So
| for each zip file in sys.path, we search for its names once, and
| put the names plus other relevant data into a static Python
| dictionary. The key is the archive name from sys.path joined with
| the file name (including any subdirectories) within the archive.
| This is exactly the name generated by import.c, and makes lookup
| easy.

We could do this kind of pre-scanning for regular directories on sys.path too. I found out very long ago (around '93 or '94) that this saves a *lot* of startup time; I presume it still does. (And even more if the info can be cached in a file.) The only problem is how to detect when the cache becomes out of date. Of course, you could say "if you want faster startup time, put all your files in a zip archive", and I couldn't really argue with that. :-)

| zlib
|
| Compressed zip archives require zlib for decompression. Prior to
| any other imports, we attempt an import of zlib, and set a flag if
| it is available. All compressed files are invisible unless this
| flag is true.

Do we get a "module not found" error or something better, like "compressed module found as <filename> but zlib unavailable"?

| It could happen that zlib was available later. For example, the
| import of site.py might add the correct directory to sys.path so a
| dynamic load succeeds. But compressed files will still be
| invisible. It is unknown if it can happen that importing site.py
| can cause zlib to appear, so maybe we're worrying about nothing.
| On Windows and Linux, the early import of zlib succeeds without
| site.py.

Yes, site.py isn't needed to make standard library modules available; it's intended to make non-standard library modules available. :-)

| The problem here is the confusion caused by the reverse. Either a
| zip file satisfies imports or it doesn't. It is silly to say that
| site.py needs to be uncompressed, and that maybe imports will
| succeed later. If you don't like this, create uncompressed zip
| archives or make sure zlib is available, for example, as a
| built-in module. Or we can write special search logic during zip
| initialization.

I don't think we need anything special here. site.py shouldn't be needed.

| Booting
|
| Python imports site.py itself, and this imports os, nt, ntpath,
| stat, and UserDict. It also imports sitecustomize.py which may
| import more modules. Zip imports must be available before site.py
| is imported.
|
| Just as there are default directories in sys.path, there must be
| one or more default zip archives too.
|
| The problem is what the name should be. The name should be linked
| with the Python version, so the Python executable can correctly
| find its corresponding libraries even when there are multiple
| Python versions on the same machine.
|
| This PEP suggests a zip archive name equal to the Python
| interpreter path with extension ".zip" (eg, /usr/bin/python.zip)
| which is always prepended to sys.path. So a directory with python
| and python.zip is complete. This would work fine on Windows, as
| it is common to put supporting files in the directory of the
| executable. But it may offend Unix fans, who dislike bin
| directories being used for libraries. It might be fine to
| generate different defaults for Windows and Unix if necessary, but
| the code will be in C, and there is no sense getting complicated.

Well, this is the domain of getpath.c, and that's got a different implementation for Unix and Windows anyway (Windows has PC/getpathp.c).

| Implementation
|
| A C implementation exists which works, but which can be made better.

Upload as a patch please?

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Hm, I wonder if we couldn't just link with the libz.a C library and use the C interface, if you're implementing this in C anyway.
Since zlib is often in a DLL or shared library (.so), I am afraid of portability problems if we use the C interface. For example, on Windows, linking python.dll with zlib.dll would cause python.dll to fail if zlib.dll were unavailable, even if zlib.dll were never accessed. It also happens that use of the zlib module is easier.
Yes, I agree. How about if we look for the correct .py[co] first, and if that fails, look for the other? Either will satisfy the import, right? If .pyc is wanted, .pyo is OK too.
| The only way to find files in a zip archive is linear search.
But there's an index record at the end that provides quick access.
It is this index which is searched linearly.
Do we get a "module not found" error or something better, like "compressed module found as <filename> but zlib unavailable"?
We get "module not found". The second is awkward, because the module may be found later in sys.path.
Well, this is the domain of getpath.c, and that's got a different implementation for Unix and Windows anyway (Windows has PC/getpathp.c).
I would like discussion on what the additional sys.path names are.
| A C implementation exists which works, but which can be made better.
I can upload it for inspection, but I don't want it to be a patch because it is not done yet. JimA
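
The fallback rule discussed above (look for the preferred compiled suffix first, and if that fails, accept the other) could be sketched roughly as follows. The names `compiled_candidates` and `find_compiled` and the `optimized` flag are hypothetical, chosen only for illustration; this is not the code in the patch.

```python
def compiled_candidates(module_path, optimized=False):
    """Return compiled-file names to try, preferred suffix first.

    Hypothetical helper: `optimized` stands in for running under python -O.
    """
    if optimized:
        preferred, fallback = ".pyo", ".pyc"
    else:
        preferred, fallback = ".pyc", ".pyo"
    return [module_path + preferred, module_path + fallback]


def find_compiled(module_path, archive_names, optimized=False):
    """Return the first candidate present in `archive_names`, else None."""
    for candidate in compiled_candidates(module_path, optimized):
        if candidate in archive_names:
            return candidate
    return None
```

With this ordering, an archive that ships only .pyc files still satisfies imports under -O, which is the concern raised earlier in the thread.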

OK, good enough. (I'm getting more and more curious about your implementation. ;-)
Sounds good to me -- this might be a good rule anyhoo.
That can't possibly take any time compared to other stuff -- it's a relatively short in-memory table.
Hm. Does it at least spit out a warning when zlib is needed but not available? It can be awkward to debug the situation where you get a module from the filesystem rather than the version from the zipfile, even though it is in the zipfile. While the compression method is listed in the zip info, it's not the first thing someone would look for unless they were aware that this failure mode existed. Since obviously the intention of putting the module in the zipfile was that it should be found, I think that failure to decompress should be turned into an immediate error -- the same way as a syntax error gets reported and not turned into a "skip to the next directory in sys.path" effect.
Well, propose some.
Just say so in the patch. You can upload anything you want. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)
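
The "immediate error" behaviour suggested above might look something like this sketch. It uses the standard zipfile module's ZipInfo.compress_type field; the function name `check_member_readable` and the `zlib_available` flag are hypothetical stand-ins for whatever the C implementation would use.

```python
import zipfile


def check_member_readable(info, zlib_available):
    """Raise ImportError if a member was found but cannot be decompressed.

    `info` is a zipfile.ZipInfo; `zlib_available` is an assumed flag that
    records whether the early import of zlib succeeded.
    """
    if info.compress_type != zipfile.ZIP_STORED and not zlib_available:
        raise ImportError(
            "compressed module found as %s but zlib unavailable" % info.filename
        )
    return True
```

The point of raising rather than returning False is that the search does not silently continue to the next sys.path entry.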

Guido van Rossum wrote:
Just say so in the patch. You can upload anything you want. :-)
The implementation is in patch 476047. http://sourceforge.net/tracker/index.php?func=detail&aid=476047&group_id=5470&atid=305470 JimA

Guido van Rossum wrote: [Me]
I would like discussion on what the additional sys.path names are.
Well, propose some.
OK. I propose that there is one name added to sys.path, and the file name is "python%s%s.zip" % (sys.version[0], sys.version[2]). For example, python22.zip. This is the same on all platforms. On Unix, the directory is sys.prefix + "/lib". So for prefix /usr/local, the path /usr/local/lib/python2.2/ is already on sys.path, and /usr/local/lib/python22.zip would be added. On Windows, the directory is the directory of sys.executable. The zip archive name is always inserted as the second item in sys.path. The first always seems to be ''. JimA
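
A minimal sketch of the naming proposal above, assuming the version numbers are passed in explicitly (indexing sys.version by character, as in the proposal, only works while both components are single digits). The function names are hypothetical, and posixpath/ntpath are used so both rules can be tried on any platform.

```python
import ntpath
import posixpath


def default_zip_name(major, minor):
    """Archive file name, e.g. python22.zip for Python 2.2."""
    return "python%s%s.zip" % (major, minor)


def default_zip_path(major, minor, prefix=None, executable=None):
    """Full path of the default archive.

    Unix rule: sys.prefix + "/lib". Windows rule: the directory of
    sys.executable. Pass `executable` to select the Windows rule.
    """
    name = default_zip_name(major, minor)
    if executable is not None:
        return ntpath.join(ntpath.dirname(executable), name)
    return posixpath.join(prefix, "lib", name)
```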

Sounds good to me. Somebody please update the PEP. --Guido van Rossum (home page: http://www.python.org/~guido/)

On 31 October 2001, James C. Ahlstrom said:
Did we decide to allow import of *.py?
Here's an idea that would apply to importing *.py whether inside ZIP files or not: if unable to write the .pyc, write a warning to stderr. That'll be useful to people who inadvertently put .py files without .pyc in a ZIP file, and to people who try to import .py files from a directory they can't write to, etc. Now that Python has a warning framework, warnings like

    unable to write /usr/lib/python2.2/site-packages/foo.pyc: permission denied

or

    unable to write /usr/lib/python2.2/site-packages/foo.zip/foo.pyc: can't write to ZIP file

sound like useful warnings.

Greg
--
Greg Ward - programmer-at-large gward@python.net
http://starship.python.net/~gward/
Save energy: be apathetic.
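
As a rough illustration of the warning idea, here is a sketch built on the standard warnings module. The helper name `write_bytecode` and the choice of RuntimeWarning are assumptions made for the example, not something proposed in the thread.

```python
import warnings


def write_bytecode(path, data):
    """Try to write compiled bytes to `path`; warn instead of failing silently.

    Returns True on success, False if the file could not be written.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)
    except OSError as exc:
        # The message mirrors the "unable to write <file>: <reason>" wording.
        warnings.warn(
            "unable to write %s: %s" % (path, exc.strerror), RuntimeWarning
        )
        return False
    return True
```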

"FL" == Fredrik Lundh <fredrik@pythonware.com> writes:
FL> can anyone explain why I shouldn't be allowed to ship, say,
FL> PIL's Python code in a version-independent ZIP file?

Excellent point! We ought to collect some requirements/goals for the ZIP import stuff. The current draft of the PEP is a little thin on motivation.

Requirement: A ZIP archive should be usable with any version of Python that supports ZIP-based imports.

It might also be nice to have an imputil- or ihook-based importer that works with older versions of Python.

Jeremy

Jeremy Hylton:
Excellent point!
Being the permissive sort, I'd say you should be allowed. But I think it would be foolish.
Speed, ease of deployment, minimal disk space. MacPython does it, Jython does it, Installer's been doing it since '98 (although it doesn't use zip format archives).
Requirement: A ZIP archive should be usable with any version of Python that supports ZIP-based imports.
Hmm. You want .pyc's with every known magic number (there goes disk space)? Or just source and throw away the .pyc? Or maybe insert into the zip file? Either way, there goes speed. Besides, as I noted, and Guido concurred, we need some way to handle extension modules "inside" packages, since they can't go inside the zip. On Windows, extensions are only usable by one version of Python. This is silliness. I have clients with versions 1.52 to 2.1. I have a fair amount of code I reuse, and I keep it version-independent. I have had to change something in it for every Python version to keep it version independent.
It might also be nice to have an imputil- or ihook-based importer that works with older versions of Python.
There's at least one imputil-based one that works with zipfile. - Gordon

>> Requirement: A ZIP archive be usable any version of Python that
>> supports ZIP-based imports.

Gordon> Hmm. You want .pyc's with every known magic number (there goes
Gordon> disk space)? Or just source and throw away the .pyc? Or maybe
Gordon> insert into the zip file? Either way, there goes speed.

Can't zip files be treated conceptually as directories? If I import a py-based module from a zip archive I see no particular reason the byte-compiled file can't be added to the archive (conceptually, written to the directory), speeding up the import the next time. Is this not possible?

Skip

Jeremy:
Skip:
Zip files are normally unpacked to the filesystem and rebuilt when modified. If this feature is desired, I think MetaKit would be a much better file format (I have used MetaKit as a pyc archive so I could replace individual modules). But unless you want to use big-boy SQL servers as "archives", you're going to have trouble with multi-user. To combine this with Jeremy's requirement, we can't put the .pyc's alongside the py's anyway, we need some (disappearing) node in the hierarchy to hold the byte code version (magic number). A bit more trickery for import.c.

We can meet a real need as long as we don't put tail fins and heavy chrome all over a Yugo (zip files). They're easy to produce & ship. I don't think we should disallow .py files, but I think 99% of their use will be version specific .pyc's only (and many of those to users who think "traceback, innermost last" is Aramaic). It's nothing compared to the effort of dealing with the binaries across versions (or even the std lib across versions, sometimes).

OTOH, I'm worried that the extension-modules-inside-packages requirement is going to break import.c's back. There are many undocumented features of the implementation of import that are relied on (such as xml.__init__'s trick, or extension modules inside packages). Heck, we might even have to clarify the rules.

doom-and-gloom-and-goblins-ly y'rs - Gordon

- Writing stuff back to .zip files is totally the wrong approach.
- I don't care about having .so files inside packages *in zipfiles*.
- I'm not sure I care about having .so files inside packages on the filesystem; they are useful in Zope, but for very hackish reasons.
- If the zip file has the .py file but no .pyc or the wrong .pyc, tant pis. Let it be slower. (But if it has the .pyo, use that.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
- I'm not sure I care about having .so files inside packages on the filesystem; they are useful in Zope, but for very hackish reasons.
Why? If I write a package which is mostly in Python, it feels very natural to put the C extensions also in the package. Just

Yes, it does, and as long as it works, I have no problem with that. Distutils supports this too, AFAIK. But if there are mechanical problems with making it work (zip files are a good example of that) I don't see why we should torture ourselves to get it to work when simpler solutions exist (such as putting the extension at the top level under a private name with the package name as a prefix).

BTW, the hackish reasons I referred to are this: Zope often wants to *replace* existing extensions with its own versions, and places e.g. its own cPickle.so inside a package to force the import even if cPickle is built-in statically. But I'm not sure that that works if there's a toplevel cPickle.so which has already been imported; it may work on some systems but fail on others, depending on the shared library architecture (often one of the most hackish parts of user-space OS support).

--Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
- I'm not sure I care about having .so files inside packages on the filesystem; they are useful in Zope, but for very hackish reasons.
[Just]
Why? If I write a package which is mostly in Python, it feels very natural to put the C extensions also in the package.
[Guido]
Yes, it does, and as long as it works, I have no problem with that. Distutils supports this too, AFAIK.
Ah, ok, I think I misread your comment above as "I'm not sure I care about allowing .so files inside packages to begin with", hence my "why". Just

"GW" == Greg Ward <gward@python.net> writes:
GW> Here's an idea that would apply to importing *.py whether inside GW> ZIP files or not: if unable to write the .pyc, write a warning GW> to stderr. This is exactly what the interpreter does when run with -v. If you load a .py file and can't write the .pyc -- say you don't have write permission in the directory -- it prints a warning to stderr. +1 on ZIP import doing the same thing Jeremy

[James C. Ahlstrom]
The zip archive name is always inserted as the second item in sys.path. The first always seems to be ''.
Running p.py:

    import sys
    print `sys.path[0]`

on Windows gives:

    C:\Code\python\PCbuild>python p.py
    ''

    C:\Code\python\PCbuild>cd ..

    C:\Code\python>pcbuild\python pcbuild\p.py
    'pcbuild'

    C:\Code\python>pcbuild\python \code\python\pcbuild\p.py
    '\\code\\python\\pcbuild'

    C:\Code\python>cd pcbuild

    C:\Code\python\PCbuild>python .\p.py
    '.'

    C:\Code\python\PCbuild>move p.py ..\lib
    C:\CODE\PYTHON\PCBUILD\p.py => C:\CODE\PYTHON\lib\p.py [ok]

    C:\Code\python\PCbuild>python ..\lib\p.py
    '..\\lib'

    C:\Code\python\PCbuild>

That is, the first entry is the path (relative or absolute!) to the directory in which the script being executed lives.
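
The outputs in the transcript above are consistent with taking the directory part of the script argument exactly as typed. The sketch below reproduces them with the standard ntpath module; `script_dir_entry` is a hypothetical name, and this only models the observed behaviour, not the interpreter's actual startup code (which may differ between Python versions).

```python
import ntpath


def script_dir_entry(script_arg):
    """Directory part of a script path as given on a Windows command line."""
    return ntpath.dirname(script_arg)
```

For example, script_dir_entry("p.py") yields the empty string, matching the first case in the transcript.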

Sorry for the delay: Jim writes:
On Windows, the directory is the directory of sys.executable.
Any chance this can be in sys.prefix, else the directory of sys.executable if sys.prefix is empty? The reason is for embedding situations - sys.executable may not be a reasonable watermark. We recently had a bug regarding os.popen() on Windows for the exact same reason, and a patch was recently checked in that goes to great lengths to ensure sys.prefix is always valid even in these embedding situations. Mark.

Mark Hammond wrote:
Hmmm...., you are right. Sys.executable doesn't really work for embedding. But sys.prefix is obtained from a search of the directory structure for a "landmark" file, namely os.py. When the Python library is in a zip file, it is likely that no landmark files will be found, and sys.prefix will contain garbage. Since sys.prefix is searched for, its name is unpredictable. We need a known location for python22.zip. How about using the full path name of pythonXX.dll with the last three characters changed to "zip"? This associates the libraries with the DLL, which is more logical than associating them with the executable. And the file name is identical but with "zip" instead of "dll". Does this work, and solve all embedding problems? JimA
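
The DLL-based rule proposed above amounts to a three-character substitution. A minimal sketch, with the hypothetical name `zip_path_for_dll` and an added check (an assumption, not part of the proposal) that the input really ends in ".dll":

```python
def zip_path_for_dll(dll_path):
    """Replace the trailing "dll" of the DLL's full path with "zip"."""
    if not dll_path.lower().endswith(".dll"):
        raise ValueError("expected a path ending in .dll: %r" % (dll_path,))
    return dll_path[:-3] + "zip"
```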

[James C. Ahlstrom]
The PEP for zip import is 273. Please take a look and comment.
Why can't .py be allowed? If a more recent .py[co] (or $py.class) exists, it is used. Otherwise the .py file is compiled and discarded when the process ends. Sure, it is slower, but a zip file with only .py[co] entries would be of little use with jython.
A standard for this would be really cool.
So python packages and modules can exist *only* at the top level? That would conflict somewhat with jython where it would be common to put python modules into the same .zip file as java classes and java classes also want to own the root of a zip file. In the current implementation in jython, we can put python modules under any path and add the zipfile!path to sys.path:

    sys.path.append("/path/to/zipfile.zip!Lib")

which will look for Lib/Q/R/modfoo.py (and Lib/Q/R/modfoo$py.class) in the archive.
[I found efficiency hard to achieve in jython because opening a zipfile in java also causes the zip index to be read into a dictionary. So we did not want to reopen a zipfile if it can be avoided. Instead we hide a reference to the opened file in sys.path so that when removing a zipfile name from sys.path, the file is eventually closed.]

Would entries in the static python dict be removed when a zipfile is removed from sys.path? What is the __path__ variable set to in a module imported from a zipfile? Can the module make changes to __path__ and will the changes be used when importing submodules? What value should __file__ have?

regards, finn

Yup.
Maybe it's possible to allow "/path/to/zipfile.zip/subdirectory/etc/" in sys.path? That sounds better than picking a random new character as delimiter.
I don't think Python has this problem, since we have control over the zipfile reading code.
Would entries in the static python dict be removed when a zipfile is removed from sys.path?
It can be arranged that they are removed at some later point (e.g. when sys.path is next searched).
IMO these two questions are answered by the pathname semantics that Jim proposes: __file__ = "/C/D/E/Archive.zip/Q/R/modfoo.pyc" and __path__ = ["/C/D/E/Archive.zip/Q/R"]. --Guido van Rossum (home page: http://www.python.org/~guido/)
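
Under the pathname semantics described above, a value such as __file__ can be split back into the archive and the name inside it by looking for the first path component that ends in ".zip". The sketch below assumes forward-slash separators and uses the hypothetical name `split_zip_path`; it also covers the "/path/to/zipfile.zip/subdirectory/etc/" form suggested for sys.path entries.

```python
def split_zip_path(path, marker=".zip"):
    """Split 'archive.zip/inner/name' into (archive, inner name).

    Returns (path, "") when the path names the archive itself, and None
    when no path component ends in `marker`.
    """
    parts = path.split("/")
    for i, part in enumerate(parts):
        if part.lower().endswith(marker):
            archive = "/".join(parts[: i + 1])
            inner = "/".join(parts[i + 1:])
            return archive, inner
    return None
```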

[me]
[GvR]
Fine by me. From a java POV the bang ("!") was not random:
The URL library in java can of course open the string above as an input stream.
An API to do this could be useful for jython. Right now we depend on the GC thread to close the file. Since files are a limited resource it would be a good thing to have an explicit way to clean up the cached resources. I don't expect the lack of a cleanup method to be a huge problem in real life.
Ok. I assume that as long as some module exists with a __path__ like this, it is not possible to clear the cached entries for /C/D/E/Archive.zip. If importing from zip is regarded mainly as a deployment/bootstrapping feature, then the cleanup issues do not exist. I have no problem looking at it as a deployment feature (that is all I need it for myself), I just didn't dare to put such a limit on my jython implementation. regards, finn

[Finn]
I guess the advantage of this notation is that it makes a simple check for "file inside zip archive" possible; I sort of like that. The downside is that it limits the use of "!" for other reasons in pathnames (which seems a mostly but not entirely theoretical problem).
I assume that as long as some module exists with a __path__ like this, it is not possible to clear the cached entries for /C/D/E/Archive.zip.
Good point.
I don't see sys.path as a very dynamic thing anyway. It gets manipulated briefly at the beginning, and then maybe, rarely, stuff gets added during the program run. I've seen temporary additions, but these were usually in the test suite. --Guido van Rossum (home page: http://www.python.org/~guido/)

Finn Bock wrote:
The static Python dictionary is a memory object which uses no open file descriptors. If an element of sys.path contains ".zip" and hasn't been seen before, the zip archive is opened, searched, and closed. The key is the name, the value includes the archive name and offset. Thereafter, the zip archive is never opened unless it contains a needed file. So I don't think there is a problem.
myself), I just didn't dare to put such a limit on my jython implementation.
I don't understand the jpython implementation, so please point out all problems so we can fix them now. JimA

[me]
myself), I just didn't dare to put such a limit on my jython implementation.
[James C. Ahlstrom]
I don't understand the jpython implementation, so please point out all problems so we can fix them now.
A naive implementation (like the first one I made for Jython) does not try to handle the cleanup issues. For example, how much memory has been consumed and not freed after the last statement in this little silly program:

    import sys, zipfile, time

    zip = zipfile.ZipFile("archive.zip", "w")
    for i in range(10000):
        entry = zipfile.ZipInfo()
        entry.filename = "module%06d.py" % i
        entry.date_time = time.gmtime(time.time())
        zip.writestr(entry, "# Not used\n")
    zip.close()

    sys.path.append("archive.zip")
    try:
        import notfound
    except ImportError:
        pass
    sys.path.pop()

If I read your preliminary patch correctly, the 10000 entries will remain in the ArchiveMembers dict forever, and that is perfectly fine after Guido's comments on sys.path being mostly a static feature.

Jython manages to clean the member-cache when an archive is removed from sys.path, but it was quite tricky to achieve. We do it by replacing zip entries in sys.path (like "archive.zip") with a subclass of string (SyspathArchive) that also holds a reference to the member cache. From Python's POV the sys.path entry is still a string with the same value (but with a different id).

regards, finn
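
The string-subclass trick described above can be illustrated in Python. This is only a sketch of the idea, not Jython's actual (Java) implementation; the `members` attribute name is an assumption.

```python
class SyspathArchive(str):
    """A sys.path entry that is still a string but carries a member cache.

    When the entry is dropped from sys.path and no longer referenced,
    the cache it holds can be garbage collected with it.
    """

    def __new__(cls, archive_name, members):
        self = super().__new__(cls, archive_name)
        self.members = members  # e.g. dict of member name -> offset
        return self
```

Because the object compares equal to the plain archive name, code that treats sys.path as a list of strings keeps working.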

Finn Bock wrote:
The *.py are allowed to be in the file, and jpython can use them. In fact, any files at all can be in the archive. It is just that C-Python ignores them. My reason was that if *.py[co] are missing or out of date, zip importing will be slow and users won't figure out what is wrong. But I am open to changing this.
Just as sys.path currently has default directory names, default A standard for this would be really cool.
Yes, let's make a standard now.
I am confused. Zip archives are equivalent to subdirectories, so there is no requirement to have anything at the top level. Your example seems to imply a second search path besides sys.path. BTW, the code uses ".zip" as the archive flag, not a special character '!'.
No, entries would not be removed. But they would not be found either, because their names would not be generated from the new sys.path.
The __file__ is /A/B/archive.zip/name.py. There is no special code for __file__ nor __path__, the path name just has a ".zip" in it. JimA

[Finn]
[JimA]
I think Finn simply means that the zipfile may have some redundant initial prefix to all filenames (e.g. "Lib/") which could be an artefact of how the zip file is created (zip up this directory) or intended as an aid for other uses of the zipfile, like unpacking. (In the tar world, it's considered impolite not to have a common prefix; without that, untarring can too easily populate an innocent user's home directory with thousands of untarred files.)
BTW, the code uses ".zip" as the archive flag, not a special character '!'.
That's cool. --Guido van Rossum (home page: http://www.python.org/~guido/)

"James C. Ahlstrom" wrote:
The PEP for zip import is 273. Please take a look and comment.
Looks good to me. Just three questions: 1. Why are .py files being outlawed ? 2. Where's the C implementation you mention in the PEP ? 3. Would it be possible to ship zlib together with Python ? (the zlib license should allow this and I don't think that the code size is too big) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
1. Why are .py files being outlawed ?
My reason was that if *.py[co] are missing or out of date, zip importing will be slow and users won't figure out what is wrong. Generally I favor user-proof features over expert features. I prefer things which either "Just Work" or fail spectacularly. But I am open to changing this.
2. Where's the C implementation you mention in the PEP ?
Software is like pancakes, you should always throw the first one away. I will post it if you want, but it is not done.
OK by me. But uncompressed zip archives work, and may even be faster than compressed archives. JimA

Good point. Uncompressed archives will definitely be faster. I'd be happy to legislate uncompressed archives, except I'm worried that not all tools commonly used to create zip archives make it easy to turn off compression. --Guido van Rossum (home page: http://www.python.org/~guido/)
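
Whether a given tool actually produced an uncompressed archive can be checked after the fact. A minimal sketch using the standard zipfile module; `compressed_members` is a hypothetical helper name.

```python
import zipfile


def compressed_members(archive):
    """Return the names of members that are not stored uncompressed."""
    with zipfile.ZipFile(archive) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type != zipfile.ZIP_STORED
        ]
```

An empty result means every member is stored as-is and could be read without zlib. When building archives from Python, zipfile.ZipFile writes members uncompressed (ZIP_STORED) unless another compression method is requested.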

"James C. Ahlstrom" wrote:
If you don't include the *.py file in the archive, chances are high that tracebacks will no longer print out as they do today (another problem here is that the filename being used in the code object will probably not be found... not sure whether we can fix this one). By only looking at the .pyc or .pyo you'll also introduce a Python version problem into the ZIP-archive: the magic in these files changes rather frequently and e.g. a more recent release of Python won't be able to load these from a ZIP-archive which only contains .pycs from a compile by a less recent version.
That's true :-) Still, I'd like a chance to at least look at the impact this has on import.c and of course play with it so that I can test it in everyday situations.
Even if uncompressed archive would work faster, the compiled libz is only 70k on Linux and I think we could solve a great number of (zlib version) problems by including zlib in Python. It's one of those basic tools you need frequently, but is even more frequently not configured as Python module :-( Since it is already included in the Windows builds, I guess adding it to core for Unix and Macs too wouldn't hurt all that much. It would also save you the trouble of maintaining the code for scanning uncompressed zip-archives in your ZIP import code, so we win on two counts :-) -- Marc-Andre Lemburg CEO eGenix.com Software GmbH ______________________________________________________________________ Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

Jim Ahlstrom wrote: [From PEP 273]
Currently, sys.path is a list of directory names as strings.
A nit: that's in practice, but not by definition (a couple std modules have been patched to allow other things on sys.path). After extensive use of imputil (which puts objects on sys.path), I think we might as well make it official that sys.path is a list of strings. [Subdirectory Equivalence] This is a bit misleading - it can be read as implying that the importable modules are in a flat namespace. They're not. To get R.Q.modfoo imported, R.__init__ and R.Q.__init__ must be imported. The __init__ modules have the opportunity to play with __path__ to break the equivalence of R.Q.modfoo to R/Q/modfoo.py. Or (more likely), play games with attributes so that Q is, say, an instance, or maybe a module imported from someplace else. Question: if Archive.zip/Q/__init__.pyc does "__path__.append('some/real/directory')", will it work? [dynamic libs]
It might be possible to extract the dynamic module from the zip file, write it to a plain file and load it.
I think you can nail the door shut on this one. On many OSes, making dynamic libs available to a process requires settings that can only (sanely) be made before the process starts. OTOH, it's common practice these days to put dynamic libs inside packages. That needs to be dealt with at runtime, and at build time (since it breaks the expectation that you can just "zip up" the package). [no *.py files]
You also can't import source files *.py from a zip archive.
Apparently some Linuxes / RPM distributions don't deliver .pyc's or .pyo's. Since they're installed as root and run as some-poor-user, I'm afraid there are quite a few installations running off .py's all the time. So while it's definitely sub-optimal, I'm not sure it should be outlawed. [Guido]
But it would still be good if the .py files were also in the zip file, so the source can be used in tracebacks etc.
As you probably remember me saying before, I think this is a very minor nice-to-have. For one thing, you've probably done enough testing before stuffing your code into an archive that you're past the point of looking at a traceback and going "doh!". Most of the time you'll need to look at the code anyway. OTOH, this [more from Guido]:
A C API to get a source line from a filename might be a good idea (plus a Python API).
points the way towards something I'm very much in favor of: deferring things to that-which-did-the-importing. [Efficiency]
The key is the archive name from sys.path joined with the file name (including any subdirectories) within the archive.
Different spellings of the same path are possible in a filesystem, but not in a dictionary. A bit of "harmless" tweaking of sys.path could render an archive unreachable.
[zlib must be available at start] I'll agree, and agree with Guido that the coolest thing would be to make zlib standard.
[Booting - python.zip should be part of the generated sys.path] Agree. Nice and straightforward.
Now, from the discussion:
[restate Jim] sys.path contains /C/D/E/Archive.zip and Archive.zip contains "Q/R/modfoo.pyc" so "import Q.R.modfoo" is satisfied by /C/D/E/Archive.zip/Q/R/modfoo.pyc
[restate Finn] Jython has sys.path /C/D/E/Archive.zip!Lib and Archive.zip has "Lib/Q/R/modfoo.pyc" so "import Q.R.modfoo" is satisfied by /C/D/E/Archive.zip/Lib/Q/R/modfoo.pyc
[restate Guido] Why not use /C/D/E/Archive.zip/Lib on sys.path?
I use embedded archives. So sys.path will have an entry like: "/path/to/executable?84758" which says that the archive starts at position 84758 in the file "executable". Anything beyond the simple approach Jim takes gets into some very URL-ish territory. That's fine by me :-). I don't really like the idea of hacking special knowledge of zip files into import.c (which is already a specialist in filesystems). Like Finn said, if this is a deployment issue (we want zip files now, and are willing to live with strict limitations / rules to get it), then OK (as long as it supports __path__ and some way of dealing with dynamic libs in packages). Personally, I think package support stretched import.c to its monolithic limits and it's high time the code was refactored to make it sanely extensible. - Gordon

[GMcM]
Interesting. So you think imputil is wrong to put objects there? Why? (Not arguing, just interested in your experience.)
It should.
And I believe some systems require shared libraries to be owned by root.
Yes. This is important.
Heh, this is an argument for Jim's position -- it would have come out during testing that way, since the imports would fail. ;-)
Testing is for wimps. :-)
Yup.
Hm, wouldn't the archive just be opened a second time? Or do I misunderstand you?
But we'd have to make sure it's statically linked. (Fortunately, we already link it statically on Windows.)
Yes -- this has been on my TODO list for ages. But who's gonna DO it? --Guido van Rossum (home page: http://www.python.org/~guido/)

[Gordon]
Too much code (some left in the std lib) says:
    for d in sys.path:
        os.path.join(d, ...)
and that's the line that barfs. It doesn't barf if d is a string. [Gordon]
Without seeing the C code, I don't know if it will open it a 2nd time. I keep an open file for each archive, but I'm careful that however it's spelled, I only open the archive once. Jim is apparently closing the file, so opening a 2nd is probably not as painful. OTOH an open / seek for each satisfied import is relatively expensive (though still cheaper than a fs import).
So to avoid statically linking it twice, I assume zlib would be a mandatory builtin. [Gordon]
I published iu4 which I think comes very close to the "right" model. If it's close enough, I'll download 2.2 and see if I can make those into new style object subclasses[1]. - Gordon [1] As an app developer, I usually have to look backwards, not forwards. I've got 6 machines, each with at least 2 Pythons, so I no longer deal with betas :-(. Even when they have cool things like generators and subclassable C types.

Gordon McMillan wrote:
A nit on a nit: Non-strings can still be allowed, because I can ignore them.
I feel that this is an excellent point, but I don't understand package import well enough to comment.
Question: if Archive.zip/Q/__init__.pyc does "__path__.append('some/real/directory')", will it work?
My guess (and it really is a guess) is Yes. The import.c code which generates directory names is untouched. My code simply looks for ".zip" in the generated name and branches into the zip archive at that point. So if the directory version works, the zip archive version should work too.
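The branching described above (look for ".zip" in the generated name and split there) might be sketched in Python as follows. This is an assumption-laden illustration, not the C code from the patch; the function name `split_zip_path` and its return convention are invented for this example.

```python
def split_zip_path(path, marker=".zip"):
    """Split a generated import path at the first component ending in '.zip'.

    Returns (archive_path, member_name), or None if the path does not
    point into an archive. Matching here is case-sensitive, which is an
    assumption of this sketch.
    """
    idx = path.find(marker + "/")
    if idx == -1:
        # The path may name the archive itself, with nothing after it.
        if path.endswith(marker):
            return path, ""
        return None
    end = idx + len(marker)
    return path[:end], path[end + 1:]
```

For the PEP's own example, "/C/D/E/Archive.zip/Q/R/modfoo.pyc" splits into the archive "/C/D/E/Archive.zip" and the member "Q/R/modfoo.pyc", while "/A/B/SubDir/Q/R/modfoo.pyc" is left to the ordinary filesystem search.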
For Linux/RPM's I think shipping a library directory is better than a zip archive. It is easier to hack on a directory. I think of zip archives as a way to distribute packages, and as a replacement for freeze. OTOH, maybe we should allow *.py to satisfy imports even if it is slow and invisible.
True, but I have little sympathy for case-insensitive file systems. Tweaking of sys.path will have to be done with care. It helps that the '/' character is always used in zip, and both backslash and colon ":" are illegal in zip archive names. JimA

Gordon McMillan wrote:
There's no reason to believe that telling a Linux distribution maintainer that they "shouldn't" do it that way will be successful. Heck, he might be a Perl monkey on KP <wink>.
You don't need case-insensitive for this to come up. Relative paths vs. absolute paths; or paths that need norm-ing. And then you've got the mac which uses a different path syntax, and even slightly different path semantics. - Gordon
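One way to picture the spelling problem: if dictionary keys are built from whatever string happens to be on sys.path, two spellings of one archive produce two unrelated keys. The sketch below normalizes the archive name before using it as a key. It is only an illustration under stated assumptions: `canonical_archive_key` is a made-up name, it uses `posixpath` so the result is the same on every platform, and it does not address the Mac path syntax mentioned above.

```python
import posixpath


def canonical_archive_key(path, cwd="/"):
    """Return one canonical spelling for an archive path.

    Relative paths are resolved against cwd (a hypothetical parameter of
    this sketch), and redundant separators and '..' segments are removed.
    """
    return posixpath.normpath(posixpath.join(cwd, path))


spellings = [
    "/C/D/E/Archive.zip",
    "/C/D/E/../E/Archive.zip",
    "/C/D/E//Archive.zip",
    "Archive.zip",  # relative, resolved against cwd below
]
keys = {canonical_archive_key(p, cwd="/C/D/E") for p in spellings}
```

All four spellings collapse to a single key, so a lookup no longer depends on how the entry was written.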

The PEP for zip import is 273. Please take a look and comment.
OK, I'll shoot. But I expect Gordon McMillan and Greg Stein to provide more useful feedback. | Currently, sys.path is a list of directory names as strings. If | this PEP is implemented, an item of sys.path can be a string | naming a zip file archive. The zip archive can contain a | subdirectory structure to support package imports. The zip | archive satisfies imports exactly as a subdirectory would. I like this. | The implementation is in C code in the Python core and works on | all supported Python platforms. This is good too, as it provides a bootstrap. OTOH I also would like to see a prototype in Python, using either ihooks or imputil. | Any files may be present in the zip archive, but only files *.pyc, | *.pyo and __init__.py[co] are available for import. Zip import of | *.py and dynamic modules (*.pyd, *.so) is disallowed. | | Just as sys.path currently has default directory names, default | zip archive names are added too. Otherwise there is no way to | import all Python library files from an archive. More bootstrap goodness. | Reading compressed zip archives requires the zlib module. An | import of zlib will be attempted prior to any other imports. If | zlib is not available at that time, only uncompressed archives | will be readable, even if zlib subsequently becomes available. Hm, I wonder if we couldn't just link with the libz.a C library and use the C interface, if you're implementing this in C anyway. | Subdirectory Equivalence | | The zip archive must be treated exactly as a subdirectory tree so | we can support package imports based on current and future rules. | Zip archive files must be created with relative path names. That | is, archive file names are of the form: file1, file2, dir1/file3, | dir2/dir3/file4. | | Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip", | and we are trying to import modfoo from the Q package. Then | import.c will generate a list of paths and extensions and will | look for the file. 
The list of generated paths does not change | for zip imports. (Very clever.) Suppose import.c generates the path | "/A/B/SubDir/Q/R/modfoo.pyc". Then it will also generate the path | "/C/D/E/Archive.zip/Q/R/modfoo.pyc". Finding the SubDir path is | exactly equivalent to finding "Q/R/modfoo.pyc" in the archive. Nice. | Suppose you zip up /A/B/SubDir/* and all its subdirectories. Then | your zip file will satisfy imports just as your subdirectory did. | | Well, not quite. You can't satisfy dynamic modules from a zip | file. Dynamic modules have extensions like .dll, .pyd, and .so. | They are operating system dependent, and probably can't be loaded | except from a file. It might be possible to extract the dynamic | module from the zip file, write it to a plain file and load it. | But that would mean creating temporary files, and dealing with all | the dynload_*.c, and that's probably not a good idea. Agreed. | You also can't import source files *.py from a zip archive. The | problem here is what to do with the compiled files. Python would | normally write these to the same directory as *.py, but surely we | don't want to write to the zip file. We could write to the | directory of the zip archive, but that would clutter it up, not | good if it is /usr/bin for example. We could just fail to write | the compiled files, but that makes zip imports very slow, and the | user would probably not figure out what is wrong. It is probably | best for users to put *.pyc into zip archives in the first place, | and this PEP enforces that rule. I agree. But it would still be good if the .py files were also in the zip file, so the source can be used in tracebacks etc. A C API to get a source line from a filename might be a good idea (plus a Python API). | So the only imports zip archives support are *.pyc and *.pyo, plus | the import of __init__.py[co] for packages, and the search of the | subdirectory structure for the same. 
I wonder if we need to make an additional rule that allows a .pyc file to satisfy a module request even if we're in optimized mode (where normally only .pyo files are searched). Otherwise, if someone ships a zipfile with only .pyc files, their modules can't be imported at all when python -O is used. | Efficiency | | The only way to find files in a zip archive is linear search. But there's an index record at the end that provides quick access. So | for each zip file in sys.path, we search for its names once, and | put the names plus other relevant data into a static Python | dictionary. The key is the archive name from sys.path joined with | the file name (including any subdirectories) within the archive. | This is exactly the name generated by import.c, and makes lookup | easy. We could do this kind of pre-scanning for regular directories on sys.path too. I found out very long ago (around '93 or '94) that this saves a *lot* of startup time; I presume it still does. (And even more if the info can be cached in a file.) The only problem is how to detect when the cache becomes out of date. Of course, you could say "if you want faster startup time, put all your files in a zip archive", and I couldn't really argue with that. :-) | zlib | | Compressed zip archives require zlib for decompression. Prior to | any other imports, we attempt an import of zlib, and set a flag if | it is available. All compressed files are invisible unless this | flag is true. Do we get an "module not found" error or something better, like "compressed module found as <filename> but zlib unavailable"? | It could happen that zlib was available later. For example, the | import of site.py might add the correct directory to sys.path so a | dynamic load succeeds. But compressed files will still be | invisible. It is unknown if it can happen that importing site.py | can cause zlib to appear, so maybe we're worrying about nothing. 
| On Windows and Linux, the early import of zlib succeeds without | site.py. Yes, site.py isn't needed to make standard library modules available; it's intended to make non-standard library modules available. :-) | The problem here is the confusion caused by the reverse. Either a | zip file satisfies imports or it doesn't. It is silly to say that | site.py needs to be uncompressed, and that maybe imports will | succeed later. If you don't like this, create uncompressed zip | archives or make sure zlib is available, for example, as a | built-in module. Or we can write special search logic during zip | initialization. I don't think we need anything special here. site.py shouldn't be needed. | Booting | | Python imports site.py itself, and this imports os, nt, ntpath, | stat, and UserDict. It also imports sitecustomize.py which may | import more modules. Zip imports must be available before site.py | is imported. | | Just as there are default directories in sys.path, there must be | one or more default zip archives too. | | The problem is what the name should be. The name should be linked | with the Python version, so the Python executable can correctly | find its corresponding libraries even when there are multiple | Python versions on the same machine. | | This PEP suggests a zip archive name equal to the Python | interpreter path with extension ".zip" (eg, /usr/bin/python.zip) | which is always prepended to sys.path. So a directory with python | and python.zip is complete. This would work fine on Windows, as | it is common to put supporting files in the directory of the | executable. But it may offend Unix fans, who dislike bin | directories being used for libraries. It might be fine to | generate different defaults for Windows and Unix if necessary, but | the code will be in C, and there is no sense getting complicated. Well, this is the domain of getpath.c, and that's got a different implementation for Unix and Windows anyway (Windows has PC/getpathp.c). 
| Implementation | | A C implementation exists which works, but which can be made better. Upload as a patch please? --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Hm, I wonder if we couldn't just link with the libz.a C library and use the C interface, if you're implementing this in C anyway.
Since zlib is often in a DLL or shared library (.so), I am afraid of portability problems if we use the C interface. For example, on Windows, linking python.dll with zlib.dll would cause python.dll to fail if zlib.dll were unavailable, even if zlib.dll were never accessed. It also happens that use of the zlib module is easier.
Yes, I agree. How about if we look for the correct .py[co] first, and if that fails, look for the other? Either will satisfy the import, right? If .pyc is wanted, .pyo is OK too.
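The rule proposed here (try the suffix matching the current mode first, then fall back to the other) could look roughly like this. The function names and the use of a plain set of member names are assumptions made for the sake of a small, self-contained sketch.

```python
def candidate_suffixes(optimized):
    """Return compiled-file suffixes in the order they should be tried."""
    if optimized:
        return [".pyo", ".pyc"]
    return [".pyc", ".pyo"]


def find_compiled(member_names, module_path, optimized):
    """Return the first archive member that can satisfy the import, or None.

    member_names is any container of names found in the archive, and
    module_path is the member name without its suffix, e.g. 'Q/R/modfoo'.
    """
    for suffix in candidate_suffixes(optimized):
        candidate = module_path + suffix
        if candidate in member_names:
            return candidate
    return None
```

With this ordering, an archive shipped with only .pyc files still satisfies imports under python -O, which was the concern that prompted the rule.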
| The only way to find files in a zip archive is linear search.
But there's an index record at the end that provides quick access.
It is this index which is searched linearly.
Do we get an "module not found" error or something better, like "compressed module found as <filename> but zlib unavailable"?
We get "module not found". The second is awkward, because the module may be found later in sys.path.
Well, this is the domain of getpath.c, and that's got a different implementation for Unix and Windows anyway (Windows has PC/getpathp.c).
I would like discussion on what the additional sys.path names are.
| A C implementation exists which works, but which can be made better.
I can upload it for inspection, but I don't want it to be a patch because it is not done yet. JimA

OK, good enough. (I'm getting more and more curious about your implementation. ;-)
Sounds good to me -- this might be a good rule anyhoo.
That can't possibly take any time compared to other stuff -- it's a relatively short in-memory table.
Hm. Does it at least spit out a warning when zlib is needed but not available? It can be awkward to debug the situation where you get a module from the filesystem rather than the version from the zipfile, even though it is in the zipfile. While the compression method is listed in the zip info, it's not the first thing someone would look for unless they were aware that this failure mode existed. Since obviously the intention of putting the module in the zipfile was that it should be found, I think that failure to decompress should be turned into an immediate error -- the same way as a syntax error gets reported and not turned into a "skip to the next directory in sys.path" effect.
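The behaviour argued for here, failing loudly instead of skipping to the next sys.path entry, might be sketched as below. This is not how the patch behaves (it reports "module not found"); the exception class and function name are hypothetical, and the check is expressed with the standard `zipfile` module purely for illustration.

```python
import zipfile


class ZipDecompressError(ImportError):
    """Raised when a module is present in the archive but cannot be read."""


def check_member_loadable(info, zlib_available):
    """Raise at once if a compressed member is found but zlib is missing.

    info is a zipfile.ZipInfo; zlib_available stands in for the flag the
    PEP sets after its early attempt to import zlib.
    """
    if info.compress_type == zipfile.ZIP_DEFLATED and not zlib_available:
        raise ZipDecompressError(
            "compressed module found as %s but zlib unavailable" % info.filename
        )
    return True


stored = zipfile.ZipInfo("modfoo.pyc")  # ZIP_STORED by default
deflated = zipfile.ZipInfo("modbar.pyc")
deflated.compress_type = zipfile.ZIP_DEFLATED

try:
    check_member_loadable(deflated, zlib_available=False)
    error_message = None
except ZipDecompressError as exc:
    error_message = str(exc)
```

Because the error subclasses ImportError, callers that already catch import failures keep working, but the message now says why the module in the zipfile was not used.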
Well, propose some.
Just say so in the patch. You can upload anything you want. :-) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Just say so in the patch. You can upload anything you want. :-)
The implementation is in patch 476047. http://sourceforge.net/tracker/index.php?func=detail&aid=476047&group_id=5470&atid=305470 JimA

Guido van Rossum wrote: [Me]
I would like discussion on what the additional sys.path names are.
Well, propose some.
OK. I propose that there is one name added to sys.path, and the file name is "python%s%s.zip" % (sys.version[0], sys.version[2]). For example, python22.zip. This is the same on all platforms. On Unix, the directory is sys.prefix + "/lib". So for prefix /usr/local, the path /usr/local/lib/python2.2/ is already on sys.path, and /usr/local/lib/python22.zip would be added. On Windows, the directory is the directory of sys.executable. The zip archive name is always inserted as the second item in sys.path. The first always seems to be ''. JimA
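The naming proposal can be written out as a short sketch. Two things here are assumptions rather than part of the proposal: the helper names, and the use of a version tuple instead of indexing into the `sys.version` string (which would stop working once a version component has two digits). The Windows executable path in the usage note is likewise a made-up example.

```python
import ntpath
import posixpath


def default_zip_name(version_info):
    """Build the archive name from major and minor version, e.g. python22.zip."""
    return "python%d%d.zip" % (version_info[0], version_info[1])


def default_zip_path(version_info, prefix=None, executable=None):
    """Return the default archive location under the proposed rules.

    On Unix the archive lives in prefix + '/lib'; on Windows it lives in
    the directory of the executable. Passing executable selects the
    Windows rule in this sketch.
    """
    name = default_zip_name(version_info)
    if executable is not None:
        return ntpath.join(ntpath.dirname(executable), name)
    return posixpath.join(prefix, "lib", name)
```

For prefix /usr/local and version 2.2 this yields /usr/local/lib/python22.zip, matching the example in the proposal; for a hypothetical executable C:\Python22\python.exe it yields C:\Python22\python22.zip.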

Sounds good to me. Somebody please update the PEP. --Guido van Rossum (home page: http://www.python.org/~guido/)

On 31 October 2001, James C. Ahlstrom said:
Did we decide to allow import of *.py?
Here's an idea that would apply to importing *.py whether inside ZIP files or not: if unable to write the .pyc, write a warning to stderr. That'll be useful to people who inadvertently put .py files without .pyc in a ZIP file, and to people who try to import .py files from a directory they can't write to, etc. Now that Python has a warning framework, warnings like
    unable to write /usr/lib/python2.2/site-packages/foo.pyc: permission denied
or
    unable to write /usr/lib/python2.2/site-packages/foo.zip/foo.pyc: can't write to ZIP file
sound like useful warnings. Greg -- Greg Ward - programmer-at-large gward@python.net http://starship.python.net/~gward/ Save energy: be apathetic.
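A rough sketch of the idea using the warnings framework follows. It is not the interpreter's actual behaviour: `try_write_pyc` is a hypothetical helper, the choice of `RuntimeWarning` is an assumption, and the failure is provoked by writing into a directory that does not exist rather than by a real permission problem.

```python
import os
import tempfile
import warnings


def try_write_pyc(pyc_path, data):
    """Try to write compiled bytes; warn instead of failing silently."""
    try:
        with open(pyc_path, "wb") as f:
            f.write(data)
    except OSError as exc:
        warnings.warn(
            "unable to write %s: %s" % (pyc_path, exc.strerror or exc),
            RuntimeWarning,
        )
        return False
    return True


# Provoke a failure: the target directory is never created.
with tempfile.TemporaryDirectory() as tmp, \
        warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    target = os.path.join(tmp, "missing-subdir", "foo.pyc")
    wrote = try_write_pyc(target, b"")
```

The import itself would still succeed from the source file; only the caching step is reported as having failed.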

"FL" == Fredrik Lundh <fredrik@pythonware.com> writes:
FL> can anyone explain why I shouldn't be allowed to ship, say,
FL> PIL's Python code in a version-independent ZIP file?
Excellent point! We ought to collect some requirements/goals for the ZIP import stuff. The current draft of the PEP is a little thin on motivation. Requirement: A ZIP archive should be usable by any version of Python that supports ZIP-based imports. It might also be nice to have an imputil- or ihook-based importer that works with older versions of Python. Jeremy

Jeremy Hylton:
Excellent point!
Being the permissive sort, I'd say you should be allowed. But I think it would be foolish.
Speed, ease of deployment, minimal disk space. MacPython does it, Jython does it, Installer's been doing it since '98 (although it doesn't use zip format archives).
Requirement: A ZIP archive should be usable by any version of Python that supports ZIP-based imports.
Hmm. You want .pyc's with every known magic number (there goes disk space)? Or just source and throw away the .pyc? Or maybe insert into the zip file? Either way, there goes speed. Besides, as I noted, and Guido concurred, we need some way to handle extension modules "inside" packages, since they can't go inside the zip. On Windows, extensions are only usable by one version of Python. This is silliness. I have clients with versions 1.52 to 2.1. I have a fair amount of code I reuse, and I keep it version-independent. I have had to change something in it for every Python version to keep it version independent.
It might also be nice to have an imputil- or ihook-based importer that works with older versions of Python.
There's at least one imputil-based one that works with zipfile. - Gordon

>> Requirement: A ZIP archive should be usable by any version of Python that
>> supports ZIP-based imports.
Gordon> Hmm. You want .pyc's with every known magic number (there goes
Gordon> disk space)? Or just source and throw away the .pyc? Or maybe
Gordon> insert into the zip file? Either way, there goes speed.
Can't zip files be treated conceptually as directories? If I import a py-based module from a zip archive I see no particular reason the byte-compiled file can't be added to the archive (conceptually, written to the directory), speeding up the import the next time. Is this not possible? Skip

Jeremy:
Skip:
Zip files are normally unpacked to the filesystem and rebuilt when modified. If this feature is desired, I think MetaKit would be a much better file format (I have used MetaKit as a pyc archive so I could replace individual modules). But unless you want to use big-boy SQL servers as "archives", you're going to have trouble with multi-user. To combine this with Jeremy's requirement, we can't put the .pyc's alongside the py's anyway, we need some (disappearing) node in the hierarchy to hold the byte code version (magic number). A bit more trickery for import.c. We can meet a real need as long as we don't put tail fins and heavy chrome all over a Yugo (zip files). They're easy to produce & ship. I don't think we should disallow .py files, but I think 99% of their use will be version specific .pyc's only (and many of those to users who think "traceback, innermost last" is Aramaic). It's nothing compared to the effort of dealing with the binaries across versions (or even the std lib across versions, sometimes). OTOH, I'm worried that the extension-modules-inside-packages requirement is going to break import.c's back. There are many undocumented features of the implementation of import that are relied on (such as xml.__init__'s trick, or extension modules inside packages). Heck, we might even have to clarify the rules. doom-and-gloom-and-goblins-ly y'rs - Gordon

- Writing stuff back to .zip files is totally the wrong approach. - I don't care about having .so files inside packages *in zipfiles*. - I'm not sure I care about having .so files inside packages on the filesystem; they are useful in Zope, but for very hackish reasons. - If the zip file has the .py file but no .pyc or the wrong .pyc, tant pis. Let it be slower. (But if it has the .pyo, use that.) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
- I'm not sure I care about having .so files inside packages on the filesystem; they are useful in Zope, but for very hackish reasons.
Why? If I write a package which is mostly in Python, it feels very natural to put the C extensions also in the package. Just

Yes, it does, and as long as it works, I have no problem with that. Distutils supports this too, AFAIK. But if there are mechanical problems with making it work (zip files are a good example of that) I don't see why we should torture ourselves to get it to work when simpler solutions exist (such as putting the extension at the top level under a private name with the package name as a prefix). BTW, the hackish reasons I referred to are this: Zope often wants to *replace* existing extensions with its own versions, and places e.g. its own cPickle.so inside a package to force the import even if cPickle is built-in statically. But I'm not sure that that works if there's a toplevel cPickle.so which has already been imported; it may work on some systems but fail on others, depending on the shared library architecture (often one of the most hackish parts of user-space OS support). --Guido van Rossum (home page: http://www.python.org/~guido/)

[Guido]
- I'm not sure I care about having .so files inside packages on the filesystem; they are useful in Zope, but for very hackish reasons.
[Just]
Why? If I write a package which is mostly in Python, it feels very natural to put the C extensions also in the package.
[Guido]
Yes, it does, and as long as it works, I have no problem with that. Distutils supports this too, AFAIK.
Ah, ok, I think I misread your comment above as "I'm not sure I care about allowing .so files inside packages to begin with", hence my "why". Just

"GW" == Greg Ward <gward@python.net> writes:
GW> Here's an idea that would apply to importing *.py whether inside GW> ZIP files or not: if unable to write the .pyc, write a warning GW> to stderr. This is exactly what the interpreter does when run with -v. If you load a .py file and can't write the .pyc -- say you don't have write permission in the directory -- it prints a warning to stderr. +1 on ZIP import doing the same thing Jeremy

[James C. Ahlstrom]
The zip archive name is always inserted as the second item in sys.path. The first always seems to be ''.
Running p.py:
    import sys
    print `sys.path[0]`
on Windows gives:
    C:\Code\python\PCbuild>python p.py
    ''
    C:\Code\python\PCbuild>cd ..
    C:\Code\python>pcbuild\python pcbuild\p.py
    'pcbuild'
    C:\Code\python>pcbuild\python \code\python\pcbuild\p.py
    '\\code\\python\\pcbuild'
    C:\Code\python>cd pcbuild
    C:\Code\python\PCbuild>python .\p.py
    '.'
    C:\Code\python\PCbuild>move p.py ..\lib
    C:\CODE\PYTHON\PCBUILD\p.py => C:\CODE\PYTHON\lib\p.py [ok]
    C:\Code\python\PCbuild>python ..\lib\p.py
    '..\\lib'
    C:\Code\python\PCbuild>
That is, the first entry is the path (relative or absolute!) to the directory in which the script being executed lives.

Sorry for the delay: Jim writes:
On Windows, the directory is the directory of sys.executable.
Any chance this can be in sys.prefix, else the directory of sys.executable if sys.prefix is empty? The reason is for embedding situations - sys.executable may not be a reasonable watermark. We recently had a bug regarding os.popen() on Windows for the exact same reason, and a patch was recently checked in that goes to great lengths to ensure sys.prefix is always valid even in these embedding situations. Mark.

Mark Hammond wrote:
Hmmm, you are right. sys.executable doesn't really work for embedding. But sys.prefix is obtained from a search of the directory structure for a "landmark" file, namely os.py. When the Python library is in a zip file, it is likely that no landmark files will be found, and sys.prefix will contain garbage. Since sys.prefix is searched for, its name is unpredictable. We need a known location for python22.zip. How about using the full path name of pythonXX.dll with the last three characters changed to "zip"? This associates the libraries with the DLL, which is more logical than associating them with the executable. And the file name is identical but with "zip" instead of "dll". Does this work, and solve all embedding problems? JimA
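The suggested rule is simple enough to state as code. This is just a sketch of the string manipulation: the function name is invented, the DLL path in the usage note is a hypothetical example, and how the DLL's full path would be obtained in the first place is outside the scope of the sketch.

```python
def zip_path_for_dll(dll_path):
    """Replace the trailing 'dll' of the interpreter DLL path with 'zip'."""
    if not dll_path.lower().endswith(".dll"):
        raise ValueError("expected a path ending in .dll: %r" % dll_path)
    return dll_path[:-3] + "zip"
```

Given a hypothetical C:\WINNT\system32\python22.dll, the archive would be looked for at C:\WINNT\system32\python22.zip, regardless of which executable embeds the interpreter.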

[James C. Ahlstrom]
The PEP for zip import is 273. Please take a look and comment.
Why can't .py be allowed? If a more recent .py[co] (or $py.class) exists, it is used. Otherwise the .py file is compiled and discarded when the process ends. Sure, it is slower, but a zip file with only .py[co] entries would be of little use with jython.
A standard for this would be really cool.
So python packages and modules can exist *only* at the top level? That would conflict somewhat with jython where it would be common to put python modules into the same .zip file as java classes and java classes also want to own the root of a zip file. In the current implementation in jython, we can put python modules under any path and add the zipfile!path to sys.path: sys.path.append("/path/to/zipfile.zip!Lib") which will look for Lib/Q/R/modfoo.py (and Lib/Q/R/modfoo$py.class) in the archive.
[I found efficiency hard to achieve in jython because opening a zipfile in java also causes the zip index to be read into a dictionary. So we did not want to reopen a zipfile if it can be avoided. Instead we hide a reference to the opened file in sys.path so that when removing a zipfile name from sys.path, the file is eventually closed.] Would entries in the static python dict be removed when a zipfile is removed from sys.path? What is the __path__ variable set to in a module imported from a zipfile? Can the module make changes to __path__ and will the changes be used when importing submodules? What value should __file__ have? regards, finn
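The "zipfile!path" notation can be illustrated with a small parser. This is a guess at the semantics from the description in this message, not Jython's actual code; both function names are made up.

```python
def split_bang_entry(entry):
    """Split a 'zipfile!subdir' sys.path entry into (archive, subdir).

    Entries without '!' are treated as naming the archive root.
    """
    archive, _sep, subdir = entry.partition("!")
    return archive, subdir


def member_name(entry, relative_module_path):
    """Return the archive member that a module path maps to for this entry."""
    _archive, subdir = split_bang_entry(entry)
    if subdir:
        return subdir + "/" + relative_module_path
    return relative_module_path
```

For the entry "/path/to/zipfile.zip!Lib", the module path Q/R/modfoo.py maps to the member Lib/Q/R/modfoo.py, as in the example above.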

Yup.
Maybe it's possible to allow "/path/to/zipfile.zip/subdirectory/etc/" in sys.path? That sounds better than picking a random new character as delimiter.
I don't think Python has this problem, since we have control over the zipfile reading code.
Would entries in the static python dict be removed when a zipfile is removed from sys.path?
It can be arranged that they are removed at some later point (e.g. when sys.path is next searched).
IMO these two questions are answered by the pathname semantics that Jim proposes: __file__ = "/C/D/E/Archive.zip/Q/R/modfoo.pyc" and __path__ = ["/C/D/E/Archive.zip/Q/R"]. --Guido van Rossum (home page: http://www.python.org/~guido/)

[me]
[GvR]
Fine by me. From a java POV the bang ("!") was not random:
The URL library in java can of course open the string above as an input stream.
An API to do this could be useful for jython. Right now we depend on the GC thread to close the file. Since files are a limited resource it would be a good thing to have an explicit way to clean up the cached resources. I don't expect the lack of a cleanup method to be a huge problem in real life.
Ok. I assume that as long as some module exists with a __path__ like this, it is not possible to clear the cached entries for /C/D/E/Archive.zip. If importing from zip is regarded mainly as a deployment/bootstrapping feature, then the cleanup issues do not exist. I have no problem looking at it as a deployment feature (that is all I need it for myself), I just didn't dare to put such a limit on my jython implementation. regards, finn

[Finn]
I guess the advantage of this notation is that it makes a simple check for "file inside zip archive" possible; I sort of like that. The downside is that it limits the use of "!" for other reasons in pathnames (which seems a mostly but not entirely theoretical problem).
I assume that as long as some module exists with a __path__ like this, it is not possible to clear the cached entries for /C/D/E/Archive.zip.
Good point.
I don't see sys.path as a very dynamic thing anyway. It gets manipulated briefly at the beginning, and then maybe, rarely, stuff gets added during the program run. I've seen temporary additions, but these were usually in the test suite. --Guido van Rossum (home page: http://www.python.org/~guido/)

Finn Bock wrote:
The static Python dictionary is a memory object which uses no open file descriptors. If an element of sys.path contains ".zip" and hasn't been seen before, the zip archive is opened, searched, and closed. The key is the name, the value includes the archive name and offset. Thereafter, the zip archive is never opened unless it contains a needed file. So I don't think there is a problem.
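The open, search, close cycle described above can be approximated in Python with the standard `zipfile` module. This is a sketch under assumptions, not a transcription of the C patch: the function name is invented, the value stored per key is reduced to (archive name, header offset), and the demo archive is created in a temporary directory just so the example is self-contained.

```python
import os
import tempfile
import zipfile


def index_archive(archive_path):
    """Open the archive once, read its index, and close it again.

    Returns a dict mapping 'archive_path/member' to a tuple of
    (archive_path, offset of the member's local header).
    """
    index = {}
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            key = archive_path + "/" + info.filename
            index[key] = (archive_path, info.header_offset)
    return index


# Build a tiny archive and index it; no file stays open afterwards.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "Archive.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("Q/R/modfoo.pyc", b"not real bytecode")
    members = index_archive(archive)
```

The key has the same shape as the name import.c would generate, so a later lookup is a single dictionary access and the archive only needs to be reopened when a needed file is actually found in it.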
myself), I just didn't dare to put such a limit on my jython implementation.
I don't understand the jpython implementation, so please point out all problems so we can fix them now. JimA

[me]
myself), I just didn't dare to put such a limit on my jython implementation.
[James C. Ahlstrom]
I don't understand the jpython implementation, so please point out all problems so we can fix them now.
A naive implementation (like the first one I made for Jython) does not try to handle the cleanup issues. For example, how much memory has been consumed and not freed after the last statement in this silly little program:
    import sys, zipfile, time
    zip = zipfile.ZipFile("archive.zip", "w")
    for i in range(10000):
        entry = zipfile.ZipInfo()
        entry.filename = "module%06d.py" % i
        # ZipInfo.date_time is a 6-field tuple
        entry.date_time = time.gmtime(time.time())[:6]
        zip.writestr(entry, "# Not used\n")
    zip.close()
    sys.path.append("archive.zip")
    try:
        import notfound
    except ImportError:
        pass
    sys.path.pop()
If I read your preliminary patch correctly, the 10000 entries will remain in the ArchiveMembers dict forever, and that is perfectly fine given Guido's comments on sys.path being mostly a static feature. Jython manages to clean the member-cache when an archive is removed from sys.path, but it was quite tricky to achieve. We do it by replacing zip entries in sys.path (like "archive.zip") with a subclass of string (SyspathArchive) that also holds a reference to the member cache. From Python's POV the sys.path entry is still a string with the same value (but with a different id). regards, finn
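The string-subclass trick can be shown in a few lines of Python. This is only a sketch of the idea as described in this message: the class below borrows the name SyspathArchive, but its constructor signature and the `member_cache` attribute are assumptions, and Jython's real implementation is written in Java.

```python
class SyspathArchive(str):
    """A sys.path entry that compares equal to its path string but also
    carries a reference to the cached member index of the archive."""

    def __new__(cls, path, member_cache):
        self = super().__new__(cls, path)
        self.member_cache = member_cache
        return self


cache = {"module000000.py": 0}
entry = SyspathArchive("archive.zip", cache)

search_path = ["", entry]
# Code that treats sys.path entries as plain strings keeps working.
found = "archive.zip" in search_path
# Removing the entry drops the list's reference to the cache as well.
search_path.remove("archive.zip")
```

Because the cache hangs off the path entry itself, it becomes garbage once nothing refers to that entry any more, which is how removal from sys.path can eventually free the index.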

Finn Bock wrote:
The *.py are allowed to be in the file, and jpython can use them. In fact, any files at all can be in the archive. It is just that C-Python ignores them. My reason was that if *.py[co] are missing or out of date, zip importing will be slow and users won't figure out what is wrong. But I am open to changing this.
Just as sys.path currently has default directory names, default A standard for this would be really cool.
Yes, let's make a standard now.
I am confused. Zip archives are equivalent to subdirectories, so there is no requirement to have anything at the top level. Your example seems to imply a second search path besides sys.path. BTW, the code uses ".zip" as the archive flag, not a special character '!'.
No, entries would not be removed. But they would not be found either, because their names would not be generated from the new sys.path.
The __file__ is /A/B/archive.zip/name.py. There is no special code for __file__ nor __path__, the path name just has a ".zip" in it. JimA
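Since the only marker is a ".zip" component in an otherwise ordinary path, splitting such a name back into archive and member can be sketched as follows. The function name `split_archive_path` is hypothetical, and the sketch assumes that no real directory on the path has a name ending in ".zip".

```python
def split_archive_path(path):
    """Split '/A/B/archive.zip/name.py' into ('/A/B/archive.zip', 'name.py').

    Returns None when no path component ends in '.zip'.
    """
    parts = path.replace("\\", "/").split("/")
    for index, part in enumerate(parts):
        if part.lower().endswith(".zip"):
            archive = "/".join(parts[:index + 1])
            member = "/".join(parts[index + 1:])
            return archive, member
    return None
```

For the `__file__` value quoted above this yields the archive `/A/B/archive.zip` and the member `name.py`; a path with a package directory inside the archive keeps its subdirectories in the member part.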

[Finn]
[JimA]
I think Finn simply means that the zipfile may have some redundant common prefix on all filenames (e.g. "Lib/") which could be an artefact of how the zip file is created (zip up this directory) or intended as an aid for other uses of the zipfile, like unpacking. (In the tar world, it's considered impolite not to have a common prefix; without that, untarring can too easily populate an innocent user's home directory with thousands of untarred files.)
BTW, the code uses ".zip" as the archive flag, not a special character '!'.
That's cool. --Guido van Rossum (home page: http://www.python.org/~guido/)

"James C. Ahlstrom" wrote:
The PEP for zip import is 273. Please take a look and comment.
Looks good to me. Just three questions:
1. Why are .py files being outlawed ?
2. Where's the C implementation you mention in the PEP ?
3. Would it be possible to ship zlib together with Python ? (the zlib license should allow this and I don't think that the code size is too big)
-- Marc-Andre Lemburg CEO eGenix.com Software GmbH Consulting & Company: http://www.egenix.com/ Python Software: http://www.lemburg.com/python/

"M.-A. Lemburg" wrote:
1. Why are .py files being outlawed ?
My reason was that if *.py[co] are missing or out of date, zip importing will be slow and users won't figure out what is wrong. Generally I favor user-proof features over expert features. I prefer things which either "Just Work" or fail spectacularly. But I am open to changing this.
2. Where's the C implementation you mention in the PEP ?
Software is like pancakes, you should always throw the first one away. I will post it if you want, but it is not done.
OK by me. But uncompressed zip archives work, and may even be faster than compressed archives. JimA

Good point. Uncompressed archives will definitely be faster. I'd be happy to legislate uncompressed archives, except I'm worried that not all tools commonly used to create zip archives make it easy to turn off compression. --Guido van Rossum (home page: http://www.python.org/~guido/)
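For archives built with Python itself, turning compression off is straightforward. The sketch below uses the standard `zipfile` module; the helper names `write_stored_archive` and `is_uncompressed` are made up for illustration, and it says nothing about third-party zip tools, which are the actual worry here.

```python
import zipfile


def write_stored_archive(archive_path, files):
    """Write a {member_name: bytes} mapping into an archive without compression."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)


def is_uncompressed(archive_path):
    """Report whether every member of the archive is stored uncompressed."""
    with zipfile.ZipFile(archive_path) as archive:
        return all(info.compress_type == zipfile.ZIP_STORED
                   for info in archive.infolist())
```

A check like `is_uncompressed` could be run on an archive of unknown origin before relying on it where zlib is not available.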

"James C. Ahlstrom" wrote:
If you don't include the *.py file in the archive, chances are high that tracebacks will no longer print out as they do today (another problem here is that the filename being used in the code object will probably not be found... not sure whether we can fix this one). By only looking at the .pyc or .pyo you'll also introduce a Python version problem into the ZIP-archive: the magic in these files changes rather frequently and e.g. a more recent release of Python won't be able to load these from a ZIP-archive which only contains .pycs from a compile by a less recent version.
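The version problem can be made concrete with a small check. This sketch uses `importlib.util.MAGIC_NUMBER` from present-day Python, which did not exist when this thread was written; the function name `pyc_matches_interpreter` is hypothetical.

```python
import importlib.util


def pyc_matches_interpreter(pyc_bytes):
    """Return True if compiled-module bytes carry the running interpreter's magic.

    A .pyc file starts with a magic number that identifies the bytecode
    format; an interpreter refuses files whose magic differs from its own.
    """
    return pyc_bytes[:4] == importlib.util.MAGIC_NUMBER
```

An archive that holds only .pyc files fails this check wholesale after a Python upgrade that changes the magic, and without the .py sources there is nothing to recompile from.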
That's true :-) Still, I'd like a chance to at least look at the impact this has on import.c and of course play with it so that I can test it in everyday situations.
Even if uncompressed archives would work faster, the compiled libz is only 70k on Linux and I think we could solve a great number of (zlib version) problems by including zlib in Python. It's one of those basic tools you need frequently, but is even more frequently not configured as a Python module :-( Since it is already included in the Windows builds, I guess adding it to core for Unix and Macs too wouldn't hurt all that much. It would also save you the trouble of maintaining the code for scanning uncompressed zip-archives in your ZIP import code, so we win on two counts :-) -- Marc-Andre Lemburg

Jim Ahlstrom wrote: [From PEP 273]
Currently, sys.path is a list of directory names as strings.
A nit: that's in practice, but not by definition (a couple std modules have been patched to allow other things on sys.path). After extensive use of imputil (which puts objects on sys.path), I think we might as well make it official that sys.path is a list of strings.

[Subdirectory Equivalence]
This is a bit misleading - it can be read as implying that the importable modules are in a flat namespace. They're not. To get R.Q.modfoo imported, R.__init__ and R.Q.__init__ must be imported. The __init__ modules have the opportunity to play with __path__ to break the equivalence of R.Q.modfoo to R/Q/modfoo.py. Or (more likely), play games with attributes so that Q is, say, an instance, or maybe a module imported from someplace else.

Question: if Archive.zip/Q/__init__.pyc does "__path__.append('some/real/directory')", will it work?

[dynamic libs]
It might be possible to extract the dynamic module from the zip file, write it to a plain file and load it.
I think you can nail the door shut on this one. On many OSes, making dynamic libs available to a process requires settings that can only (sanely) be made before the process starts. OTOH, it's common practice these days to put dynamic libs inside packages. That needs to be dealt with at runtime, and at build time (since it breaks the expectation that you can just "zip up" the package). [no *.py files]
You also can't import source files *.py from a zip archive.
Apparently some Linuxes / RPM distributions don't deliver .pyc's or .pyo's. Since they're installed as root and run as some-poor-user, I'm afraid there are quite a few installations running off .py's all the time. So while it's definitely suboptimal, I'm not sure it should be outlawed. [Guido]
But it would still be good if the .py files were also in the zip file, so the source can be used in tracebacks etc.
As you probably remember me saying before, I think this is a very minor nice-to-have. For one thing, you've probably done enough testing before stuffing your code into an archive that you're past the point of looking at a traceback and going "doh!". Most of the time you'll need to look at the code anyway. OTOH, this [more from Guido]:
A C API to get a source line from a filename might be a good idea (plus a Python API).
points the way towards something I'm very much in favor of: deferring things to that-which-did-the-importing. [Efficiency]
The key is the archive name from sys.path joined with the file name (including any subdirectories) within the archive.
Different spellings of the same path are possible in a filesystem, but not in a dictionary. A bit of "harmless" tweaking of sys.path could render an archive unreachable.

[zlib must be available at start]
I'll agree, and agree with Guido that the coolest thing would be to make zlib standard.

[Booting - python.zip should be part of the generated sys.path]
Agree. Nice and straightforward.

Now, from the discussion:

[restate Jim] sys.path contains /C/D/E/Archive.zip and Archive.zip contains "Q/R/modfoo.pyc", so "import Q.R.modfoo" is satisfied by /C/D/E/Archive.zip/Q/R/modfoo.pyc

[restate Finn] Jython has sys.path /C/D/E/Archive.zip!Lib and Archive.zip has "Lib/Q/R/modfoo.pyc", so "import Q.R.modfoo" is satisfied by /C/D/E/Archive.zip/Lib/Q/R/modfoo.pyc

[restate Guido] Why not use /C/D/E/Archive.zip/Lib on sys.path?

I use embedded archives. So sys.path will have an entry like "/path/to/executable?84758", which says that the archive starts at position 84758 in the file "executable". Anything beyond the simple approach Jim takes gets into some very URL-ish territory. That's fine by me :-).

I don't really like the idea of hacking special knowledge of zip files into import.c (which is already a specialist in filesystems). Like Finn said, if this is a deployment issue (we want zip files now, and are willing to live with strict limitations / rules to get it), then OK (as long as it supports __path__ and some way of dealing with dynamic libs in packages).

Personally, I think package support stretched import.c to its monolithic limits and it's high time the code was refactored to make it sanely extensible.

- Gordon
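The point about path spellings can be illustrated with a sketch of one possible mitigation, which nobody in the thread proposes: normalizing the archive part of the key before it goes into the dictionary. The function name `cache_key` is hypothetical, and the normalization is purely lexical, so two spellings that differ only through a symbolic link would still produce different keys.

```python
import os


def cache_key(path_entry, member_name):
    """Build a dictionary key that is stable across spellings of path_entry.

    abspath() resolves '.', '..' and relative spellings lexically;
    normcase() folds case and separators on platforms where they are
    not significant, and is a no-op elsewhere.
    """
    archive = os.path.normcase(os.path.abspath(path_entry))
    return os.path.join(archive, *member_name.split("/"))
```

Without something along these lines, an entry added as "./Archive.zip" and later looked up as "Archive.zip" would miss the cache even though both spellings name the same file.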