[Python-checkins] CVS: python/nondist/peps pep-0273.txt,NONE,1.1

Barry Warsaw bwarsaw@users.sourceforge.net
Thu, 25 Oct 2001 08:58:31 -0700


Update of /cvsroot/python/python/nondist/peps
In directory usw-pr-cvs1:/tmp/cvs-serv6109

Added Files:
	pep-0273.txt 
Log Message:
PEP 273, Import Modules from Zip Archives, James C. Ahlstrom


--- NEW FILE: pep-0273.txt ---
PEP: 273
Title: Import Modules from Zip Archives
Version: $Revision: 1.1 $
Last-Modified: $Date: 2001/10/25 15:58:29 $
Author: jim@interet.com (James C. Ahlstrom)
Status: Draft
Type: Standards Track
Created: 11-Oct-2001
Post-History:
Python-Version: 2.3


Abstract
    This PEP adds the ability to import compiled Python modules
    (*.pyc and *.pyo) and packages from zip archives.


Specification

    Currently, sys.path is a list of directory names as strings.  If
    this PEP is implemented, an item of sys.path can be a string
    naming a zip file archive.  The zip archive can contain a
    subdirectory structure to support package imports.  The zip
    archive satisfies imports exactly as a subdirectory would.
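
    As a minimal sketch of the behaviour described above, the
    following Python builds a small archive holding one compiled
    module and imports it through sys.path.  The helper names and
    the module name pep273_demo are illustrative assumptions, and
    the sketch assumes an interpreter that already has zip import
    support; it is not the C implementation this PEP refers to.

```python
import importlib
import os
import py_compile
import sys
import zipfile


def build_demo_archive(workdir, module_name="pep273_demo"):
    """Create workdir/Archive.zip holding one compiled module.

    Hypothetical helper, used only for illustration.
    """
    source = os.path.join(workdir, module_name + ".py")
    with open(source, "w") as f:
        f.write("VALUE = 42\n")
    # Compile to a plain "name.pyc" file next to the source.
    compiled = os.path.join(workdir, module_name + ".pyc")
    py_compile.compile(source, cfile=compiled, doraise=True)
    archive = os.path.join(workdir, "Archive.zip")
    # Store only the compiled file, uncompressed, under a relative name.
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
        zf.write(compiled, module_name + ".pyc")
    return archive


def import_from_archive(archive, module_name="pep273_demo"):
    """Put the archive on sys.path and import module_name from it."""
    sys.path.insert(0, archive)
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(archive)
```

    The archive name is treated like any other sys.path entry; no
    separate import call is needed for zip archives.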

    The implementation is in C code in the Python core and works on
    all supported Python platforms.

    Any files may be present in the zip archive, but only files *.pyc,
    *.pyo and __init__.py[co] are available for import.  Zip import of
    *.py and dynamic modules (*.pyd, *.so) is disallowed.

    Just as sys.path currently has default directory names, default
    zip archive names are added too.  Otherwise there is no way to
    import all Python library files from an archive.

    Reading compressed zip archives requires the zlib module.  An
    import of zlib will be attempted prior to any other imports.  If
    zlib is not available at that time, only uncompressed archives
    will be readable, even if zlib subsequently becomes available.


Subdirectory Equivalence

    The zip archive must be treated exactly as a subdirectory tree so
    we can support package imports based on current and future rules.
    Zip archive files must be created with relative path names.  That
    is, archive file names are of the form: file1, file2, dir1/file3,
    dir2/dir3/file4.
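
    One way to produce an archive with relative path names is
    sketched below, using the standard zipfile module.  The function
    name zip_tree is an assumption made for illustration, and the
    archive is expected to live outside the tree being zipped.

```python
import os
import zipfile


def zip_tree(root, archive_path):
    """Zip every file under root, naming members relative to root.

    archive_path should lie outside root, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_STORED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()  # deterministic traversal order
            for filename in sorted(filenames):
                full = os.path.join(dirpath, filename)
                # Relative name with forward slashes, e.g. "dir1/file3".
                member = os.path.relpath(full, root).replace(os.sep, "/")
                zf.write(full, member)
    return archive_path
```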

    Suppose sys.path contains "/A/B/SubDir" and "/C/D/E/Archive.zip",
    and we are trying to import modfoo from the Q.R package.  Then
    import.c will generate a list of paths and extensions and will
    look for the file.  The list of generated paths does not change
    for zip imports.  Suppose import.c generates the path
    "/A/B/SubDir/Q/R/modfoo.pyc".  Then it will also generate the path
    "/C/D/E/Archive.zip/Q/R/modfoo.pyc".  Finding the SubDir path is
    exactly equivalent to finding "Q/R/modfoo.pyc" in the archive.
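
    The path generation above can be sketched in Python as follows.
    This is only an illustration of the idea, not the logic in
    import.c; the function names, the fixed "/" separator and the
    ".zip" suffix test are assumptions.

```python
import posixpath


def candidate_paths(sys_path, dotted_name, extensions=(".pyc", ".pyo")):
    """Generate one candidate file path per sys.path entry and extension."""
    relative = dotted_name.replace(".", "/")
    return [posixpath.join(entry, relative + ext)
            for entry in sys_path
            for ext in extensions]


def split_archive_path(path, archive_suffix=".zip"):
    """Split a generated path into (archive, member).

    Returns None when the path does not pass through an archive.
    """
    marker = archive_suffix + "/"
    index = path.find(marker)
    if index < 0:
        return None
    cut = index + len(archive_suffix)
    return path[:cut], path[cut + 1:]
```

    For the dotted name Q.R.modfoo, the same relative part
    "Q/R/modfoo.pyc" is appended to the directory entry and to the
    archive entry; only the lookup of the resulting name differs.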

    Suppose you zip up /A/B/SubDir/* and all its subdirectories.  Then
    your zip file will satisfy imports just as your subdirectory did.

    Well, not quite.  You can't satisfy dynamic modules from a zip
    file.  Dynamic modules have extensions like .dll, .pyd, and .so.
    They are operating system dependent, and probably can't be loaded
    except from a file.  It might be possible to extract the dynamic
    module from the zip file, write it to a plain file and load it.
    But that would mean creating temporary files, and dealing with all
    the dynload_*.c, and that's probably not a good idea.

    You also can't import source files *.py from a zip archive.  The
    problem here is what to do with the compiled files.  Python would
    normally write these to the same directory as *.py, but surely we
    don't want to write to the zip file.  We could write to the
    directory of the zip archive, but that would clutter it up, which
    is not good if it is /usr/bin, for example.  We could just fail
    to write the compiled files, but that makes zip imports very slow,
    and the user would probably not figure out what is wrong.  It is
    probably best for users to put *.pyc into zip archives in the
    first place, and this PEP enforces that rule.

    So the only imports zip archives support are *.pyc and *.pyo, plus
    the import of __init__.py[co] for packages, and the search of the
    subdirectory structure for the same.
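
    The rule above amounts to a filter on archive member names.  A
    small sketch, with hypothetical function names, might look like
    this:

```python
import posixpath

# Only compiled files are importable from an archive under this PEP.
IMPORTABLE_EXTENSIONS = (".pyc", ".pyo")


def is_importable(member_name):
    """True for *.pyc and *.pyo members; False for *.py, *.so, etc."""
    return posixpath.splitext(member_name)[1] in IMPORTABLE_EXTENSIONS


def is_package_init(member_name):
    """True when the member is the __init__.py[co] of a package."""
    base = posixpath.basename(member_name)
    return base in ("__init__.pyc", "__init__.pyo")
```

    Other files may still be present in the archive; they are simply
    ignored by the import machinery.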


Efficiency

    The only way to find files in a zip archive is linear search.  So
    for each zip file in sys.path, we search for its names once, and
    put the names plus other relevant data into a static Python
    dictionary.  The key is the archive name from sys.path joined with
    the file name (including any subdirectories) within the archive.
    This is exactly the name generated by import.c, and makes lookup
    easy.
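
    The dictionary described above can be sketched in Python with
    the zipfile module.  The choice of stored data (header offset,
    compression type and uncompressed size) is an assumption about
    what "other relevant data" might include, and index_archive is a
    hypothetical name.

```python
import zipfile


def index_archive(archive_path, index=None):
    """Read the archive's directory once and record every file in index.

    Keys are the archive name joined with the member name, which is
    the same string the path generation produces.
    """
    if index is None:
        index = {}
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.filename.endswith("/"):
                continue  # skip explicit directory entries
            key = archive_path + "/" + info.filename
            index[key] = (info.header_offset,
                          info.compress_type,
                          info.file_size)
    return index
```

    After the one-time scan, each import attempt is a single
    dictionary lookup on the generated path.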


zlib

    Compressed zip archives require zlib for decompression.  Prior to
    any other imports, we attempt an import of zlib, and set a flag if
    it is available.  All compressed files are invisible unless this
    flag is true.
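
    The flag and the visibility rule can be sketched as follows.
    HAVE_ZLIB and visible_members are illustrative names, not part
    of the implementation.

```python
import zipfile

# Attempt the import once, up front, and remember the outcome.
try:
    import zlib  # noqa: F401
    HAVE_ZLIB = True
except ImportError:
    HAVE_ZLIB = False


def visible_members(infos, have_zlib=HAVE_ZLIB):
    """Return the names of members that can be read under the flag.

    Stored (uncompressed) members are always visible; deflated
    members are visible only when zlib was importable.
    """
    names = []
    for info in infos:
        if info.compress_type == zipfile.ZIP_STORED:
            names.append(info.filename)
        elif info.compress_type == zipfile.ZIP_DEFLATED and have_zlib:
            names.append(info.filename)
    return names
```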

    It could happen that zlib becomes available later.  For example,
    the import of site.py might add the correct directory to sys.path
    so that a dynamic load of zlib succeeds.  But compressed files
    will still be invisible.  It is not known whether importing
    site.py can actually cause zlib to appear, so we may be worrying
    about nothing.  On Windows and Linux, the early import of zlib
    succeeds without site.py.

    The problem with the reverse policy, in which compressed files
    become visible as soon as zlib appears, is the confusion it
    causes.  Either a zip file satisfies imports or it doesn't.  It
    is silly to say that site.py needs to be uncompressed, and that
    maybe imports will succeed later.  If you don't like this, create
    uncompressed zip archives or make sure zlib is available, for
    example, as a built-in module.  Or we can write special search
    logic during zip initialization.


Booting

    Python imports site.py itself at startup, and this in turn
    imports os, nt, ntpath, stat, and UserDict (the Windows case;
    other platforms pull in their own equivalents of nt and ntpath).
    It also imports sitecustomize.py which may import more modules.
    Zip imports must be available before site.py is imported.

    Just as there are default directories in sys.path, there must be
    one or more default zip archives too.

    The problem is what the name should be.  The name should be linked
    with the Python version, so the Python executable can correctly
    find its corresponding libraries even when there are multiple
    Python versions on the same machine.

    This PEP suggests a zip archive name equal to the Python
    interpreter path with extension ".zip" (e.g., /usr/bin/python.zip)
    which is always prepended to sys.path.  So a directory with python
    and python.zip is complete.  This would work fine on Windows, as
    it is common to put supporting files in the directory of the
    executable.  But it may offend Unix fans, who dislike bin
    directories being used for libraries.  It might be fine to
    generate different defaults for Windows and Unix if necessary, but
    the code will be in C, and there is no sense getting complicated.
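
    The suggested default can be sketched in Python as below.
    Stripping a trailing ".exe" before adding ".zip" is an
    assumption about how "the interpreter path with extension .zip"
    would be read on Windows; both function names are hypothetical.

```python
def default_archive_name(executable):
    """Return the interpreter path with a ".zip" extension."""
    # Assumption: a Windows ".exe" suffix is replaced, not kept.
    if executable.lower().endswith(".exe"):
        executable = executable[:-4]
    return executable + ".zip"


def initial_path(executable, default_dirs):
    """Prepend the default archive to the default directory list."""
    return [default_archive_name(executable)] + list(default_dirs)
```

    With this rule a directory holding only python and python.zip
    has the archive first on sys.path, ahead of any default
    directories.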


Implementation

    A working C implementation exists, though it can be improved.


Copyright

    This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
fill-column: 70
End: