[Python-Dev] Import redesign (was: Python 1.6 status)

Nov. 18, 1999

Gordon McMillan wrote:
...
Marc-Andre wrote:
...
Fredrik Lundh wrote:
...
Guido van Rossum <guido@CNRI.Reston.VA.US> wrote:
...
- suggestions for new issues that maybe ought to be settled in 1.6
three things: imputil, imputil, imputil
But please don't add the current version as default importer...
its strategy is way too slow for real life apps (yes, I've tested
this: imports typically take twice as long as with the builtin
importer).
I think imputil's emulation of the builtin importer is more of a 
demonstration than a serious implementation. As for speed, it 
depends on the test.
Agreed.  I like some of imputil's features, but I think the API
needs to be redesigned.
...
...
I'd opt for an import manager which provides a useful API for
import hooks to register themselves with.
I think that rather than blindly chain themselves together, there 
should be a simple minded manager. This could let the 
programmer prioritize them.
Indeed.  (A list of importers has been suggested, to replace the list
of directories currently used.)
...
...
What we really need
is not yet another complete reimplementation of what the
builtin importer does, but rather a more detailed exposure of
the various import aspects: finding modules and loading modules.
The first clause I sort of agree with - the current 
implementation is a fine implementation of a filesystem 
directory based importer.
I strongly disagree with the second clause. The current import 
hooks are just such a detailed exposure; and they are 
incomprehensible and unmanageable.
Based on how many people have successfully written import hooks, I
have to agree. :-(
...
I guess you want to tweak the "finding" part of the builtin 
import mechanism. But that's no reason to ask all importers 
to break themselves up into "find" and "load" pieces. It's a 
reason to ask that the standard importer be, in some sense,
"subclassable" (i.e., expose hooks, or perhaps be an
extension-class-like thingie).
Agreed.  Subclassing is a good way towards flexibility.
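
To make the "simple minded manager" idea concrete, here is a minimal
sketch in present-day Python.  It is not imputil's API and not a
proposal for the final design; the names Importer, DictImporter,
ImportManager and register() are all hypothetical, and the manager's
own dictionary merely stands in for sys.modules.  Each importer both
finds and loads (no forced find/load split), and the programmer sets
the priority.

```python
import types


class Importer:
    """Hypothetical base class: one object both finds and loads a module."""

    def import_module(self, name):
        # Return a module object, or None if this importer cannot supply it.
        return None


class DictImporter(Importer):
    """Toy importer that serves module source from an in-memory dict."""

    def __init__(self, sources):
        self.sources = sources

    def import_module(self, name):
        source = self.sources.get(name)
        if source is None:
            return None
        module = types.ModuleType(name)
        exec(source, module.__dict__)
        return module


class ImportManager:
    """Simple-minded manager: consults importers in priority order."""

    def __init__(self):
        self._entries = []  # (priority, registration order, importer)
        self._count = 0
        self.modules = {}   # stands in for sys.modules in this sketch

    def register(self, importer, priority=0):
        # Lower numbers are consulted first; ties keep registration order.
        self._entries.append((priority, self._count, importer))
        self._count += 1
        self._entries.sort(key=lambda entry: entry[:2])

    def import_module(self, name):
        if name in self.modules:
            return self.modules[name]
        for _, _, importer in self._entries:
            module = importer.import_module(name)
            if module is not None:
                self.modules[name] = module
                return module
        raise ImportError("no importer could supply %r" % name)


manager = ImportManager()
manager.register(DictImporter({"spam": "value = 'fallback'"}), priority=10)
manager.register(DictImporter({"spam": "value = 'preferred'"}), priority=1)
spam = manager.import_module("spam")
print(spam.value)
```

A subclass of Importer only has to override import_module(), which is
the kind of "subclassable" standard importer discussed above; the
manager never needs to know how a module was found.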

And Jim Ahlstrom writes:
...
IMHO the current import mechanism is good for developers who must
work on the library code in the directory tree, but a disaster
for sysadmins who must distribute Python applications either
internally to a number of machines or commercially.
Unfortunately, you're right. :-(
...
What we need is a standard Python library file like a Java "Jar"
file.  Imputil can support this as 130 lines of Python.  I have also
written one in C.  I like the imputil approach, but if we want to
add a library importer to import.c, I volunteer to write it.
Please volunteer to design or at least review the grand architecture
-- see below.
...
I don't want to just add more complicated and unmanageable hooks
which people will all use in different ways, just adding to the
confusion.
You're so right!
...
It is easy to install packages by just making them into a library
file and throwing it into a directory.  So why aren't we doing it?
Rhetorical question. :-)

So here's a challenge: redesign the import API from scratch.

Let me start with some requirements.

Compatibility issues:
---------------------

- the core API may be incompatible, as long as compatibility layers
can be provided in pure Python

- support for rexec functionality

- support for freeze functionality

- load .py/.pyc/.pyo files and shared libraries from files

- support for packages

- sys.path and sys.modules should still exist; sys.path might
have a slightly different meaning

- $PYTHONPATH and $PYTHONHOME should still be supported

(I wouldn't mind a splitting up of importdl.c into several
platform-specific files, one of which is chosen by the configure
script; but that's a bit of a separate issue.)

New features:
-------------

- Integrated support for Greg Ward's distribution utilities (i.e. a
  module prepared by the distutils tools should install painlessly)

- Good support for prospective authors of "all-in-one" packaging
  tools like Gordon McMillan's win32 installer or /F's squish.  (But
  I *don't* require backwards compatibility for existing tools.)

- Standard import from zip or jar files, in two ways:

  (1) an entry on sys.path can be a zip/jar file instead of a directory;
      its contents will be searched for modules or packages

  (2) a file in a directory that's on sys.path can be a zip/jar file;
      its contents will be considered as a package (note that this is
      different from (1)!)

  I don't particularly care about supporting all zip compression
  schemes; if Java gets away with only supporting gzip compression
  in jar files, so can we.

- Easy ways to subclass or augment the import mechanism along
  different dimensions.  For example, while none of the following
  features should be part of the core implementation, it should be
  easy to add any or all:

  - support for a new compression scheme to the zip importer

  - support for a new archive format, e.g. tar

  - a hook to import from URLs or other data sources (e.g. a
    "module server" imported in CORBA) (this needn't be supported
    through $PYTHONPATH though)

  - a hook that imports from compressed .py or .pyc/.pyo files

  - a hook to auto-generate .py files from other filename
    extensions (as currently implemented by ILU)

  - a cache for file locations in directories/archives, to improve
    startup time

  - a completely different source of imported modules, e.g. for an
    embedded system or PalmOS (which has no traditional filesystem)

- Note that different kinds of hooks should (ideally, and within
  reason) properly combine, as follows: if I write a hook to recognize
  .spam files and automatically translate them into .py files, and you
  write a hook to support a new archive format, then if both hooks are
  installed together, it should be possible to find a .spam file in an
  archive and do the right thing, without any extra action.  Right?

- It should be possible to write hooks in C/C++ as well as Python

- Applications embedding Python may supply their own implementations,
  default search path, etc., but don't have to if they want to piggyback
  on an existing Python installation (even though the latter is
  fraught with risk, it's cheaper and easier to understand).
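
As a rough illustration of the "hooks should combine" requirement,
here is a minimal sketch in present-day Python.  It assumes one
possible factoring, which is only a guess at a workable design: a
"store" hook answers where a file's bytes come from (here a zip
archive), and a "translator" hook turns a file of a given extension
into Python source.  ZipStore, spam_to_python, identity and Importer
are hypothetical names, and the .spam format ("name: value" lines) is
invented for the example.

```python
import io
import types
import zipfile


class ZipStore:
    """Hypothetical archive hook: returns the bytes of a named file."""

    def __init__(self, archive):
        self.archive = zipfile.ZipFile(archive)

    def read(self, filename):
        try:
            return self.archive.read(filename)
        except KeyError:  # ZipFile.read raises KeyError for a missing member
            return None


def spam_to_python(data):
    """Hypothetical .spam hook: 'name: value' lines become assignments."""
    lines = []
    for line in data.decode("ascii").splitlines():
        name, value = line.split(":", 1)
        lines.append("%s = %r" % (name.strip(), value.strip()))
    return "\n".join(lines)


def identity(data):
    """Plain .py files need no translation, only decoding."""
    return data.decode("ascii")


class Importer:
    """Tries every translator against every store, in order."""

    def __init__(self):
        self.stores = []
        self.translators = [(".py", identity)]

    def import_module(self, name):
        for store in self.stores:
            for extension, translate in self.translators:
                data = store.read(name + extension)
                if data is not None:
                    module = types.ModuleType(name)
                    exec(translate(data), module.__dict__)
                    return module
        raise ImportError(name)


# Build a small archive in memory that holds only a .spam file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("eggs.spam", "colour: green\ncount: 3\n")

importer = Importer()
importer.stores.append(ZipStore(buffer))                # "your" hook
importer.translators.append((".spam", spam_to_python))  # "my" hook
eggs = importer.import_module("eggs")
print(eggs.colour, eggs.count)
```

Neither hook knows about the other, yet the .spam file inside the
archive is found and translated without any extra action.  Whether
every hook listed above fits into these two roles is an open question.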

Implementation:
---------------

- There must clearly be some code in C that can import certain
  essential modules (to solve the chicken-or-egg problem), but I don't
  mind if the majority of the implementation is written in Python.
  Using Python makes it easy to subclass.

- In order to support importing from zip/jar files using compression,
  we'd at least need the zlib extension module and hence libz itself,
  which may not be available everywhere.

- I suppose that the bootstrap is solved using a mechanism very
  similar to what freeze currently uses (other solutions seem to be
  platform dependent).

- I also want to still support importing *everything* from the
  filesystem, if only for development.  (It's hard enough to deal with
  the fact that exceptions.py is needed during Py_Initialize();
  I want to be able to hack on the import code written in Python
  without having to rebuild the executable all the time.)

Let's first complete the requirements gathering.  Are these
requirements reasonable?  Will they make an implementation too
complex?  Am I missing anything?

Finally, to what extent does this impact the desire for dealing
differently with the Python bytecode compiler (e.g. supporting
optimizers written in Python)?  And does it affect the desire to
implement the read-eval-print loop (the >>> prompt) in Python?

--Guido van Rossum (home page: http://www.python.org/~guido/)