[Python-Dev] Import redesign [LONG]

James C. Ahlstrom jim@interet.com
Sat, 04 Dec 1999 14:51:47 -0500


Greg Stein wrote:
> 
> On Fri, 3 Dec 1999, Guido van Rossum wrote:

> > attack.  In particular, installing a new package (e.g. PIL) should
> > affect sys.path, regardless of the way of delivery of the modules
> > (shared libs, .py files, .pyc files, or a zip archive).

> To be explicit/clear and to be sure I'm hearing you right: sys.path may
> contain Importer instances. Given the name FOO, the system will step
> through sys.path looking for the first occurrence of FOO (looking in a
> directory or delegating). FOO may be found with any number of
> (configurable) file extensions, which are ordered (e.g. ".so" before
> ".py" before ".isl").

This is basically a gripe about this design spec.  So if the answer
turns out to be "we need this functionality so shut up", then just
say so and don't flame me.

This spec is painful.  Suppose sys.path has 10 elements, and there
are six file extensions.  Then the simple algorithm is slow, making
up to 10 * 6 = 60 isfile() checks for a single import:
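  import os, sys

  def find_module(module_name, file_extensions):
    for path in sys.path:
      if not isinstance(path, str):  # Yikes, may not be a string!
        continue                     # (Importer delegation not shown)
      for ext in file_extensions:
        name = "%s.%s" % (module_name, ext)
        full_path = os.path.join(path, name)
        if os.path.isfile(full_path):
          return full_path           # Process file here
    return None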

And sys.path can contain class instances, which only makes things
slower.  You could do a readdir() on each directory and cache the
results, but that might turn out to be slower.  A better algorithm
might be faster, but it would be a lot more complicated.
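One way the readdir() caching could look, as a rough Python sketch: list each directory once with os.listdir(), keep the names in a set, and test set membership instead of calling os.path.isfile() for every candidate.  It assumes directory contents do not change while the cache is alive; _dir_cache, _listing and find_module_cached are made-up names for illustration, and Importer instances on sys.path are simply skipped.

```python
import os
import sys

# Hypothetical cache: directory -> set of file names found there.
# Filled lazily with one os.listdir() per directory and never invalidated,
# so it assumes directory contents stay fixed while the cache is in use.
_dir_cache = {}


def _listing(directory):
    """Return the cached set of names in directory (empty if unreadable)."""
    if directory not in _dir_cache:
        try:
            _dir_cache[directory] = set(os.listdir(directory))
        except OSError:
            _dir_cache[directory] = set()
    return _dir_cache[directory]


def find_module_cached(module_name, file_extensions, path_list=None):
    """Like the simple loop, but tests set membership instead of stat()ing.

    Note: os.listdir() also returns subdirectory names, so unlike
    os.path.isfile() this does not reject a directory named e.g. "foo.py".
    """
    if path_list is None:
        path_list = sys.path
    for path in path_list:
        if not isinstance(path, str):
            continue  # Importer instances would need delegation (not shown)
        names = _listing(path or ".")  # "" on sys.path means the current directory
        for ext in file_extensions:
            name = "%s.%s" % (module_name, ext)
            if name in names:
                return os.path.join(path, name)
    return None
```

Whether this beats one stat() per candidate depends on directory sizes and on how many imports a program does; the cache would also need invalidating when a package is installed, which this sketch ignores.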

In the context of archive files, it is also painful.  It prevents
you from saving a single dictionary of module names.  Instead you
must have len(sys.path) dictionaries, one per entry, consulted in
order.  You could try to save in the archive information about
whether (say) a foo.dll was present in the file system, but the list
of extensions is configurable, so it can change after the archive is
built.
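A minimal Python sketch of that per-entry bookkeeping, assuming each directory or archive on sys.path can be indexed once into a dictionary mapping module name to the extensions available there.  build_index and find_in_indexes are hypothetical helpers, not part of any existing import machinery.  Entry order wins over extension priority, which is why the dictionaries cannot simply be merged into one:

```python
# Hypothetical per-entry index: one dict per sys.path entry (directory or
# archive), mapping module name -> set of extensions present in that entry.
def build_index(file_names):
    """Build a {module_name: {ext, ...}} dict from a list of file names."""
    index = {}
    for file_name in file_names:
        base, dot, ext = file_name.rpartition(".")
        if dot:  # ignore names without an extension
            index.setdefault(base, set()).add(ext)
    return index


def find_in_indexes(module_name, indexes, file_extensions):
    """Walk the per-entry indexes in sys.path order; in the first entry
    holding the module with a known extension, pick the highest-priority
    extension present.  Returns (entry_number, extension) or None.
    """
    for entry_number, index in enumerate(indexes):
        available = index.get(module_name)
        if not available:
            continue
        for ext in file_extensions:
            if ext in available:
                return entry_number, ext
    return None
```

For example, if entry 0 holds only foo.py and entry 1 holds foo.so and foo.py, then find_in_indexes("foo", indexes, ["so", "py"]) returns (0, "py"): the .so in the later entry never gets a chance.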

The above problem only exists to support equally-named modules; that
is, to support a run-time choice of whether to load foo.pyc, foo.dll,
foo.isl, etc.  I claim (without having written it) that the fastest
algorithm to solve the unique-name case is much faster than the fastest
algorithm to solve the choose-among-equal-names case.
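For contrast, a sketch of the unique-name case, assuming equal names are simply forbidden (rejected here with ValueError, which is an illustrative choice rather than anything proposed in the thread).  All entries, archives included, collapse into one dictionary built once, and each import becomes a single lookup.  build_flat_index and find_module_flat are hypothetical names.

```python
# Hypothetical flat index for the unique-name case.  Assumption for this
# sketch: a module name may appear at most once across all sys.path
# entries, so duplicates are rejected when the index is built.
def build_flat_index(entries):
    """entries: (location, file_names) pairs, in sys.path order.

    Returns {module_name: (location, file_name)}.
    """
    index = {}
    for location, file_names in entries:
        for file_name in file_names:
            base, dot, _ext = file_name.rpartition(".")
            if not dot:
                continue  # not a module file in this sketch
            if base in index:
                raise ValueError("module name %r is not unique" % base)
            index[base] = (location, file_name)
    return index


def find_module_flat(module_name, index):
    """One dictionary lookup per import, whatever the length of sys.path."""
    return index.get(module_name)
```

The extension list plays no role in the lookup, which is what makes a single dictionary possible.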

Do we really need to support the equal-name case [Jim runs for
cover...]?  If so, how about inventing a new way to support it?
Maybe, if equal names exist, they must be pre-loaded from a known
location?

JimA