[Python-Dev] Import redesign [LONG]

James C. Ahlstrom jim@interet.com
Mon, 06 Dec 1999 13:34:41 -0500


Greg Stein wrote:
> 
> On Sat, 4 Dec 1999, James C. Ahlstrom wrote:
> >         # Process file here
> 
> This is the algorithm that Python uses today, and my standard Importers
> follow.

Agreed.
 
> > And sys.path can contain class instances
> > which only makes things slower.
> 
> IMO, we don't know this, or whether it is significant.

Agreed.
 
> > You could do a readdir() and cache
> > the results, but maybe that would be slower.  A better
> > algorithm might be faster, but a lot more complicated.
> 
> Who knows. BUT: the import process is now in Python -- it makes it *much*
> easier to run these experiments. We could not really do this when the
> import process is "hard-coded" in C code.

Agreed.
 
> > In the context of archive files, it is also painful.  It prevents
> > you from saving a single dictionary of module names.  Instead you
> > must have len(sys.path) dictionaries.  You could try to
> > save in the archive information about whether (say) a foo.dll was
> > present in the file system, but the list of extensions is extensible.
> 
> I am not following this. What/where is the "single dictionary of module
> names" ? Are you referring to a cache? Or is this about building an
> archive?
> 
> An archive would look just like we have now: map a name to a module. It
> would not need multiple dictionaries.

The "single dictionary of names" is in the single archive importer
instance and has nothing to do with creating the archive.  It
is currently programmed this way.

Suppose the user specifies by name 12 archive files to be searched;
that is, the user hacks site.py to add archive names to the importer.
The "single dictionary" means that the archive importer takes the 12
dictionaries in the 12 files and merges them into one dictionary to
speed up the search for a name.  The good news is that you can always
just call the archive importer to get a module.  The bad news is that
you can't do that for each entry on sys.path, because there is no
necessary identity between archive files and sys.path.  The user
specified the archive files by name, so they may or may not be on
sys.path, and even if they are, the user may or may not have
specified them in the same order as sys.path.
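
Roughly, the current scheme looks like this (the names here are made
up for illustration; read_index() stands for whatever reads the name
table out of one archive file, it is not the real interface):

    class ArchiveImporter:
        def __init__(self):
            # one merged table: module name -> (archive file, offset)
            self.names = {}

        def add_archive(self, archive):
            for name, offset in read_index(archive).items():
                # later archives simply overwrite earlier ones;
                # sys.path order plays no part in the lookup
                self.names[name] = (archive, offset)

        def get_code(self, name):
            # one dictionary lookup, no per-directory search
            return self.names.get(name)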

Suppose instead that archive files must lie on sys.path and are
processed in order.  Then to find them you must know their names.
But IMHO you want to avoid doing a readdir() on each element of
sys.path looking for *.pyl files.
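
That is, the scan I would like to avoid is something like this (one
directory listing per sys.path element, at least at startup):

    import os, sys

    def find_archives():
        found = []
        for dirname in sys.path:
            try:
                entries = os.listdir(dirname or '.')
            except os.error:
                continue
            for entry in entries:
                if entry[-4:] == '.pyl':
                    found.append(os.path.join(dirname, entry))
        return found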

Suppose archive file names are, in general, the known name "lib.pyl"
for the Python library, plus names of the form "package.pyl", where
"package" is the name of a Python package stored as a single archive
file.  Then if the user tries to import foo, imputil will search
along sys.path looking for foo.pyc, foo.pyl, etc.  If it finds
foo.pyl, the archive importer will add it to its list of known
archive files.  But it must not add it to its single dictionary,
because that would destroy the information about its position along
sys.path.  Instead, it must keep a separate dictionary for each
element of sys.path and search those separate dictionaries under the
control of imputil.  That is, get_code() needs a new argument naming
the element of sys.path being searched.  Alternatively, you could
create a new importer instance for each archive file found, but then
you still have multiple dictionaries; they just live in the multiple
instances.
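
A sketch of what that would look like (again, made-up names; the
extra argument to get_code() is exactly the point in question, and
read_index() is the same hypothetical helper as above):

    class ArchiveImporter:
        def __init__(self):
            # sys.path entry -> {module name: where to find it}
            self.per_dir = {}

        def add_archive(self, dirname, archive):
            table = self.per_dir.get(dirname)
            if table is None:
                table = self.per_dir[dirname] = {}
            table.update(read_index(archive))

        def get_code(self, name, dirname):
            # imputil would call this once per sys.path element, in
            # order, so the first hit respects sys.path ordering
            table = self.per_dir.get(dirname)
            if table is None:
                return None
            return table.get(name)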

All this is needed only to support import of identically named
modules.  If there are none, there is no problem because sys.path
is being used only to find modules, not to disambiguate them.

See also my separate reply to your other post which discusses
this same issue.

JimA