
Greg Stein wrote:
On Sat, 4 Dec 1999, James C. Ahlstrom wrote:
# Process file here
This is the algorithm that Python uses today, and the one my standard Importers follow.
Agreed.
And sys.path can contain class instances, which only makes things slower.
IMO, we don't know this, or whether it is significant.
Agreed.
You could do a readdir() and cache the results, but maybe that would be slower. A better algorithm might be faster, but a lot more complicated.
Who knows. BUT: the import process is now in Python -- it makes it *much* easier to run these experiments. We could not really do this when the import process was hard-coded in C code.
Agreed.
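The readdir()-and-cache idea mentioned above can be sketched as follows. This is not imputil code; the helper names and the extension list are made up for illustration:

```python
import os

_dir_cache = {}

def names_in(path_entry):
    # List each sys.path directory once and remember the result;
    # staleness of the cache is the trade-off being discussed.
    if path_entry not in _dir_cache:
        try:
            _dir_cache[path_entry] = set(os.listdir(path_entry))
        except OSError:
            _dir_cache[path_entry] = set()
    return _dir_cache[path_entry]

def find_module(name, path, extensions=(".py", ".pyc")):
    # Instead of stat()ing name+ext for every extension in every
    # directory, test membership in the cached listing.
    for entry in path:
        listing = names_in(entry)
        for ext in extensions:
            if name + ext in listing:
                return os.path.join(entry, name + ext)
    return None
```

Whether this beats the stat()-per-extension loop is exactly the sort of experiment that is easy now that the machinery is in Python.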
In the context of archive files, this search is also painful. It prevents you from saving a single dictionary of module names; instead you must have len(sys.path) dictionaries. You could try to record in the archive whether (say) a foo.dll was present in the file system, but the list of extensions is extensible.
I am not following this. What/where is the "single dictionary of module names"? Are you referring to a cache? Or is this about building an archive?
An archive would look just like we have now: map a name to a module. It would not need multiple dictionaries.
The "single dictionary of names" is in the single archive importer instance and has nothing to do with creating the archive; that is simply how it is currently programmed. Suppose the user specifies by name 12 archive files to be searched (that is, the user hacks site.py to add archive names to the importer). The "single dictionary" means that the archive importer takes the 12 dictionaries in the 12 files and merges them into one dictionary in order to speed up the search for a name.

The good news is that you can always just call the archive importer to get a module. The bad news is that you can't do that for each entry on sys.path, because there is no necessary identity between archive files and sys.path: the user specified the archive files by name, they may or may not be on sys.path, and even if they are, the user may or may not have specified them in the same order as sys.path.

Suppose instead that archive files must lie on sys.path and are processed in order. Then to find them you must know their names, but IMHO you want to avoid doing a readdir() on each element of sys.path and looking for files *.pyl. So suppose archive file names are restricted to the known name "lib.pyl" for the Python library, plus names of the form "package.pyl", where "package" is the name of a Python package stored as a single archive file.

Then if the user tries to import foo, imputil will search along sys.path looking for foo.pyc, foo.pyl, etc. If it finds foo.pyl, the archive importer will add it to its list of known archive files. But it must not add it to its single dictionary, because that would destroy the information about its position along sys.path. Instead, it must keep a separate dictionary for each element of sys.path and search the separate dictionaries under the control of imputil. That is, get_code() needs a new argument: the element of sys.path being searched. Alternatively, you could create a new importer instance for each archive file found, but then you still have multiple dictionaries.
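A minimal sketch of the per-path-element bookkeeping described above; the class and method names are illustrative, not imputil's actual API:

```python
class ArchiveImporter:
    """Sketch only: one name->code table per sys.path entry."""

    def __init__(self):
        # {sys.path entry: {module name: code info}}
        self.archives = {}

    def add_archive(self, path_entry, name_to_code):
        # Record the archive's table under its sys.path entry rather
        # than merging everything into one dictionary, so the
        # position along sys.path is not destroyed.
        self.archives.setdefault(path_entry, {}).update(name_to_code)

    def get_code(self, name, path_entry):
        # The extra path_entry argument is the change proposed above:
        # the caller passes in the sys.path element being searched.
        return self.archives.get(path_entry, {}).get(name)

    def find(self, name, path):
        # The walk along sys.path, asking this importer about each
        # entry in turn; the first hit wins.
        for entry in path:
            code = self.get_code(name, entry)
            if code is not None:
                return code
        return None
```

With this shape, two archives that both contain a module foo are disambiguated by sys.path order, exactly as same-named files in different directories are today.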
They are in the multiple instances. All of this is needed only to support importing identically named modules. If there are none, there is no problem, because sys.path is then being used only to find modules, not to disambiguate them. See also my separate reply to your other post, which discusses this same issue.

JimA