
Greg Stein wrote:
On Sat, 4 Dec 1999, James C. Ahlstrom wrote:
# Process file here
This is the algorithm that Python uses today, and the one my standard Importers follow.
Agreed.
And sys.path can contain class instances, which only makes things slower.
IMO, we don't know this, or whether it is significant.
Agreed.
You could do a readdir() and cache the results, but maybe that would be slower. A better algorithm might be faster, but a lot more complicated.
Who knows. BUT: the import process is now in Python -- it makes it *much* easier to run these experiments. We could not really do this when the import process was hard-coded in C code.
Agreed.
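The readdir()-and-cache idea mentioned above can be sketched as follows. This is not imputil code; the helper names and the extension list are made up for illustration:

```python
import os

_dir_cache = {}

def names_in(path_entry):
    # List each sys.path directory once and remember the result;
    # staleness of the cache is the trade-off being discussed.
    if path_entry not in _dir_cache:
        try:
            _dir_cache[path_entry] = set(os.listdir(path_entry))
        except OSError:
            _dir_cache[path_entry] = set()
    return _dir_cache[path_entry]

def find_module(name, path, extensions=(".py", ".pyc")):
    # Instead of stat()ing name+ext for every extension in every
    # directory, test membership in the cached listing.
    for entry in path:
        listing = names_in(entry)
        for ext in extensions:
            if name + ext in listing:
                return os.path.join(entry, name + ext)
    return None
```

Whether this beats the stat()-per-extension loop is exactly the sort of experiment that is easy now that the machinery is in Python.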
In the context of archive files, this search is also painful. It prevents you from saving a single dictionary of module names; instead you must have len(sys.path) dictionaries. You could try to record in the archive whether (say) a foo.dll was present in the file system, but the list of extensions is extensible.
I am not following this. What/where is the "single dictionary of module names"? Are you referring to a cache? Or is this about building an archive?
An archive would look just like we have now: map a name to a module. It would not need multiple dictionaries.
The "single dictionary of names" is in the single archive importer instance and has nothing to do with creating the archive; that is simply how it is currently programmed. Suppose the user specifies by name 12 archive files to be searched (that is, the user hacks site.py to add archive names to the importer). The "single dictionary" means that the archive importer takes the 12 dictionaries in the 12 files and merges them into one dictionary in order to speed up the search for a name.

The good news is that you can always just call the archive importer to get a module. The bad news is that you can't do that for each entry on sys.path, because there is no necessary identity between archive files and sys.path: the user specified the archive files by name, they may or may not be on sys.path, and even if they are, the user may or may not have specified them in the same order as sys.path.

Suppose instead that archive files must lie on sys.path and are processed in order. Then to find them you must know their names, but IMHO you want to avoid doing a readdir() on each element of sys.path and looking for files *.pyl. So suppose archive file names are restricted to the known name "lib.pyl" for the Python library, plus names of the form "package.pyl", where "package" is the name of a Python package stored as a single archive file.

Then if the user tries to import foo, imputil will search along sys.path looking for foo.pyc, foo.pyl, etc. If it finds foo.pyl, the archive importer will add it to its list of known archive files. But it must not add it to its single dictionary, because that would destroy the information about its position along sys.path. Instead, it must keep a separate dictionary for each element of sys.path and search the separate dictionaries under the control of imputil. That is, get_code() needs a new argument: the element of sys.path being searched. Alternatively, you could create a new importer instance for each archive file found, but then you still have multiple dictionaries.
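A minimal sketch of the per-path-element bookkeeping described above; the class and method names are illustrative, not imputil's actual API:

```python
class ArchiveImporter:
    """Sketch only: one name->code table per sys.path entry."""

    def __init__(self):
        # {sys.path entry: {module name: code info}}
        self.archives = {}

    def add_archive(self, path_entry, name_to_code):
        # Record the archive's table under its sys.path entry rather
        # than merging everything into one dictionary, so the
        # position along sys.path is not destroyed.
        self.archives.setdefault(path_entry, {}).update(name_to_code)

    def get_code(self, name, path_entry):
        # The extra path_entry argument is the change proposed above:
        # the caller passes in the sys.path element being searched.
        return self.archives.get(path_entry, {}).get(name)

    def find(self, name, path):
        # The walk along sys.path, asking this importer about each
        # entry in turn; the first hit wins.
        for entry in path:
            code = self.get_code(name, entry)
            if code is not None:
                return code
        return None
```

With this shape, two archives that both contain a module foo are disambiguated by sys.path order, exactly as same-named files in different directories are today.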
They are in the multiple instances. All of this is needed only to support importing identically named modules. If there are none, there is no problem, because sys.path is then being used only to find modules, not to disambiguate them. See also my separate reply to your other post, which discusses this same issue.

JimA