
On Mon, 6 Dec 1999, James C. Ahlstrom wrote:
Greg Stein wrote: ...
I am not following this. What/where is the "single dictionary of module names" ? Are you referring to a cache? Or is this about building an archive?
An archive would look just like we have now: map a name to a module. It would not need multiple dictionaries.
The "single dictionary of names" is in the single archive importer instance and has nothing to do with creating the archive. It is currently programmed this way.
Ah. There is the problem. In Guido's suggestion for the "next path of inquiry" :-), there is no "single dictionary of names". Instead, you have Importer instances as items in sys.path. Each instance maintains its dictionary, and they are not (necessarily) combined. If we were to combine them, then we would need to maintain the ordering requirements implied by sys.path. However, this would be problematic if sys.path changed -- we would have to detect the situation and rebuild a merged dict.
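To make the "Importer instances as items in sys.path" idea concrete, here is a minimal sketch. It is an illustration only, not the imputil API: the class `ArchiveImporter`, the function `find_module`, the one-argument `get_code()`, and the archive names are all hypothetical, and module "code" is stood in for by plain strings. Each instance keeps its own dictionary, and precedence comes purely from position in the list, so reordering the list needs no rebuild of any merged dict.

```python
class ArchiveImporter:
    """Hypothetical importer holding the table of contents of one archive."""

    def __init__(self, name, modules):
        self.name = name
        self.modules = dict(modules)  # module name -> code (a string here)

    def get_code(self, modname):
        # Return the stored code, or None if this archive lacks the module.
        return self.modules.get(modname)


def find_module(path, modname):
    """Walk the path in order; the first importer that knows the name wins."""
    for importer in path:
        code = importer.get_code(modname)
        if code is not None:
            return importer.name, code
    return None


first = ArchiveImporter("first.pyl", {"foo": "foo from first"})
second = ArchiveImporter("second.pyl", {"foo": "foo from second", "bar": "bar"})

path = [first, second]          # stands in for sys.path
result_before = find_module(path, "foo")

path.reverse()                  # the path changes; nothing has to be rebuilt
result_after = find_module(path, "foo")
```

The cost of this arrangement is one dictionary lookup per importer on the path, rather than a single lookup in a merged table.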
Suppose the user specifies by name 12 archive files to be searched. That is, the user hacks site.py to add archive names to the importer. The "single dictionary" means that the archive importer takes the 12 dictionaries in the 12 files and merges them together into one dictionary in order to speed up the search for a name. The good news is you can always just call the archive importer to get a module. The bad news is you can't do that for each entry on sys.path because there is no necessary identity between archive files and sys.path. The user specified the archive files by name, and they may or may not be on sys.path, and the user may or may not have specified them in the same order as sys.path even if they are.
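The merge described above might look roughly like the following sketch. It is not taken from the actual archive importer: `merge_archives` and the archive names are hypothetical, and strings stand in for code objects. It assumes that archives listed earlier take precedence, which shows how the search order gets baked into the merged dictionary at merge time.

```python
def merge_archives(archives):
    """Merge per-archive dictionaries; archives listed earlier win on clashes."""
    merged = {}
    for archive_name, modules in archives:
        for modname, code in modules.items():
            # setdefault keeps the first entry seen, i.e. the earlier archive.
            merged.setdefault(modname, (archive_name, code))
    return merged


archives = [
    ("a.pyl", {"foo": "foo from a"}),
    ("b.pyl", {"foo": "foo from b", "bar": "bar from b"}),
]

merged = merge_archives(archives)

# A different archive order means the whole table must be rebuilt.
merged_reversed = merge_archives(list(reversed(archives)))
```

A lookup is then a single dictionary access, but the result no longer says anything about where each archive sits relative to sys.path.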
The importer must be inserted into sys.path to establish a precedence. If the user wants to add 12 libraries... fine. But *all* of those modules will fall under a precedence defined by the Importer's position on sys.path.
Suppose archive files must lie on sys.path and are processed in order. Then to find them you must know their names. But IMHO you want to avoid doing a readdir() on each element of sys.path and looking for files *.pyl.
I do not believe that we will arbitrarily locate and open library files. They must be specified explicitly.
Suppose archive file names in general are the known name "lib.pyl" for the Python library, plus the names "package.pyl" where "package" can be the name of a Python package as a single archive file. Then if the user tries to import foo, imputil will search along sys.path looking for foo.pyc, foo.pyl, etc. If it finds foo.pyl, the archive importer will add it to its list of known archive files. But it must not add it to its single dictionary, because that would destroy the information about its position along sys.path. Instead, it must keep a separate dictionary for each element of sys.path and search the separate dictionaries under control of imputil. That is, get_code() needs a new argument for the element of sys.path being searched. Alternatively, you could create a new importer instance for each archive file found, but then you still have multiple dictionaries. They are in the multiple instances.
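A rough sketch of the "separate dictionary for each element of sys.path" variant follows. Everything in it is hypothetical (the class `PerEntryArchiveImporter`, the helper `import_via_path`, the directory names, and the exact shape of the extra `get_code()` argument); it only illustrates how a single importer could keep position information by keying its tables on the path entry being searched.

```python
class PerEntryArchiveImporter:
    """Hypothetical single importer with one table per path entry."""

    def __init__(self):
        self.tables = {}  # path entry -> {module name -> code}

    def add_archive(self, path_entry, modules):
        # Record an archive's contents under the path entry where it was found.
        self.tables.setdefault(path_entry, {}).update(modules)

    def get_code(self, path_entry, modname):
        # The extra argument selects which table to consult.
        return self.tables.get(path_entry, {}).get(modname)


def import_via_path(search_path, importer, modname):
    """The caller drives the search, one path entry at a time."""
    for entry in search_path:
        code = importer.get_code(entry, modname)
        if code is not None:
            return entry, code
    return None


importer = PerEntryArchiveImporter()
importer.add_archive("/site", {"foo": "foo from /site/foo.pyl"})
importer.add_archive("/lib", {"foo": "foo from /lib/foo.pyl",
                              "os": "os from /lib/lib.pyl"})

found_lib_first = import_via_path(["/lib", "/site"], importer, "foo")
found_site_first = import_via_path(["/site", "/lib"], importer, "foo")
```

Creating one importer instance per archive gives the same behavior; the per-entry tables simply live in the separate instances instead of in one object.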
If the user installs ".pyl" as a recognized extension (i.e. installs into the PathImporter), then the above scenario is possible. In my in-head-design, I had not imagined any state being retained for extension-recognizer hooks. Of course, state can be retained simply by using a bound-method for the hook function. get_code() would not need to change. The foo.pyl would be consulted at the appropriate time based on where it is found in sys.path. Note that file-extension hooks would definitely have a complete path to the target file. Those are not Importers, however (although they will closely follow the get_code() hook since the extension is called from get_code).
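The bound-method point can be illustrated with a small sketch. The registry `extension_hooks`, the class `PylHandler`, the function `dispatch`, and the example path are all made up for illustration and are not the PathImporter interface; the only thing being shown is that registering `handler.handle` lets a plain hook function carry state through its instance.

```python
class PylHandler:
    """Hypothetical handler for ".pyl" files that remembers what it has seen."""

    def __init__(self):
        self.seen = []  # full paths of archives handed to us, in order

    def handle(self, pathname):
        # The hook receives a complete path to the target file.
        self.seen.append(pathname)
        return "loaded " + pathname


extension_hooks = {}  # hypothetical registry: suffix -> callable

handler = PylHandler()
extension_hooks[".pyl"] = handler.handle  # a bound method keeps the instance


def dispatch(pathname):
    """Call the hook registered for the file's suffix, if there is one."""
    for suffix, hook in extension_hooks.items():
        if pathname.endswith(suffix):
            return hook(pathname)
    return None


result = dispatch("/usr/lib/python/foo.pyl")
```

Because the state lives on the handler instance, the hook signature itself stays a single-argument callable.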