[ taking the liberty to CC: this back to python-dev ]
On Fri, 19 Nov 1999, David Ascher wrote:
(2) a file in a directory that's on sys.path can be a zip/jar file; its contents will be considered as a package (note that this is different from (1)!)
No problem. This will slow things down, as a stat() for *.zip and/or *.jar must be done, in addition to *.py, *.pyc, and *.pyo.
Aside: it strikes me that for Python programs which import lots of files, 'front-loading' the stat calls could make sense. When you first look at a directory in sys.path, you read the entire directory in memory, and successive imports do a stat on the directory to see if it's changed, and if not use the in-memory data. Or am I completely off my rocker here?
Not at all. I thought of this last night after my email. Since the Importer can easily retain state, it can hold a cache of the directory listings. If it doesn't find the file in its cached state, then it can reload the information from disk. If it finds it in the cache, but not on disk, then it can remove the item from its cache.
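A minimal sketch of that cache, in modern Python rather than anything from imputil. The class name DirectoryCache and its find() method are hypothetical, made up for illustration. It shows the three cases described above: a hit, a miss that triggers a reload from disk, and a stale entry that gets dropped.

```python
import os


class DirectoryCache:
    """Hypothetical helper: remembers os.listdir() results per directory."""

    def __init__(self):
        self._listings = {}  # directory path -> set of file names

    def _load(self, directory):
        # Read the whole directory once and remember its contents.
        try:
            names = set(os.listdir(directory))
        except OSError:
            names = set()
        self._listings[directory] = names
        return names

    def find(self, directory, filename):
        """Return the full path of filename in directory, or None."""
        names = self._listings.get(directory)
        fresh = names is None
        if fresh:
            names = self._load(directory)
        if filename not in names and not fresh:
            # Not in the cached state: reload the listing from disk.
            names = self._load(directory)
        if filename not in names:
            return None
        path = os.path.join(directory, filename)
        if not os.path.exists(path):
            # In the cache but gone from disk: drop the stale entry.
            names.discard(filename)
            return None
        return path
```

Note that this sketch still touches the disk once per hit (the os.path.exists() check), so what it saves is the stat() calls for names that are not present.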
The problem occurs when your path is [A, B], the file is in B, and you add something to A on-the-fly. The cache might direct the importer at B, missing your file.
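To make the [A, B] failure concrete, here is a small sketch that assumes a cache which trusts its stored listings and never rechecks the directory. The names find_cached() and demo() are hypothetical, and the code is modern Python, not the Importer itself.

```python
import os
import tempfile


def find_cached(filename, path, listings):
    """Search path in order, trusting cached listings once they exist."""
    for directory in path:
        if directory not in listings:
            listings[directory] = set(os.listdir(directory))
        if filename in listings[directory]:
            return directory
    return None


def demo():
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        path = [a, b]
        listings = {}
        # The file starts out only in B; A gets cached as empty.
        open(os.path.join(b, "spam.py"), "w").close()
        first = find_cached("spam.py", path, listings)
        # Now add a file of the same name to A on the fly.
        open(os.path.join(a, "spam.py"), "w").close()
        stale = find_cached("spam.py", path, listings)  # stale listing for A
        fresh = find_cached("spam.py", path, {})        # no cache at all
        return first == b, stale == b, fresh == a
```

With the stale listing the lookup keeps returning B, while an uncached lookup finds the new file in A. A per-directory stat() check of the kind David suggested would be one way to notice that A has changed.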
Of course, with the appropriate caveats/warnings, the system would work quite well. It really only breaks during development. (That is one reason why I didn't accept some caching changes to imputil from MAL, but that was for the Importer in there; Python's new Importer could have a cache.)
I'm also not quite sure what the cost of reading a directory is, compared to issuing a bunch of stat() calls. Each directory read is an opendir/readdir(s)/closedir. Note that the DBM approach is kind of similar, but will amortize this cost over many processes.
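One way to get a rough answer is to measure it. The sketch below is modern Python, and the function names (stat_probe, listdir_probe, compare) are hypothetical. It times a failed lookup done with one stat() per suffix against the same lookup done with a single directory read. The suffix list is the one mentioned in this thread. Results will depend heavily on the OS, the filesystem, and the directory size, so no numbers are given here.

```python
import os
import tempfile
import timeit

# Suffixes named in this thread; a real importer may check others.
SUFFIXES = (".py", ".pyc", ".pyo", ".zip", ".jar")


def stat_probe(directory, name):
    """Look for name with each suffix using one stat() call per suffix."""
    for suffix in SUFFIXES:
        try:
            os.stat(os.path.join(directory, name + suffix))
            return suffix
        except OSError:
            pass
    return None


def listdir_probe(directory, name):
    """Look for name with each suffix using a single directory read."""
    names = set(os.listdir(directory))
    for suffix in SUFFIXES:
        if name + suffix in names:
            return suffix
    return None


def compare(n_files=200, repeat=1000):
    """Return (stat_seconds, listdir_seconds) for a failed lookup."""
    with tempfile.TemporaryDirectory() as d:
        for i in range(n_files):
            open(os.path.join(d, "mod%d.py" % i), "w").close()
        t_stat = timeit.timeit(lambda: stat_probe(d, "missing"), number=repeat)
        t_list = timeit.timeit(lambda: listdir_probe(d, "missing"), number=repeat)
    return t_stat, t_list
```

Calling compare() with a few different n_files values should show how the directory read scales with directory size, while the stat() cost stays tied to the number of suffixes.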