Fred L. Drake wrote:
M.-A. Lemburg writes:
Well, I was referring to the additional lookup needed to find the next package dir of the same name. Say you put the Python package into site-packages and the binaries into plat-<platform>.
I didn't say it was free; just that the cost was insignificant compared to the current cost.
My sys.path in an interactive interpreter contains 11 entries. If I want to add a package with both $prefix and $exec_prefix components, the worst case is that the directory holding the __init__.py* is the last path entry, and the other directory is in the immediately preceding path entry. After the current mechanism locates the __init__.py* file, it needs to build the __path__ for the package. It takes 10 stat() calls to locate the additional directory. Considering that the initial search that caused the package module to be created took: 11 stats to see if the entries contained the appropriate directory + 2 stats to determine that the first directory of the package (the one that doesn't have __init__.py*) wasn't it + 36 to determine that the first 9 directories didn't contain a matching .so|module.so|.py|.py[co]. Plus at least one to actually find the __init__.pyc; two if only the .py is available. (I think I followed the code right. ;) That's 59 system calls (either stat() or open(), the latter hidden inside fdopen()). I don't think the added 10 to get the right __path__ is worth worrying about.
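To make the extra __path__ lookup concrete, here is a minimal sketch in present-day Python, not the interpreter's actual code. The function name build_package_path and the ordering of the result (the directory with __init__.py* first) are assumptions made for illustration. It performs one directory check per remaining path entry, which is where the 10 additional stat() calls for an 11-entry sys.path come from.

```python
import os


def build_package_path(name, init_dir, path_entries):
    """Return a __path__-style list for package `name`.

    `init_dir` is the directory in which __init__.py* was found. Every other
    entry of `path_entries` is probed once for a directory of the same name.
    (Hypothetical helper; the result ordering is an assumption.)
    """
    init_dir = os.path.normpath(init_dir)
    result = [init_dir]
    for entry in path_entries:
        candidate = os.path.normpath(os.path.join(entry, name))
        if candidate == init_dir:
            continue  # already known, no extra probe needed
        if os.path.isdir(candidate):  # costs one stat() per remaining entry
            result.append(candidate)
    return result
```

With 11 path entries, one of which holds the __init__.py* directory, the loop probes the other 10 entries exactly once each.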
Wow, what an analysis.
It's the .py[co] files that are expensive to load! Once you've created the package, sub-modules are very cheap: you will typically have no more than two path entries to check even once all this is in place.
I'm not sure I follow you here: do you mean with a package dir cache in place, or using the system implemented in the current release?
caching; loading Grail is still dog slow, and I've no doubt that the 600+ stat() calls contribute to that! 1-)
Oops, after following through with the math, I'd have to adjust this to 6000 stat()/open() calls for Grail. Sorry!
This seems like something to worry about and probably also enough to try really hard to find a good solution, IMHO.
And back to Marc-Andre:
I would very much like to see some sort of caching in the interpreter. The fastpath hook I implemented uses a marshalled dict stored in the user's home dir for the lookup. Once created,
I don't think I'd store the cache; if a user's home directory is mounted via NFS (common), then it may often be wrong if the user actively works with a variety of hosts with different versions or installations of Python.
True, that's why the hook allows you to code the strategy in Python. Note that my current version uses the sys.path as key into a table of name:file mappings, so even when using different setups (which will certainly have some differences in sys.path), the cache should work. Maybe one should add some more information to the key... like platform specifics or even the mtimes of the directories on the path.
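A rough sketch of such a keyed cache, written in present-day Python. This is not the fastpath hook itself: the function names, the key format, and the file layout (one marshalled dict mapping each key to a name:file table) are all assumptions for illustration. The key combines the platform, the path entries, and each directory's mtime, as suggested above.

```python
import marshal
import os
import sys


def cache_key(path_entries, platform=sys.platform):
    """Build a key from the platform plus each path entry and its mtime."""
    parts = [platform]
    for entry in path_entries:
        try:
            mtime = int(os.stat(entry).st_mtime)
        except OSError:
            mtime = -1  # missing or unreadable entry
        parts.append("%s@%d" % (entry, mtime))
    return "\n".join(parts)


def load_tables(cache_file):
    """Read the marshalled {key: {name: file}} dict, or {} if unusable."""
    try:
        with open(cache_file, "rb") as f:
            tables = marshal.load(f)
    except (OSError, EOFError, ValueError, TypeError):
        return {}
    return tables if isinstance(tables, dict) else {}


def lookup_table(cache_file, key):
    """Return the name:file table stored for this key (empty if none)."""
    return load_tables(cache_file).get(key, {})


def save_table(cache_file, key, table):
    """Store a name:file table under this key, keeping other keys intact."""
    tables = load_tables(cache_file)
    tables[key] = dict(table)
    with open(cache_file, "wb") as f:
        marshal.dump(tables, f)
```

Because a different host or installation yields a different key, a cache file shared over NFS would simply miss for that setup instead of returning stale entries.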
The benefits of a cache are greatest for applications that import a lot of modules (like Grail!); the cache can be built using a directory scan as each directory is searched. (I think one of the guys from CWI did this at one point and had really good results; Jack?)
Yep, I remember that too. The problem with these scans is that directories may contain huge numbers of files, and you would need to check all of them against the module extensions Python uses.
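The directory-scan approach might look roughly like the following sketch in present-day Python. It is not the CWI implementation: the function names are hypothetical, and the suffix list is taken from the extensions mentioned earlier in this thread, with their listed order assumed to be the priority. Package directories are ignored to keep the example short.

```python
import os

# Suffixes mentioned in this thread; listed order is assumed to be priority.
MODULE_SUFFIXES = (".so", "module.so", ".py", ".pyc", ".pyo")


def scan_directory(directory):
    """Map module names to files using a single listing of `directory`."""
    best = {}  # module name -> (suffix rank, full path)
    try:
        filenames = os.listdir(directory)
    except OSError:
        return {}
    for filename in filenames:
        # Every file has to be checked against every known suffix.
        for rank, suffix in enumerate(MODULE_SUFFIXES):
            if not filename.endswith(suffix):
                continue
            name = filename[:-len(suffix)]
            if not name.isidentifier():
                continue  # e.g. empty name or "my-file"
            if name not in best or rank < best[name][0]:
                best[name] = (rank, os.path.join(directory, filename))
    return {name: path for name, (rank, path) in best.items()}


def build_cache(path_entries):
    """Scan each path entry once; earlier entries win on name clashes."""
    cache = {}
    for entry in path_entries:
        for name, path in scan_directory(entry).items():
            cache.setdefault(name, path)
    return cache
```

The scan trades the per-import stat() calls for one listing per directory, at the price of the per-file suffix matching noted above.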
Anyway, the dynamic and static versions are both implementable using the hook, so I'd opt for going in that direction rather than hard-wiring some logic into the interpreter's core.