Re: [Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)
At 06:52 PM 8/2/2009 +0200, Tarek Ziadé wrote:
On Wed, Jul 29, 2009 at 6:44 AM, P.J. Eby wrote:
At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:
On Tue, Jul 28, 2009 at 9:40 PM, P.J. Eby wrote:
At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
I can see how this could go quite wrong, but maybe if installers touch some file in the library directory anytime a package is installed/reinstalled/removed/etc,
You mean, like, the mtime of the directory itself? ;-)
Do directory mtimes get recursively updated? I don't think they do.
That's not necessary; if imports use a cached listdir, then the children will get handled recursively.
So if you have a layout:

    site-packages/
        zope/
            interface/
                __init__.py
And if you update the package, changing __init__.py, the mtime of site-packages doesn't change, does it?
Nope, but at the top level, the fact that 'zope' is present is unchanged, as is the presence of an 'interface' subdirectory.
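A minimal sketch of the runtime caching described above, assuming a hypothetical in-process dict (`_listdir_cache`) keyed by directory path. Each level's listing is reused until that directory's own mtime changes, so resolving `zope.interface` re-reads only a level whose entries were actually added, removed, or renamed; editing the contents of `__init__.py` changes no listing at all. The names `cached_listdir` and `resolve` are illustrative and not part of any real import system.

```python
import os

# Hypothetical in-process cache: directory path -> (mtime_ns at listing time, entries).
_listdir_cache = {}


def cached_listdir(path):
    """Return the entries of `path`, re-reading it only when its mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    cached = _listdir_cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    entries = frozenset(os.listdir(path))
    _listdir_cache[path] = (mtime, entries)
    return entries


def resolve(root, dotted_name):
    """Find a package directory or .py module for `dotted_name` under `root`.

    Each level is checked against its own cached listing, so an entry added or
    removed deep in the tree is noticed via the mtime of the directory that
    directly holds it. (Sketch only: no __init__.py or extension-module checks.)
    """
    path = root
    parts = dotted_name.split(".")
    for i, part in enumerate(parts):
        entries = cached_listdir(path)
        child = os.path.join(path, part)
        if part in entries and os.path.isdir(child):
            path = child  # descend; the child's listing is validated separately
        elif i == len(parts) - 1 and part + ".py" in entries:
            return child + ".py"
        else:
            return None
    return path
```

Directory mtime granularity depends on the filesystem, so a production version would also have to cope with entries added within the same timestamp tick as the cached listing.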
I'm saying that if there were a file, site-packages/last_updated, that gets touched every time an installer does anything in site-packages, then you could cache the lookups between processes.
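To make the marker-file idea concrete, here is a sketch assuming the marker is named `last_updated` as proposed and that installers call something like the hypothetical `touch_marker()` after every change. Another process can then reuse a saved scan of the directory for as long as the marker's mtime matches the one recorded in the cache. The cache file location and its JSON layout are arbitrary choices for illustration.

```python
import json
import os

MARKER_NAME = "last_updated"  # marker file name taken from the proposal


def touch_marker(libdir):
    """Hypothetical installer hook: bump the marker after any install/remove."""
    marker = os.path.join(libdir, MARKER_NAME)
    with open(marker, "a"):
        pass  # create the marker if it does not exist yet
    os.utime(marker)  # set its access/modification times to "now"


def scan(libdir):
    """Slow path: walk the whole library directory and list its .py files."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(libdir):
        for name in filenames:
            if name.endswith(".py"):
                found.append(os.path.relpath(os.path.join(dirpath, name), libdir))
    return sorted(found)


def cached_scan(libdir, cache_file):
    """Return (paths, from_cache), reusing cache_file while the marker is unchanged."""
    try:
        stamp = os.stat(os.path.join(libdir, MARKER_NAME)).st_mtime_ns
    except FileNotFoundError:
        return scan(libdir), False  # no marker: nothing to validate a cache against
    try:
        with open(cache_file) as f:
            data = json.load(f)
        if data["stamp"] == stamp:
            return data["paths"], True
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, corrupt, or foreign cache file: rescan below
    paths = scan(libdir)
    with open(cache_file, "w") as f:
        json.dump({"stamp": stamp, "paths": paths}, f)
    return paths, False
```

Validating the cache still costs one `stat()` of the marker per library directory, and because each interpreter run may see a different PYTHONPATH, such a cache has to be kept per directory rather than globally.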
Since each invocation of the interpreter can have a different PYTHONPATH, the cache has to be per-directory, not global. If it's per-directory, then there's no real benefit over runtime caching, since you now have to open and read a file (instead of just reading the directory). And as I said, it's not realistic to think that opening and reading a file is going to beat opening and reading a directory for speed.
But opening and reading one file should beat opening hundreds of directories. In the PEP 376 prototype, after considering a per-directory cache like the one you describe, I was thinking about having a global index file to replace the global dictionary that keeps track of the distributions per directory (currently the directory path is the key in the dictionary and the distribution objects are the value).
That could even be a simple shelve of the dictionary, which becomes a global index of directories that [are/were once] in the path. This works as long as the index file is per-user, or, even better, per-application. I don't know how this could be managed, but a simple cache file created alongside the script the application is launched with could speed up the lookups at the second launch.
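The per-application shelve index suggested above could look roughly like this. Assumptions: the index lives next to the launch script under a made-up `.distindex` suffix, and the hypothetical `find_distributions()` stands in for the real PEP 376 lookup by listing `*.dist-info` directories. As written, nothing ever invalidates an entry, so the second launch is fast but also blind to anything installed after the first.

```python
import os
import shelve


def index_path_for(script):
    """Hypothetical per-application index file stored next to the launch script."""
    return os.path.splitext(os.path.abspath(script))[0] + ".distindex"


def find_distributions(directory):
    """Stand-in for the PEP 376 scan: names of *.dist-info entries in `directory`."""
    try:
        return sorted(n for n in os.listdir(directory) if n.endswith(".dist-info"))
    except OSError:
        return []  # missing or unreadable path entry


def load_index(script, path_entries):
    """Map each path entry to its distributions, scanning only unseen entries."""
    result = {}
    with shelve.open(index_path_for(script)) as index:
        for directory in path_entries:
            if directory not in index:
                index[directory] = find_distributions(directory)  # first launch only
            result[directory] = index[directory]
    return result
```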
You'd still have to stat the directories to know whether they changed, in which case the logic I've already laid out still applies.

I think, however, we are discussing different nominal scenarios. I'm assuming a post-PEP 376 world where the only use for .egg files or directories is for *non-default* versions of packages, which only get added to sys.path for apps or libraries that need them, rather than being in a default .pth file.

However, if you're discussing speeding up an environment where we use .egg directories and they're on sys.path, then a per-user global cache might speed things up. For security reasons, however, that cache would need to be ignored by Python when running secure scripts (e.g. with the -s and -E options, and definitely anything setuid).

In contrast, directory stat caching with a modest number of (non-egg) PYTHONPATH entries would speed things up nicely in the hopefully-future-default case.
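The two safeguards above can be sketched together: a gate that ignores any per-user cache when Python runs with `-s` or `-E`, or when the process is setuid/setgid, and a per-directory lookup that pays one `stat()` to decide whether a cached listing is still valid. `sys.flags.no_user_site` and `sys.flags.ignore_environment` are real interpreter flags; the `user_cache_allowed()` policy and the `*.dist-info` filter are assumptions for illustration, not behavior of any Python release.

```python
import os
import sys


def user_cache_allowed():
    """Decide whether a per-user lookup cache may be trusted in this process.

    Assumed policy: ignore it under -s or -E (-I implies both) and whenever
    real and effective user or group IDs differ (setuid/setgid).
    """
    if sys.flags.no_user_site or sys.flags.ignore_environment:
        return False
    if hasattr(os, "geteuid"):  # POSIX only; the check is skipped on Windows
        if os.getuid() != os.geteuid() or os.getgid() != os.getegid():
            return False
    return True


def distributions_in(directory, cache):
    """Per-directory lookup validated by a single stat() of `directory`.

    `cache` maps directory -> (mtime_ns, names). Adding, removing, or renaming
    an entry changes the directory's mtime, which forces a fresh listing.
    """
    mtime = os.stat(directory).st_mtime_ns
    hit = cache.get(directory)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    names = sorted(n for n in os.listdir(directory) if n.endswith(".dist-info"))
    cache[directory] = (mtime, names)
    return names
```

A caller would consult a persistent `cache` only when `user_cache_allowed()` is true, and otherwise start from an empty dict so every directory is listed fresh.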
P.J. Eby