[Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)
P.J. Eby
pje at telecommunity.com
Sun Aug 2 19:54:18 CEST 2009
At 06:52 PM 8/2/2009 +0200, Tarek Ziadé wrote:
>On Wed, Jul 29, 2009 at 6:44 AM, P.J. Eby<pje at telecommunity.com> wrote:
> > At 10:35 PM 7/28/2009 -0500, Ian Bicking wrote:
> >>
> >> On Tue, Jul 28, 2009 at 9:40 PM, P.J. Eby<pje at telecommunity.com> wrote:
> >> > At 09:22 PM 7/28/2009 -0500, Ian Bicking wrote:
> >> >>
> >> >> I can see how this could go quite wrong, but maybe if installers touch
> >> >> some file in the library directory anytime a package is
> >> >> installed/reinstalled/removed/etc,
> >> >
> >> > You mean, like, the mtime of the directory itself?  ;-)
> >>
> >> Do directory mtimes get recursively updated? I don't think they do.
> >
> > That's not necessary; if imports use a cached listdir, then the children
> > will get handled recursively.
> >
> >> So if you have a layout:
> >>
> >> site-packages/
> >>   zope/
> >>     interface/
> >>       __init__.py
> >>
> >> And you update the package and update __init__.py, the mtime of
> >> site-packages doesn't change, does it?
> >
> > Nope, but at the top level, the fact that 'zope' is present is unchanged,
> > as is the presence of an 'interface' subdirectory.
> >
> >
> >> I'm saying if there was a file in site-packages/last_updated that gets
> >> touched every time an installer does anything in site-packages, then
> >> you could cache (between processes) the lookups.
> >
> > Since each invocation of the interpreter can have a different PYTHONPATH,
> > the cache has to be per-directory, not global. If it's per-directory, then
> > there's no real benefit over runtime caching, since you now have to open
> > and read a file (instead of just reading the directory). And as I said,
> > it's not realistic to think that opening and reading a file is going to
> > beat opening and reading a directory for speed.
>
>But opening and reading one file should beat opening hundreds of directories:
>In the PEP 376 prototype, after thinking about a per-directory cache like
>you are describing, I was thinking about having a global index file to
>replace the global dictionary that keeps track of the distributions per
>directory (currently the directory path is the key in the dictionary and
>the value is the distribution objects).
>
>That can even be a simple shelve of the dictionary, which becomes a global
>index of directories that [are/were once] in the path. This works as long
>as the index file is per-user. Or even better: per-application. I don't
>know how this could be managed/done, but a simple cache file created
>alongside the script the application is launched with could speed up the
>lookups at the second launch.
You'd still have to stat the directories to know if they changed - in
which case the logic I've already laid out still applies.
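
A minimal sketch of that stat-then-reuse logic, written for a current
Python 3 rather than the 2009 interpreter. The names cached_listdir,
may_contain and _listdir_cache are made up for illustration and are not
part of any import machinery. It assumes a directory's mtime changes
whenever an entry is added or removed, which only holds to the resolution
of the filesystem's timestamps.

```python
import os

# Hypothetical in-process cache: directory path -> (mtime_ns, entry names)
_listdir_cache = {}


def cached_listdir(path):
    """Return the names in `path`, re-reading only if its mtime changed."""
    # One stat per directory is still needed to validate the cached listing.
    mtime = os.stat(path).st_mtime_ns
    hit = _listdir_cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    names = frozenset(os.listdir(path))
    _listdir_cache[path] = (mtime, names)
    return names


def may_contain(path, name):
    """Cheap pre-check: could `path` hold a module or package called `name`?"""
    names = cached_listdir(path)
    # Only plain packages and .py modules are considered in this sketch.
    return name in names or (name + ".py") in names
```

An importer built on this would call may_contain() for each sys.path entry
and only probe the directories that pass, so the cost per entry is one
stat() instead of several failed open() attempts.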

I think, however, we are discussing different nominal scenarios.  I'm
assuming a post-PEP 376 world where the only use for .egg files or
directories is for *non-default* versions of packages, which only get
added to sys.path for apps or libraries that need them, rather than
being in a default .pth file.

However, if you're discussing speeding up an environment where we use
.egg directories and they're on sys.path, then a per-user global
cache might speed things up.  For security reasons, however, that
cache would need to be ignored by Python when running secure
scripts.  (e.g. -s and -E options, and definitely anything setuid.)

In contrast, directory stat caching with a modest number of (non-egg)
PYTHONPATH entries would speed things up nicely in the
hopefully-future-default case.