Caching sys.path rather than building it each time?

Sat Apr 8 13:20:00 EDT 2000

Hamish Lawson wrote:

> Steve Holden wrote:
> 
> > Probably not an awfully good idea, given that "installation" of
> > a new module often consists of just dropping a ".py" file into
> > a library directory.  Without a common "install" process the
> > work of checking whether the cached path is up to date would
> > be almost as much as that of building the path from scratch
> > as is now done.
> 
> Had you thought I was proposing that such a check be done
> whenever Python was started? I'd agree that checking whether
> the cached path is up to date would be almost as much work as
> building the path. However, that wasn't part of my proposal.
> 
> Instead, when a module is installed, the pathbuilder tool would be
> run either by the module's install program (perhaps via the
> distutils) or manually by the user. (You could even run it at
> scheduled intervals if you thought that necessary to catch
> omissions.) My goal is to avoid having the path built every time
> Python starts.
> 
> A command-line option for the Python interpreter could be used to
> determine whether it uses the cached path or builds the path from
> scratch; the decision as to which should be the default I'll
> leave to others.

There are 3 pieces to the puzzle.

1) How do you find those things that Python requires (bootstrap)?
2) How do you build the list of places to look for Python modules?
3) What is the complete universe of importable Python modules?

You are talking about #3, but the original question seems to 
have been about #2.
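
To make the distinction concrete, here is a minimal sketch of what
caching #2 could look like: a "pathbuilder" step that saves the search
path, and a loader that reads it back. The function names, the JSON
format and the cache file are all assumptions made for illustration;
nothing like this exists in the interpreter, and how Python would
consume the cache at startup is left open.

```python
import json
import sys


def save_path_cache(cache_file, path=None):
    """Hypothetical 'pathbuilder' step: write the search path to cache_file."""
    entries = list(sys.path if path is None else path)
    with open(cache_file, "w") as f:
        json.dump(entries, f)
    return entries


def load_path_cache(cache_file):
    """Return the cached path as a list, or None if no usable cache exists."""
    try:
        with open(cache_file) as f:
            entries = json.load(f)
    except (OSError, ValueError):
        # Missing or corrupt cache: the caller should build the path as usual.
        return None
    return entries if isinstance(entries, list) else None
```

Note that the loader never checks whether the cache is stale; under
the proposal that is the job of whoever installs a module.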

Caching #3 would involve rewriting the entire import
mechanism. It would yield enormous runtime speedups, since
an import would take one I/O (versus something along the lines
of 4 * len(sys.path)/2). This might be very cool in a CGI-type
situation, but it's very unlikely to ever become standard
Python because it violates the "no surprises" rule. There are
also other, simpler ways of speeding up imports (e.g., archives).
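
A rough sketch of the idea behind caching #3, assuming only plain
".py" files and package directories count as modules: scan the path
once, then answer every lookup from a dictionary. The names
build_module_index and estimated_probes are hypothetical, and the
figure of 4 probes per directory is simply the estimate given above.

```python
import os


def build_module_index(search_path):
    """Map top-level module names to the first matching file or package."""
    index = {}
    for directory in search_path:
        try:
            names = sorted(os.listdir(directory or os.curdir))
        except OSError:
            continue  # skip path entries that are missing or unreadable
        for name in names:
            full = os.path.join(directory, name)
            if name.endswith(".py") and os.path.isfile(full):
                modname = name[:-3]
            elif os.path.isfile(os.path.join(full, "__init__.py")):
                modname = name  # a package directory
            else:
                continue
            # Earlier path entries win, as in a normal path search.
            index.setdefault(modname, full)
    return index


def estimated_probes(path_len, probes_per_dir=4):
    """Average probes per uncached import: half the path is searched."""
    return probes_per_dir * path_len / 2
```

The "no surprises" problem shows up immediately: a ".py" file dropped
into a directory after the index was built is invisible until the
index is rebuilt.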

But the startup overhead is mostly in #1. There's a lot of code
in getpath.c dealing with things like developer builds, strange
installations and other sysadmin hacks. For a general-purpose
Python installation, this code needs to be there (well,
it needs to be someplace, not necessarily in getpath.c); but for
a special-purpose Python, it's relatively easy to hack
getpath.c to your needs.

Once you've done that, #2 is really a non-issue. Consolidate
your .pth files and/or fine-tune sitecustomize.py and sys.path
will get filled out with minimal I/Os.
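
As a rough illustration of the consolidation step, the sketch below
merges the path entries from every .pth file in one directory into a
single de-duplicated list, which could then be written out as one
file. The function name consolidate_pth is hypothetical. Blank lines
and "#" comments are dropped, as the site module does; "import" lines
are skipped here on the assumption that they need manual review.

```python
import glob
import os


def consolidate_pth(site_dir):
    """Collect unique path entries from all .pth files in site_dir."""
    seen = set()
    merged = []
    for pth in sorted(glob.glob(os.path.join(site_dir, "*.pth"))):
        with open(pth) as f:
            for line in f:
                entry = line.strip()
                if not entry or entry.startswith("#"):
                    continue  # blank line or comment
                if entry.startswith(("import ", "import\t")):
                    continue  # executable line: not a path entry
                if entry not in seen:
                    seen.add(entry)
                    merged.append(entry)
    return merged
```

Writing the result back as a single .pth file means one file to open
and read at startup instead of several.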

- Gordon