[Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)
P.J. Eby
pje at telecommunity.com
Wed Jul 29 04:21:47 CEST 2009
At 01:02 PM 7/29/2009 +1200, Greg Ewing wrote:
>P.J. Eby wrote:
>
>>So the optimum performance tradeoff depends on how many imports you
>>have *and* how many eggs you have on sys.path. If you have lots of
>>eggs and few imports, unzipped ones will probably be faster. If
>>you have lots of eggs and *lots* of imports, zipped ones will
>>probably be faster.
>
>I'm wondering whether something could be gained by
>cacheing the results of sys.path lookups somehow
>between interpreter invocations.
>
>Most of the time the contents of the directories
>on one's PYTHONPATH don't change, so doing all this
>statting and directory reading every time an
>interpreter starts up seems rather suboptimal.

The catch is that you then need some way to know whether your cached
information is wrong or out of date. I suppose, though, that you could
write a cache file that records each directory's modification time
along with its contents, so that modifying any cached directory would
automatically invalidate its entry.
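
A minimal sketch of such a cache file, assuming a JSON file mapping each
directory to its recorded mtime and entry names. The function names and
the file format are hypothetical, not an existing API. Adding, removing,
or renaming an entry bumps a directory's mtime, so a stale entry is
caught by one stat() per directory. Changes that land within the same
mtime tick on filesystems with coarse timestamps could still be missed.

```python
import json
import os


def load_cache(cache_file):
    """Return the saved {directory: {"mtime": ..., "names": [...]}} map."""
    try:
        with open(cache_file) as f:
            return json.load(f)
    except (OSError, ValueError):
        # Missing or corrupt cache file: start from an empty cache.
        return {}


def listing_for(directory, cache):
    """Return the set of names in `directory`, reusing the cached listing
    only while the directory's recorded mtime is unchanged."""
    try:
        mtime = os.stat(directory).st_mtime
        entry = cache.get(directory)
        if entry is None or entry["mtime"] != mtime:
            # Stale or unknown: re-read the directory and record its mtime.
            entry = {"mtime": mtime, "names": sorted(os.listdir(directory))}
            cache[directory] = entry
    except OSError:
        # Nonexistent entries (or non-directories such as zip files on
        # sys.path) contribute no names.
        return set()
    return set(entry["names"])


def save_cache(cache_file, cache):
    """Persist the cache for the next interpreter invocation."""
    with open(cache_file, "w") as f:
        json.dump(cache, f)
```

At startup one would call load_cache(), then listing_for() for each
sys.path entry as lookups happen, and save_cache() before exiting.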

However, you'd probably gain more by making the core import logic
simply use the dircache module (or a C equivalent thereof) in place
of stat() calls. That would cut the per-import cost for each
directory to a single stat() of the directory itself (to check whether
it has changed), instead of several probes for .py, .pyc, .pyd/.so,
/__init__.py, etc., at the cost of one listdir() call the first time
a directory is used. This would give normal imports most of the
speedup that, e.g., putting the stdlib in a zipfile does.