[Distutils] Cache PYTHONPATH? (Re: make unzipped eggs be the default)

P.J. Eby pje at telecommunity.com
Wed Jul 29 04:21:47 CEST 2009


At 01:02 PM 7/29/2009 +1200, Greg Ewing wrote:
>P.J. Eby wrote:
>
>>So the optimum performance tradeoff depends on how many imports you 
>>have *and* how many eggs you have on sys.path.  If you have lots of 
>>eggs and few imports, unzipped ones will probably be faster.  If 
>>you have lots of eggs and *lots* of imports, zipped ones will 
>>probably be faster.
>
>I'm wondering whether something could be gained by
>caching the results of sys.path lookups somehow
>between interpreter invocations.
>
>Most of the time the contents of the directories
>on one's PYTHONPATH don't change, so doing all this
>statting and directory reading every time an
>interpreter starts up seems rather suboptimal.

The catch is that you then need some way to know whether your cached 
information is wrong or out of date.  I suppose, though, that you could 
do something like write a file that records stat times, such that 
modifying any of the cached directories would automatically invalidate 
the cache info.
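
As a rough illustration of the stat-time idea, here is a minimal sketch 
of a listing cache that is saved between interpreter runs.  The function 
names and the JSON file layout are assumptions made up for this example, 
not an existing API.  It relies on a directory's mtime changing when 
entries are added, removed or renamed, which holds on common filesystems 
but is limited by timestamp granularity.

```python
import json
import os


def load_listing_cache(cache_file):
    """Return the saved {directory: [mtime_ns, names]} map, or {} if unusable."""
    try:
        with open(cache_file, "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        # Missing or corrupt cache file: start from scratch.
        return {}


def cached_listdir(directory, cache):
    """Return the names in directory, re-reading only if its mtime changed."""
    mtime = os.stat(directory).st_mtime_ns
    entry = cache.get(directory)
    if entry is not None and entry[0] == mtime:
        return entry[1]  # recorded stat time still matches: trust the cache
    names = sorted(os.listdir(directory))
    cache[directory] = [mtime, names]  # stale or missing: refresh the entry
    return names


def save_listing_cache(cache_file, cache):
    """Write the cache back out so the next interpreter run can reuse it."""
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(cache, f)
```

Each startup would still pay one stat() per directory to validate its 
entry, but it would skip the directory read whenever nothing changed.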

However, you'd probably gain more by making the core import logic 
simply use the dircache module (or a C equivalent thereof) in place 
of stat() calls.  This would drop the per-import stat() count for 
each directory to 1 (in place of several for .py, .pyc, .pyd/.so, 
/__init__.py, etc.), at the cost of an initial listdir() call the 
first time a directory is used.  This would give normal imports most 
of the speedup benefit that e.g. putting the stdlib in a zipfile does.
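
To make the dircache approach concrete, here is a small sketch written 
against the plain os module (dircache itself was removed in Python 3).  
The suffix list and the find_module_file() helper are hypothetical and 
much simpler than the real import machinery; the point is only that each 
lookup costs one stat() of the directory, plus a listdir() the first time 
or after the directory changes.

```python
import os

_listings = {}  # directory -> (mtime_ns, frozenset of entry names)


def listing(directory):
    """One stat() per call; listdir() only when the directory has changed."""
    mtime = os.stat(directory).st_mtime_ns
    cached = _listings.get(directory)
    if cached is None or cached[0] != mtime:
        cached = (mtime, frozenset(os.listdir(directory)))
        _listings[directory] = cached
    return cached[1]


# Illustrative suffix list only; the real set depends on the platform.
CANDIDATE_SUFFIXES = (".py", ".pyc", ".so", ".pyd")


def find_module_file(name, path):
    """Return the first file on path that could satisfy 'import name'."""
    for directory in path:
        try:
            names = listing(directory)
        except OSError:
            continue  # missing or unreadable path entry
        if name in names:
            # Possible package: look for __init__.py in its cached listing.
            pkg = os.path.join(directory, name)
            try:
                if "__init__.py" in listing(pkg):
                    return os.path.join(pkg, "__init__.py")
            except OSError:
                pass  # the entry is not a readable directory
        for suffix in CANDIDATE_SUFFIXES:
            # Set membership tests replace one stat() per candidate suffix.
            if name + suffix in names:
                return os.path.join(directory, name + suffix)
    return None
```

For example, find_module_file("foo", sys.path) would stat each directory 
once and test the candidate names against the cached listing, instead of 
probing the filesystem for every suffix.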


