[Distutils] Experience of setuptools' cache design

Jim Fulton jim at zope.com
Fri Jan 18 19:31:30 CET 2008


On Jan 18, 2008, at 1:04 PM, Phillip J. Eby wrote:

> At 12:50 PM 1/18/2008 -0500, Jim Fulton wrote:
>
>> On Jan 18, 2008, at 12:32 PM, Phillip J. Eby wrote:
>>
>>> At 10:13 AM 1/18/2008 -0500, Jim Fulton wrote:
>> ...
>>>> I would want some way to prevent it, especially in a
>>>> production environment.
>>>
>>> The only way to absolutely prevent it is to install things
>>> unzipped.  And if you're going to do that, you might as well
>>> use .egg-info style eggs, since that will give better runtime
>>> performance.  That is, for an egg "foo-1.2-py2.5-platform.egg", you
>>> would unzip its contents to the target directory and then rename the
>>> extracted EGG-INFO directory to "foo-1.2-py2.5-platform.egg-info".
>>
>>
>> Thanks.
>>
>> Can you briefly explain or provide a link to something that explains
>> the performance improvement?
>
> Fewer directories on sys.path = better import performance, compared  
> to individually putting a series of .egg directories on sys.path.

Ah, I didn't understand what you meant by "unzip the contents to the  
target directory".

So the idea is that you'd merge the contents of multiple zip files  
into a single "target" directory.

This seems rather messy to me.  It doesn't appear to be compatible
with multi-version installs.  Also, if a newer egg version removes a
file, the old copy will be left installed after an upgrade.  If
two eggs provide the same file, one will overwrite the other.
Admittedly, this is a somewhat pathological case, but the overwriting
seems to compound the pathology.
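
To make the overwrite problem concrete, here is a minimal sketch of
the "unzip into one target directory" approach.  It is not setuptools
code: install_flat is a hypothetical helper, it assumes the eggs are
trusted (member names are not checked for path traversal), and it
assumes a modern Python.  It renames EGG-INFO the way Phillip
describes and reports every file it overwrites.

```python
import os
import zipfile


def install_flat(egg_path, target_dir):
    """Unzip one egg into target_dir; return the relative paths it overwrote.

    Hypothetical helper for illustration only, not part of setuptools.
    """
    # "foo-1.2-py2.5.egg" -> "foo-1.2-py2.5.egg-info"
    egg_name = os.path.splitext(os.path.basename(egg_path))[0]
    info_dir = egg_name + ".egg-info"
    overwritten = []
    with zipfile.ZipFile(egg_path) as egg:
        for member in egg.namelist():
            if member.endswith("/"):
                continue  # directory entries carry no data
            # Metadata moves from EGG-INFO/ to <egg name>.egg-info/.
            if member.startswith("EGG-INFO/"):
                rel = info_dir + member[len("EGG-INFO"):]
            else:
                rel = member
            dest = os.path.join(target_dir, *rel.split("/"))
            if os.path.exists(dest):
                overwritten.append(rel)  # another egg already put a file here
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with open(dest, "wb") as out:
                out.write(egg.read(member))
    return overwritten
```

Installing two eggs that both ship, say, shared.py returns
["shared.py"] for the second one, and only the second copy survives.
Nothing here removes files that a later version of an egg drops, so
the stale-file problem is untouched.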

> This doesn't apply so much for zipped eggs, because the zip  
> directories get cached.  Overall, the tradeoffs are:
>
> zipped .egg files = faster imports + some startup overhead to read  
> the zip directories
> .egg directories  = slower imports due to lots of stat-ing
> .egg-info         = "normal" imports + no extra startup overhead
>
> Zipped .egg files have the fastest import times overall, assuming  
> you don't have so many eggs to read that the overhead wipes out your  
> gains.  Extracting all eggs to the same directory is second fastest,  
> with .egg directories (i.e. --always-unzip) being the slowest thing  
> you could possibly do.

I wonder what the performance impacts really are.  I'm looking forward
to measuring this some day for our projects.  We've put off making our
eggs zip-safe so far, as this hasn't been a high priority. :/
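
A rough way to get a first number is sketched below.  It only times
how long it takes to locate a module along a search path with many
directories versus one, using the import machinery of a modern
Python (importlib), which differs from the Python 2.5 importer under
discussion.  make_layout, time_lookup and the directory names are
made up for the experiment, and zipped eggs are not covered.

```python
import importlib
import importlib.machinery
import os
import tempfile
import time


def make_layout(root, n_dirs, module_name="target_mod"):
    """Create n_dirs directories under root; only the last holds the module."""
    dirs = []
    for i in range(n_dirs):
        d = os.path.join(root, "egg%03d" % i)
        os.makedirs(d)
        dirs.append(d)
    with open(os.path.join(dirs[-1], module_name + ".py"), "w") as f:
        f.write("VALUE = 1\n")
    return dirs


def time_lookup(search_path, module_name="target_mod", repeat=50):
    """Average seconds to locate module_name along search_path, uncached."""
    start = time.perf_counter()
    for _ in range(repeat):
        # Drop cached directory listings so each pass hits the filesystem.
        importlib.invalidate_caches()
        spec = importlib.machinery.PathFinder.find_spec(module_name, search_path)
        assert spec is not None
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        many = make_layout(root, 100)
        print("100 directories:", time_lookup(many))
        print("  1 directory:  ", time_lookup(many[-1:]))
```

The absolute numbers depend heavily on the filesystem and the Python
version, so this is only a starting point for measuring a real
project.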

As described in a separate thread, I'm going to add an option to  
buildout so buildout users can explicitly define a directory to use as  
a setuptools cache.  In that case, zip-safe eggs can remain zipped  
even if they have extensions.

Jim

--
Jim Fulton
Zope Corporation



