[Distutils] Experience of setuptools' cache design
jim at zope.com
Fri Jan 18 19:31:30 CET 2008
On Jan 18, 2008, at 1:04 PM, Phillip J. Eby wrote:
> At 12:50 PM 1/18/2008 -0500, Jim Fulton wrote:
>> On Jan 18, 2008, at 12:32 PM, Phillip J. Eby wrote:
>>> At 10:13 AM 1/18/2008 -0500, Jim Fulton wrote:
>>>> I would want some way to prevent it, especially in a
>>>> production environment.
>>> The only way to absolutely prevent it is to install things
>>> unzipped. And if you're going to do that, you might as well
>>> use .egg-info style eggs, since that will give better runtime
>>> performance. That is, for an egg "foo-1.2-py2.5-platform.egg", you
>>> would unzip its contents to the target directory and then rename the
>>> extracted EGG-INFO directory to "foo-1.2-py2.5-platform.egg-info".
>> Can you briefly explain or provide a link to something that explains
>> the performance improvement?
> Fewer directories on sys.path = better import performance, compared
> to individually putting a series of .egg directories on sys.path.
Ah, I didn't understand what you meant by "unzip the contents to the
target directory". So the idea is that you'd merge the contents of
multiple zip files into a single "target" directory.
This seems rather messy to me. It doesn't appear to be compatible
with multi-version installs. Also, if a newer egg version removes a
file, the removed file will be left installed after an upgrade. If
two eggs provide the same file, one egg's copy will be overwritten by
the other's. Admittedly, this is a somewhat pathological case, but the
overwriting seems to compound the pathology.
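
To make the concern concrete, here is a minimal sketch of the merge step, not anything setuptools itself provides. The function name `install_flat` and the layout of the target directory are assumptions made for illustration. It unzips one egg into a shared directory, renames EGG-INFO to "<egg name>.egg-info" as described above, and reports which files it overwrote.

```python
import os
import zipfile


def install_flat(egg_path, target_dir):
    """Unzip one egg into target_dir; return the relative paths it overwrote.

    Hypothetical helper for illustration only. It does no path sanitising,
    so it should not be pointed at untrusted archives.
    """
    # "foo-1.2-py2.5-platform.egg" -> "foo-1.2-py2.5-platform"
    egg_name = os.path.splitext(os.path.basename(egg_path))[0]
    overwritten = []
    with zipfile.ZipFile(egg_path) as egg:
        for info in egg.infolist():
            name = info.filename
            if name.endswith("/"):
                continue  # directory entry; parents are created below
            # EGG-INFO/... becomes <egg name>.egg-info/...
            if name.startswith("EGG-INFO/"):
                name = egg_name + ".egg-info/" + name[len("EGG-INFO/"):]
            dest = os.path.join(target_dir, *name.split("/"))
            if os.path.exists(dest):
                overwritten.append(name)  # another egg already put a file here
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            with egg.open(info) as src, open(dest, "wb") as out:
                out.write(src.read())
    return overwritten
```

Installing two eggs that both ship, say, "shared/util.py" returns that path from the second call, and the first egg's copy is gone. The sketch also keeps no record of which files an older version installed, so an upgrade leaves removed files behind.
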
> This doesn't apply so much for zipped eggs, because the zip
> directories get cached. Overall, the tradeoffs are:
> * zipped .egg files = faster imports + some startup overhead to read
>   the zip directories
> * .egg directories = slower imports due to lots of stat-ing
> * .egg-info = "normal" imports + no extra startup overhead
> Zipped .egg files have the fastest import times overall, assuming
> you don't have so many eggs to read that the overhead wipes out your
> gains. Extracting all eggs to the same directory is second fastest,
> with .egg directories (i.e. --always-unzip) being the slowest thing
> you could possibly do.
I wonder what the performance impacts really are. I'm looking forward
to measuring this some day for our projects. We've put off making our
eggs zip safe so far, as this hasn't been a high priority. :/
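
One rough way to start measuring is sketched below. It is an assumption-laden micro-benchmark, not a method from this thread. It creates a number of empty ".egg"-style directories, puts them at the front of sys.path, and times a single import of a module that lives in the last one. The module name `egg_probe_mod` and the directory names are made up for the example. Current Python versions cache directory listings in the import system, so the numbers will not match what a 2008-era interpreter would show.

```python
import importlib
import os
import sys
import tempfile
import time


def time_import_with_path_entries(n_dirs, module_name="egg_probe_mod"):
    """Time one import of a module found in the last of n_dirs path entries.

    Returns (elapsed_seconds, module.VALUE). Hypothetical helper.
    """
    root = tempfile.mkdtemp()
    dirs = []
    for i in range(n_dirs):
        d = os.path.join(root, "fake-%d.egg" % i)
        os.mkdir(d)
        dirs.append(d)
    # Only the last directory actually contains the module.
    with open(os.path.join(dirs[-1], module_name + ".py"), "w") as f:
        f.write("VALUE = 42\n")

    saved_path = list(sys.path)
    sys.path[:0] = dirs                      # search the fake eggs first
    importlib.invalidate_caches()
    sys.modules.pop(module_name, None)       # force a real import
    try:
        start = time.perf_counter()
        module = importlib.import_module(module_name)
        elapsed = time.perf_counter() - start
    finally:
        sys.path[:] = saved_path             # restore the original path
        sys.modules.pop(module_name, None)
    return elapsed, module.VALUE
```

Calling it with, for example, 1, 50 and 500 directories and comparing the elapsed times gives a first impression of how the number of sys.path entries affects a single import. A realistic measurement would use the project's real eggs and repeat each run several times.
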
As described in a separate thread, I'm going to add an option to
buildout so buildout users can explicitly define a directory to use as
a setuptools cache. In that case, zip-safe eggs can remain zipped
even if they have extensions.
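
For readers outside buildout, setuptools' pkg_resources already honours the PYTHON_EGG_CACHE environment variable when it decides where to extract files from zipped eggs. The sketch below only sets that variable from Python. The helper name `use_egg_cache` is an assumption, and it says nothing about what the buildout option will be called or how it will work.

```python
import os


def use_egg_cache(directory):
    """Point setuptools' egg extraction cache at an explicit directory.

    Hypothetical helper: it creates the directory if needed and sets
    PYTHON_EGG_CACHE. Call it before any egg resources are extracted.
    """
    os.makedirs(directory, exist_ok=True)
    os.environ["PYTHON_EGG_CACHE"] = directory
    return directory
```

With the variable set to a known, writable location, extension modules from zipped eggs are extracted there rather than into a per-user default directory.
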