[Distutils] Performance implications of having large numbers of eggs?

Phillip J. Eby pje at telecommunity.com
Thu Jun 28 18:28:13 CEST 2007


At 10:47 AM 6/28/2007 -0500, Dave Peterson wrote:
>Has anyone done any investigation into the performance implications of
>having large numbers of eggs installed?  Is there any sort of
>performance hit?
>
>
>It seems to me that having a really large path might slow down imports a
>bit, though I suspect this is in C code so probably not a significant
>problem.

If the eggs are zipped, the performance overhead for imports is 
negligible, although there is a small startup cost to read the 
zipfile indexes.  Python caches zipfile indexes in memory, so 
checking whether a module is present is just a dictionary lookup, 
which is much faster than searching a directory on sys.path.
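As a rough illustration of the caching described above, here is a minimal sketch using a modern Python 3 interpreter (the 2007-era zipimport was C code with different internals). The archive name `demo_pkg-1.0-py.egg` and module `demo_mod` are hypothetical, and the "egg" is just a bare zip file with no EGG-INFO metadata:

```python
import os
import sys
import tempfile
import zipfile
import zipimport

# Build a throwaway zip archive standing in for a zipped egg.
# The archive and module names are hypothetical, for illustration only.
tmpdir = tempfile.mkdtemp()
egg_path = os.path.join(tmpdir, "demo_pkg-1.0-py.egg")
with zipfile.ZipFile(egg_path, "w") as zf:
    zf.writestr("demo_mod.py", "ANSWER = 42\n")

sys.path.insert(0, egg_path)
import demo_mod  # found via the zipimport path hook

# The path hook created a zipimporter for this sys.path entry and cached it;
# later imports that search this entry reuse it (and its in-memory index of
# archive members) instead of rereading the zip file.
importer = sys.path_importer_cache[egg_path]
print(type(importer).__name__, demo_mod.ANSWER)
```

The point is that the archive's directory is read once, when its importer is first created; every subsequent lookup against that entry consults memory rather than the filesystem.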

If the eggs are *not* zipped, however, the performance impact on 
individual imports is much higher.  That's why easy_install installs 
eggs zipped by default.


>   It also seems like there might be some startup penalties due
>to the overhead of setting up the path when using eggs, but this is a
>one-time cost during python startup, so probably not too bad either.
>
>I'm asking because we're in the process of switching our open-source
>Enthought Tool Suite library to a distribution of components via eggs
>and we're having some internal debate as to whether we need to minimize
>the number of eggs or not.  It definitely seems nice to have smaller
>subsets of functionality -- from the point of being able to make things
>stable, managing their APIs, managing cross-component dependencies, and
>from the user update size viewpoint.  But are we paying a performance
>penalty for going too small in scope with our eggs?

I suggest you measure what you're concerned about.  At one point, I 
did some timing tests that suggested that if you put the entire 
Cheeseshop on sys.path as zipped eggs, you might increase Python's 
startup time by a second or two.  But your mileage may vary, and the 
Cheeseshop has increased a lot in size since then.  ;)
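One way to take that measurement is a small harness like the sketch below. It is an assumption-laden approximation: the "eggs" are hypothetical bare zip files or directories with one module each (no EGG-INFO, no .pth files, no pkg_resources activation), so it measures only the cost of searching a long sys.path, zipped versus unpacked, in a fresh interpreter. Names such as `make_eggs` and `fake_mod_N` are made up for illustration:

```python
import os
import subprocess
import sys
import tempfile
import time
import zipfile


def make_eggs(root, count, zipped):
    """Create `count` hypothetical fake eggs under `root`, one module each.

    Returns the list of paths to put on sys.path (via PYTHONPATH)."""
    entries = []
    for i in range(count):
        path = os.path.join(root, f"fake_egg_{i}-1.0-py.egg")
        source = f"VALUE = {i}\n"
        if zipped:
            with zipfile.ZipFile(path, "w") as zf:
                zf.writestr(f"fake_mod_{i}.py", source)
        else:
            os.makedirs(path)
            with open(os.path.join(path, f"fake_mod_{i}.py"), "w") as f:
                f.write(source)
        entries.append(path)
    return entries


def time_startup(entries, module, runs=3):
    """Best-of-`runs` wall-clock time for a fresh interpreter to import `module`."""
    env = dict(os.environ, PYTHONPATH=os.pathsep.join(entries))
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", f"import {module}"], env=env, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    count = 200
    print(f"baseline (no extra entries): {time_startup([], 'sys'):.3f}s")
    for zipped in (True, False):
        with tempfile.TemporaryDirectory() as root:
            entries = make_eggs(root, count, zipped)
            # Import the module in the LAST entry so every entry is searched.
            t = time_startup(entries, f"fake_mod_{count - 1}")
            print(f"zipped={zipped}: {t:.3f}s with {count} entries")
```

Taking the best of several runs reduces noise from disk caching and scheduling; for a real decision you would point `PYTHONPATH` (or a .pth file) at your actual eggs instead of these stand-ins.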

By the way, the long-term plan for setuptools is that it should be 
able to install things the "old fashioned" way and still be able to 
manage them, using an installation manifest inside the .egg-info 
directories.  In that way, you'd have all the benefits of separate 
distribution and a managed installation, as well as the benefits of 
having only one directory on sys.path. But I haven't done any work on 
implementing this yet.

(Actually, now that I think of it, somebody could probably create a 
zc.buildout recipe to install eggs in an unpacked fashion (and let 
buildout handle uninstallation).  The tricky part would be namespace 
package __init__ modules, since multiple eggs can share 
responsibility for the __init__, and this might confuse zc.buildout.)
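To make the namespace-package problem concrete, the sketch below unpacks two hypothetical zipped eggs (`acme.foo` and `acme.bar`) into one shared directory and records a per-egg manifest as a plain list of extracted files. The manifest format is an assumption for illustration, not setuptools' or zc.buildout's. The `__init__.py` content is the kind of one-liner setuptools-style namespace packages used; here it is only written to disk, never imported:

```python
import os
import tempfile
import zipfile
from collections import Counter

# Namespace __init__ in the pkg_resources style; only written, never imported.
NS_INIT = "__import__('pkg_resources').declare_namespace(__name__)\n"


def build_egg(path, subpackage):
    """Write a hypothetical zipped egg contributing acme.<subpackage>."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("acme/__init__.py", NS_INIT)
        zf.writestr(f"acme/{subpackage}.py", f"NAME = {subpackage!r}\n")


def install_unpacked(egg_path, site_dir):
    """Extract an egg into a shared directory; return its file manifest."""
    with zipfile.ZipFile(egg_path) as zf:
        zf.extractall(site_dir)
        return sorted(zf.namelist())


with tempfile.TemporaryDirectory() as tmp:
    site_dir = os.path.join(tmp, "site-packages")
    manifests = {}
    for sub in ("foo", "bar"):
        egg = os.path.join(tmp, f"acme.{sub}-1.0.egg")
        build_egg(egg, sub)
        manifests[sub] = install_unpacked(egg, site_dir)

    # Files claimed by more than one manifest: uninstalling either egg by
    # blindly deleting its listed files would remove an __init__.py that
    # the other egg still depends on.
    owners = Counter(name for files in manifests.values() for name in files)
    shared = sorted(name for name, n in owners.items() if n > 1)
    print(shared)  # ['acme/__init__.py']
```

A manager for unpacked installs would therefore need some form of shared ownership (for example, reference counting files that appear in several manifests) before deleting anything.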
