[Distutils] Performance implications of having large numbers of eggs?
Phillip J. Eby
pje at telecommunity.com
Thu Jun 28 18:28:13 CEST 2007
At 10:47 AM 6/28/2007 -0500, Dave Peterson wrote:
>Has anyone done any investigation into the performance implications of
>having large numbers of eggs installed? Is there any sort of [...]
>It seems to me that having a really large path might slow down imports a
>bit, though I suspect this is in C code so probably not a significant [...]
If the eggs are zipped, the performance overhead for individual imports is
negligible, although there is a small startup cost to read each egg's
zipfile index. Python caches zipfile indexes in memory, so checking
whether a module is present in a zipped egg is just a dictionary lookup,
which is much faster than searching a directory on sys.path.
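To make the zipped case concrete, here is a minimal Python 3 sketch. The zip file and module names are hypothetical stand-ins for a zipped egg. It shows that a zip archive on sys.path is handled by the standard `zipimport` importer, which the import system caches in `sys.path_importer_cache` and which keeps the archive's directory listing in memory after reading it once.

```python
import importlib
import os
import sys
import tempfile
import zipfile
import zipimport


def import_from_zip(zip_path, modname, source):
    """Write `source` as <modname>.py inside a zip, put the zip on sys.path, import it.

    The zip plays the role of a (hypothetical) zipped egg.
    """
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr(modname + ".py", source)
    sys.path.insert(0, zip_path)
    try:
        module = importlib.import_module(modname)
    finally:
        sys.path.remove(zip_path)
    return module


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        egg = os.path.join(tmp, "demo_egg.zip")
        mod = import_from_zip(egg, "demo_zipped_mod", "ANSWER = 42\n")
        print(mod.__file__)  # a path inside the zip archive
        # The zipimporter created for this entry is cached and reused.
        print(type(sys.path_importer_cache[egg]).__name__)
```

The cached importer is what makes repeated lookups against the same zipped egg cheap: the archive's index is parsed once, not per import.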
If the eggs are *not* zipped, however, the performance impact on
individual imports is much higher, because each egg is a separate
directory that an import may have to check on disk. That's why
easy_install installs eggs zipped by default.
> It also seems like there might be some startup penalties due
>to the overhead of setting up the path when using eggs, but this is a
>one-time cost during python startup, so probably not too bad either.
>I'm asking because we're in the process of switching our open-source
>Enthought Tool Suite library to a distribution of components via eggs
>and we're having some internal debate as to whether we need to minimize
>the number of eggs or not. It definitely seems nice to have smaller
>subsets of functionality -- from the point of view of being able to make things
>stable, managing their APIs, managing cross-component dependencies, and
>from the user update size viewpoint. But are we paying a performance
>penalty for going too small in scope with our eggs?
I suggest you measure what you're concerned about. At one point, I
did some timing tests that suggested that if you put the entire
Cheeseshop on sys.path as zipped eggs, you might increase Python's
startup time by a second or two. But your mileage may vary, and the
Cheeseshop has increased a lot in size since then. ;)
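In that spirit, here is a minimal sketch of a startup-time measurement, assuming Python 3. It uses throwaway zip files and directories as stand-ins for zipped and unzipped eggs; the helper names (`make_fake_eggs`, `time_startup`) and the `fake_egg_*` files are hypothetical. For simplicity it puts the entries on PYTHONPATH, whereas easy_install actually adds eggs to sys.path via a .pth file, so treat the numbers as indicative only. Note too that Python 3.3 and later cache directory listings during import, so the zipped/unzipped gap may be smaller than it was in 2007.

```python
import os
import subprocess
import sys
import tempfile
import time
import zipfile


def make_fake_eggs(root, count, zipped):
    """Create `count` hypothetical egg stand-ins under `root`, one module each.

    Returns the sys.path entries (zip files or directories) in creation order.
    """
    entries = []
    for i in range(count):
        modname = f"fake_egg_mod_{i}"
        if zipped:
            path = os.path.join(root, f"fake_egg_{i}.zip")
            with zipfile.ZipFile(path, "w") as zf:
                zf.writestr(f"{modname}.py", f"VALUE = {i}\n")
        else:
            path = os.path.join(root, f"fake_egg_{i}")
            os.mkdir(path)
            with open(os.path.join(path, f"{modname}.py"), "w") as f:
                f.write(f"VALUE = {i}\n")
        entries.append(path)
    return entries


def time_startup(entries, statement, runs=3):
    """Best-of-`runs` wall-clock time for a fresh interpreter to run `statement`."""
    env = dict(os.environ, PYTHONPATH=os.pathsep.join(entries))
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement], env=env, check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    count = 200
    # Importing the module in the *last* entry forces a search past all the
    # others; stdlib imports during startup also have to skip every entry.
    stmt = f"import fake_egg_mod_{count - 1}"
    for zipped in (True, False):
        with tempfile.TemporaryDirectory() as root:
            entries = make_fake_eggs(root, count, zipped)
            print(f"zipped={zipped}: {time_startup(entries, stmt):.3f}s")
    print(f"baseline: {time_startup([], 'pass'):.3f}s")
```

Comparing each result against the baseline gives a rough per-entry cost; varying `count` shows how it scales for a given number of eggs.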
By the way, the long-term plan for setuptools is that it should be
able to install things the "old-fashioned" way (unpacked into a single directory) and still be able to
manage them, using an installation manifest inside the .egg-info
directories. In that way, you'd have all the benefits of separate
distribution and a managed installation, as well as the benefits of
having only one directory on sys.path. But I haven't done any work on
implementing this yet.
(Actually, now that I think of it, somebody could probably create a
zc.buildout recipe to install eggs in an unpacked fashion (and let
buildout handle uninstallation). The tricky part would be namespace
package __init__ modules, since multiple eggs can share
responsibility for the __init__, and this might confuse zc.buildout.)
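A minimal sketch of why the shared namespace `__init__` is tricky, assuming per-egg installation manifests along the lines described above. The egg names `acme.foo`/`acme.bar`, the helper functions, and the manifest format are all hypothetical; this is not zc.buildout's or setuptools' API, and the "eggs" are bare zip files without EGG-INFO metadata. The `__init__.py` body is the usual setuptools-style namespace declaration. Unpacking both eggs into one directory leaves a single `acme/__init__.py` that both manifests claim, so removing every file listed in one egg's manifest would break the other.

```python
import os
import tempfile
import zipfile

# The one-line __init__.py that setuptools-style namespace packages ship.
NS_INIT = "__import__('pkg_resources').declare_namespace(__name__)\n"


def build_egg(path, files):
    """Write a hypothetical egg: a zip of {archive_name: source_text}."""
    with zipfile.ZipFile(path, "w") as zf:
        for name, text in files.items():
            zf.writestr(name, text)


def unpack_with_manifest(egg_path, target):
    """Extract an egg into `target`; return the files it wrote (its manifest)."""
    with zipfile.ZipFile(egg_path) as zf:
        zf.extractall(target)
        return [n for n in zf.namelist() if not n.endswith("/")]


def shared_files(manifests):
    """Map each file listed by more than one manifest to the eggs that list it."""
    owners = {}
    for egg, files in manifests.items():
        for f in files:
            owners.setdefault(f, []).append(egg)
    return {f: eggs for f, eggs in owners.items() if len(eggs) > 1}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "site-packages")
        os.mkdir(target)
        eggs = {
            "acme.foo": {"acme/__init__.py": NS_INIT, "acme/foo.py": "X = 1\n"},
            "acme.bar": {"acme/__init__.py": NS_INIT, "acme/bar.py": "Y = 2\n"},
        }
        manifests = {}
        for name, files in eggs.items():
            egg_path = os.path.join(tmp, name + ".egg")
            build_egg(egg_path, files)
            manifests[name] = unpack_with_manifest(egg_path, target)
        print(shared_files(manifests))
```

An uninstaller built on such manifests would need reference counting (or special-casing of namespace `__init__` files) so that a shared file is removed only when its last owner goes.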