[Distutils] buildout prepends eggs to sys.path, performance issues

Jan Van Hees jakkevanhees at gmail.com
Tue Apr 29 13:33:56 CEST 2014


Hi list,

 

A bit of context: we’re running buildout 2.0.1 to build Zope and Plone
applications (Zope 2.13.10, combined with plone.recipe.zope2instance). This
question only concerns the startup performance of an instance, not the
performance of a running instance. But starting an instance is something
developers do quite often in a day’s development.

 

While debugging some instance startup performance issues, I came across the
following.

The scripts generated by buildout prepend all the eggs to sys.path, ahead of
the standard Python path entries.

 

In our setup this causes quite some delay, because every import of a
standard Python module first looks for that module in every egg directory
before it finds it in the default Python location (because the eggs are
prepended). The eggs even come before the local folder, so even importing a
local module pokes all the eggs first. If you only work on local disks, the
startup performance difference is negligible (a matter of seconds), but if
the eggs are located on network disks, startup time differs by about 30% (a
matter of minutes) between prepending and appending, with appending being
the faster of the two.
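To make the lookup cost concrete, here is a minimal sketch that counts how many path entries are tried before a module is found. It uses empty temporary directories as stand-ins for eggs and the modern importlib API (Python 3), not the Python 2 import machinery that Zope 2.13 runs on; the names entries_probed, build_demo and demo_stdlib_mod are made up for illustration.

```python
import importlib.machinery
import os
import tempfile


def entries_probed(module_name, search_path):
    """Return how many path entries are tried until module_name is found."""
    for count, entry in enumerate(search_path, start=1):
        # Ask the standard path finder about one entry at a time.
        spec = importlib.machinery.PathFinder.find_spec(module_name, [entry])
        if spec is not None:
            return count
    return None


def build_demo(num_eggs=50):
    """Create empty fake egg directories plus one directory holding a module."""
    root = tempfile.mkdtemp()
    eggs = []
    for i in range(num_eggs):
        egg = os.path.join(root, "fake_%d.egg" % i)
        os.mkdir(egg)
        eggs.append(egg)
    stdlib = os.path.join(root, "stdlib")
    os.mkdir(stdlib)
    with open(os.path.join(stdlib, "demo_stdlib_mod.py"), "w") as handle:
        handle.write("VALUE = 1\n")
    return eggs, stdlib


if __name__ == "__main__":
    eggs, stdlib = build_demo()
    print("eggs prepended:", entries_probed("demo_stdlib_mod", eggs + [stdlib]))
    print("eggs appended: ", entries_probed("demo_stdlib_mod", [stdlib] + eggs))
```

With the eggs first, every egg directory is consulted before the module turns up; with the eggs last, the first entry already answers. On a network disk each of those misses costs a round trip.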

 

Some numbers from strace, with eggs appended to sys.path:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.96    0.014658           0    142555    134820 open

And with eggs prepended to sys.path:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 58.82    0.035995           0    199897    192150 open

 

As you can see, the number of calls, and consequently the number of errors,
is noticeably higher when the eggs are prepended to sys.path. On local
disks, as in the numbers above, the difference is noticeable but still
fairly small (in absolute time, not percentage-wise); add the bit of extra
delay of a network disk, and the difference becomes really noticeable.
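For reference, the arithmetic on the two strace summaries can be worked out as follows. The figures are copied from the tables above; only the variable names are made up.

```python
# Figures copied from the two strace summaries for the open syscall.
appended = {"calls": 142555, "errors": 134820}
prepended = {"calls": 199897, "errors": 192150}


def successful(stats):
    """Number of open calls that did not fail."""
    return stats["calls"] - stats["errors"]


extra_calls = prepended["calls"] - appended["calls"]
extra_errors = prepended["errors"] - appended["errors"]
increase_pct = 100.0 * extra_calls / appended["calls"]

print("extra open calls:  %d (+%.1f%%)" % (extra_calls, increase_pct))
print("extra failed opens: %d" % extra_errors)
print("successful opens:   %d appended, %d prepended"
      % (successful(appended), successful(prepended)))
```

Prepending adds 57342 open calls (about 40% more), of which 57330 are failures, while the number of successful opens barely changes (7735 vs 7747). In other words, nearly all of the extra work is failed lookups.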

 

As far as I’ve always understood, the default when working with paths
should be to append, unless you have a good reason not to. The good reason
in this case, as far as I can see, could be that you want to prepend certain
packages to make sure you use your version instead of what’s present in
site-packages. Now my expectation would be that this is a fairly limited set
of packages that need to be prepended. If there are many, options like
virtualenv exist to avoid picking up site-packages at all (that’s what we do
btw).

 

Am I missing a use case for prepending to sys.path, or has this never been
an issue before? Because if the only concern is site-packages, or a similar
issue, I’m happy to look into splitting the Python path into what should be
prepended (whole Python path except site-packages?) and making append the
default (site-packages + eggs + …), or into providing a buildout syntax
where append would be the default and one could explicitly prepend some
packages in buildout.cfg.
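A rough sketch of the ordering I have in mind, with append as the default and an explicit prepend list. This is not existing buildout code or syntax; compose_sys_path and its prepend argument are hypothetical names used only to illustrate the idea.

```python
def compose_sys_path(base_path, egg_paths, prepend=()):
    """Return a new path list.

    Eggs listed in `prepend` go first, then the interpreter's own path
    entries, then all remaining eggs (append is the default).
    """
    prepend_set = set(prepend)
    first = [p for p in egg_paths if p in prepend_set]
    last = [p for p in egg_paths if p not in prepend_set]
    return first + list(base_path) + last


if __name__ == "__main__":
    # Hypothetical locations, for illustration only.
    base = ["/usr/lib/python2.7", "/usr/lib/python2.7/site-packages"]
    eggs = ["/eggs/a.egg", "/eggs/b.egg", "/eggs/c.egg"]
    for entry in compose_sys_path(base, eggs, prepend=["/eggs/b.egg"]):
        print(entry)
```

Only the eggs that really have to shadow something in site-packages would pay the cost of sitting in front of the standard library.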

 

Another option I’ve investigated is to use the meta_path hook, providing my
own find_module and load_module and keeping a dictionary of module
locations. I don’t yet have syscall counts for this scenario comparable to
the ones above; I’m reconstructing that setup at the moment to see if it
makes a difference. Timing-wise it’s a bit slower (but still acceptable: the
instance starts in 30 sec instead of 15 to 20), but I have no measurements
yet that indicate anything useful.
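For illustration, the general shape of such a hook is sketched below. It is not the code I am testing: it uses the modern find_spec API (Python 3) instead of find_module/load_module, only handles plain modules and not packages, and the class name CachedLocationFinder is made up.

```python
import importlib
import importlib.abc
import importlib.util
import sys


class CachedLocationFinder(importlib.abc.MetaPathFinder):
    """Resolve modules from a precomputed {module name: file path} mapping."""

    def __init__(self, locations):
        self.locations = dict(locations)

    def find_spec(self, fullname, path=None, target=None):
        location = self.locations.get(fullname)
        if location is None:
            # Unknown module: let the regular finders on sys.meta_path try.
            return None
        # Build a spec straight from the known file, skipping the path scan.
        return importlib.util.spec_from_file_location(fullname, location)


def install(locations):
    """Put the finder in front of the default finders and return it."""
    finder = CachedLocationFinder(locations)
    sys.meta_path.insert(0, finder)
    return finder
```

A module found in the dictionary is loaded without walking sys.path at all; anything else falls through to the normal import machinery.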

 

 

Thanks for your feedback,

 

Jan

 
