[Distutils] buildout: several fold performance increases

Ross Patterson me at rpatterson.net
Sun Jan 22 02:37:07 CET 2012


Ross Patterson <me at rpatterson.net> writes:

> Ross Patterson <me at rpatterson.net> writes:
>
>> Marius Gedminas <marius at pov.lt> writes:
>>
>>> On Sat, Jan 21, 2012 at 02:19:03AM -0800, Ross Patterson wrote:
>>>> I moved this patch to a branch of zc.buildout:
>>>> 
>>>> svn+ssh://svn.zope.org/repos/main/zc.buildout/branches/env-cache

[...snip...]

> With this procedure, here are my timings with two different buildouts,
> one that is development focused with lots of different parts with
> slightly different to wildly different required dists, and one that is a
> production buildout with many identical parts.  I did all this after
> completely clearing out my egg and download caches and running each
> buildout once first to re-download all eggs so that results aren't
> skewed by a large, out-of-date egg cache:

I re-timed everything comparing zc.buildout 1.4.4, 1.5.2 and the
env-cache branch.  All of these use the trunk version of
buildout.dumppickedversions, which includes the optimizations made
there.
These are all against the production buildout from before but with a
large egg cache.

+------------------+-----------+-----------+-----------+
|                  | 1.4.4     | 1.5.2     | env-cache |
+------------------+-----------+-----------+-----------+
| bin/buildout -N  | 1m28.394s | 0m34.441s | 0m20.468s |
+------------------+-----------+-----------+-----------+
| bin/buildout     | 1m51.173s | 0m57.524s | 0m42.124s |
+------------------+-----------+-----------+-----------+
| bin/buildout -vN | 2m25.618s | 1m18.324s | 1m4.896s  |
+------------------+-----------+-----------+-----------+
| bin/buildout -v  | 2m36.758s | 1m32.049s | 1m27.575s |
+------------------+-----------+-----------+-----------+

So 1.5 has a lot of improvements over 1.4, and these changes offer some
incremental improvement over that.  The relative improvement is bigger
when running with "-N", which I think is the more important case.
Whatever 1.5 does differently also obviates the _log_requirements
optimization: it only helps on 1.4, so I've reverted it on trunk.
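
As a rough check on the relative differences, the table values can be
converted to seconds and compared.  This is only a sketch: the helper
names (to_seconds, relative_saving) and the TIMINGS dict are made up for
illustration, and the numbers are copied from the table above.

```python
def to_seconds(timing):
    """Convert a shell `time` string such as '1m28.394s' to seconds."""
    minutes, seconds = timing.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def relative_saving(before, after):
    """Fraction of the 'before' run time that the 'after' run saves."""
    before_s = to_seconds(before)
    return (before_s - to_seconds(after)) / before_s


# Columns: zc.buildout 1.4.4, 1.5.2, env-cache branch (from the table).
TIMINGS = {
    "bin/buildout -N": ("1m28.394s", "0m34.441s", "0m20.468s"),
    "bin/buildout": ("1m51.173s", "0m57.524s", "0m42.124s"),
    "bin/buildout -vN": ("2m25.618s", "1m18.324s", "1m4.896s"),
    "bin/buildout -v": ("2m36.758s", "1m32.049s", "1m27.575s"),
}

for command, (v144, v152, env_cache) in TIMINGS.items():
    print("%-17s 1.4.4 -> 1.5.2: %4.1f%%   1.5.2 -> env-cache: %4.1f%%" % (
        command,
        100 * relative_saving(v144, v152),
        100 * relative_saving(v152, env_cache),
    ))
```

For the non-verbose runs, env-cache saves roughly 41% over 1.5.2 with
"-N" and roughly 27% without it.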

Now that I understand what's going on with zc.buildout.easy_install and
pkg_resources, it seems like the right way to handle this is not to scan
all the paths involved in a working set for all possible dists including
those not required for the working set.  Instead, it might be better to
just scan the paths for dists we know we need.  Maybe we can write a
pkg_resources.Environment subclass that only stores the paths in scan()
and then scans them for specific project_names in __getitem__().  With the
current caching, the profiling data indicates very little time spent
scanning, so there's only incremental benefit to doing this in addition
to the caching.  It may be a more correct solution to do this *instead*
of the caching.  Also, with this approach, it may be possible to do
global dist caching which can be updated as things are changed by
hooking into zc.buildout.easy_install's develop(), install(), and
build().  Thoughts?

Ross


