[Distutils] buildout: several fold performance increases
me at rpatterson.net
Sat Jan 21 11:19:03 CET 2012
I moved this patch to a branch of zc.buildout:
Digging into one of the additional failures, I traced the problem to
develop eggs. Since zc.buildout.easy_install.develop() can change what
eggs and versions a pkg_resources.Environment would find, the cached
environments were out of date after buildout processed develop eggs. I
addressed this by finding any cached environments whose paths include
the path the develop egg is installed into and having those environments
rescan that path. With that fix, the zc.buildout tests show no failures
beyond the ones that already fail on my system without the patch (with
an isolated Python built from source).
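
As a rough illustration of the cache-and-rescan idea described above, here
is a minimal sketch. It is not the code from the branch: FakeEnvironment
is a hypothetical stand-in for pkg_resources.Environment, the "listing"
dict simulates what is on disk, and get_environment() and
develop_egg_installed() are names invented for this example.

```python
class FakeEnvironment:
    """Hypothetical stand-in for pkg_resources.Environment."""

    def __init__(self, search_path, listing):
        self.search_path = list(search_path)
        self._listing = listing  # dict: path -> distribution names "on disk"
        self.scans = 0           # counts how often we hit the "disk"
        self.dists = set()
        self.scan()

    def scan(self, search_path=None):
        # Scanning is the expensive step, so count each call.
        self.scans += 1
        for path in (search_path or self.search_path):
            self.dists.update(self._listing.get(path, ()))


# Global cache keyed by the search path, one environment per distinct path.
_env_cache = {}


def get_environment(search_path, listing):
    """Return a cached environment for search_path, creating it once."""
    key = tuple(search_path)
    env = _env_cache.get(key)
    if env is None:
        env = _env_cache[key] = FakeEnvironment(search_path, listing)
    return env


def develop_egg_installed(dest):
    """Rescan dest in every cached environment whose path includes it."""
    for key, env in _env_cache.items():
        if dest in key:
            env.scan([dest])
```

Parts with identical search paths share one environment, and installing a
develop egg only triggers a rescan of the affected directory instead of
throwing the whole cache away.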
I also compared buildout run times (using the 'time' command, not
cProfile) on a real-world buildout with 6 identical parts and a few
other parts with very similar distribution requirements. The time
without the patches was 1m27.513s and with the patches as applied to
zc.buildout 1.4.4 it was 0m34.674s. Also, buildout.dumppickedversions
suffers from the same logging hot spot that I fixed in zc.buildout
r122980, and before I patched it the buildout run time was 2m13s. Can
someone cut a release of buildout.dumppickedversions?
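
For anyone patching a similar logging hot spot, the general pattern is to
check the log level before doing the expensive work. The sketch below
shows that pattern only. It is not the actual change in r122980, and the
logger name, expensive_summary() and log_requirements() are hypothetical.

```python
import logging

logger = logging.getLogger("example.easy_install")  # hypothetical logger name


def expensive_summary(requirements, calls):
    """Stand-in for costly parsing and sorting; records that it ran."""
    calls.append(1)
    return ", ".join(sorted(requirements))


def log_requirements(requirements, calls):
    # Bail out early so the expensive work is skipped entirely
    # when a DEBUG message would not be emitted anyway.
    if not logger.isEnabledFor(logging.DEBUG):
        return
    logger.debug("Requirements: %s", expensive_summary(requirements, calls))
```

Passing the arguments lazily to logger.debug() is not enough here, because
the summary would still be computed before the call. The explicit
isEnabledFor() guard is what avoids the cost.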
All told, this is a roughly fourfold real-world improvement (from 2m13s
to 0m34.674s). With all those hot spots addressed, I can find no obvious
waste in the profiling data. I'd like to merge this into trunk and the
1.4 branch and see releases cut of both 1.5 and 1.4. May I begin the
merging?
Ross Patterson <me at rpatterson.net> writes:
> I've long been perplexed by how long a buildout takes to run with
> multiple parts whose required distributions are largely similar. Taking
> a stab at it, I found two hot spots that yield several fold improvements
> in performance.
> First, zc.buildout.easy_install._log_requirements was doing expensive
> requirements parsing and sorting even when no message would be logged.
> I committed a fix for it that on a 10 part buildout with a large "eggs"
> option for each part decreased update time from a cProfile run time of
> 93 seconds to 15 seconds:
> Secondly, instantiating pkg_resources.Environment, including the
> setuptools.package_index.PackageIndex subclass, is very expensive and
> was being done multiple times for any given part, and was being done for
> parts whose environments were identical. There was some existing global
> caching for package indexes that I've duplicated for environments in the
> attached patch.
> Unfortunately, I haven't been able to get a clean test environment for
> the life of me. I'm using a clean Python 2.7 build from source, turning
> everything in ~/.buildout/default.cfg off, and running tests in a clean
> checkout of the zc.buildout/trunk buildout. Even under those conditions
> I get 17 failing tests before any changes. With this environments
> cache, I see 41 failures, but I can't make sense of it. This patch
> yields another 2-3 fold decrease to 6 seconds for the same buildout and
> is driven by profiling data, not guessing. Can someone help me get this
> patch in?
> Finally, it would be great to see releases of zc.buildout with these
> performance improvements get out in the world. I've been hearing more
> and more complaints about buildout run times and these are easy fixes.
> If we can get the second, attached patch in quickly, then I'd say we
> should release with both. If not, then it's still worth it to cut a
> release for the first, already committed patch, which yields the
> greatest improvement.