[Distutils] Buildout - redo_pyc function too slow

Kamal Mustafa kamal at marimore.co.jp
Thu Dec 5 20:09:11 CET 2013


Installing a large package such as Django on an EC2 micro instance took a
very long time: 8-9 minutes at 99% CPU usage. Initially, I thought it was
caused by setuptools analyzing the package to figure out whether it is
zip_safe or not [1]. But after looking at this closely, that's not the case.
Analyzing the egg took only a few seconds, which is negligible compared to
the total time it took to install the whole package. I also tested
adding zip_safe=False to Django's setup.py and didn't see any drastic
improvement in the install time.

I tested using easy_install directly, and it took around 1-2 minutes
to finish, which means the rest of the time is being spent in buildout
itself rather than in setuptools/easy_install. The install process
basically went like this:

...
Writing /tmp/easy_install-GwjQPW/django-master/setup.cfg
Running django-master/setup.py -q bdist_egg --dist-dir /tmp/easy_install-GwjQPW/django-master/egg-dist-tmp-Yk_MYR
warning: no previously-included files matching '__pycache__' found under directory '*'
warning: no previously-included files matching '*.py[co]' found under directory '*'
...
...
<LONG GAP HERE ...>
Got Django 1.7.
Picked: Django = 1.7
Generated script '/home/kamal/test_buildout/bin/django-admin.py'.
Generated interpreter '/home/kamal/test_buildout/bin/python'.
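To reproduce a comparison like this one (plain easy_install versus a full
buildout run), a rough wall-clock timer is enough. This is a minimal
sketch; the helper name `time_command` is hypothetical and not part of
buildout or setuptools.

```python
import subprocess
import time


def time_command(args):
    """Run a command to completion and return its wall-clock time in seconds.

    Hypothetical helper for comparing install times; not a buildout API.
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(args, check=True)
    return time.perf_counter() - start
```

For example, calling `time_command(["easy_install", "Django"])` and then
`time_command(["bin/buildout"])` from the buildout directory would give the
two numbers to compare; the exact commands depend on your setup.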

Stepping through the code, I found that the LONG GAP starts right after:

                    dists = self._call_easy_install(
                        dist.location, ws, self._dest, dist)

at line 531 of zc/buildout/easy_install.py. The code that runs next is:

                    for dist in dists:
                        redo_pyc(dist.location)
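Judging only by its name, redo_pyc regenerates compiled bytecode for each
installed egg. The sketch below is a hypothetical stand-in written from
that assumption, not buildout's actual code: it walks an unpacked egg
directory and rewrites the .pyc next to every .py file that already has
one. Even in this simplified form the work grows with the number of
modules, and Django ships a lot of them.

```python
import os
import py_compile


def recompile_pyc(egg_dir):
    """Hypothetical redo_pyc-style helper (not buildout's implementation).

    Walks an unpacked egg and regenerates foo.pyc for every foo.py that
    already had one. Returns the list of sources that were recompiled.
    """
    recompiled = []
    if not os.path.isdir(egg_dir):
        # Zipped eggs are single files, so there is nothing to walk.
        return recompiled
    for dirpath, _dirnames, filenames in os.walk(egg_dir):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            source = os.path.join(dirpath, filename)
            compiled = source + "c"
            if not os.path.exists(compiled):
                # Assumption: files never compiled before are left alone.
                continue
            os.remove(compiled)
            try:
                # Pass cfile explicitly so the .pyc lands next to the source
                # (Python 3 would otherwise write into __pycache__).
                py_compile.compile(source, cfile=compiled, doraise=True)
            except py_compile.PyCompileError:
                # Skip sources that fail to compile.
                continue
            recompiled.append(source)
    return recompiled
```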

Commenting out this function call cut the installation time down to
2m30s. So what is the purpose of this function? Skipping it seems to be
fine: I can import the package without any error. My
buildout.cfg:

[buildout]
parts = base

[base]
recipe = zc.recipe.egg
eggs =
        Django
interpreter = python

The only reference to redo_pyc I found is
http://www.makina-corpus.org/blog/minitage-10-out, which just says that
redo_pyc is somewhat slow.

[1]: https://github.com/buildout/buildout/issues/116

