buildout/setuptools slow as it scans the whole project dir

Hi, In some of my projects, buildout takes a looooooong time to complete. Sometimes the same problem occurs on the server or on another developer's laptop, sometimes not. The difference is something like "15 seconds if there's no problem" and "5 minutes if the problem occurs". It is terribly hard to pinpoint the exact problem and/or cause. An error in the setup.py or or MANIFEST.in is unlikely, as the very same project might take 5 minutes locally and 15 seconds on the server or vice versa... Anyway, if I run "bin/buildout" as "strace -f bin/buildout", I see a lot of "stat" calls. Setuptools walks through all the files inside my project dir. Including parts/omelette/* and, if available, a bower_compontents/ directory full of thousands of javascript files. Removing some of these directories (which running the buildout re-creates) fixes the speed issue for one run. I modified my local buildout copy to run "setup.py develop" with a -v instead of a -q option. This way I found out where it approximately happens: /usr/bin/python /tmp/tmp6UdsMl -v develop -mxN -d /vagrant/sso/develop-eggs/tmpfioc1Ibuild running develop running egg_info writing requirements to sso.egg-info/requires.txt writing sso.egg-info/PKG-INFO writing top-level names to sso.egg-info/top_level.txt writing dependency_links to sso.egg-info/dependency_links.txt writing entry points to sso.egg-info/entry_points.txt ### This is where the process seems to stop a couple of minutes to scan all the files. reading manifest file 'sso.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'sso.egg-info/SOURCES.txt' running build_ext Creating /vagrant/sso/develop-eggs/tmpfioc1Ibuild/sso.egg-link (link to .) The MANIFEST.in loooks like this: # Include docs in the root. include *.rst # Include everything in our project directory (sso/views.py, sso/static/some.js, etc) graft sso It is a git project. The setup.py looks like this: from setuptools import setup version = '1.1.dev0' long_description = '\n\n'.join([ open('README.rst').read(), open('CREDITS.rst').read(), open('CHANGES.rst').read(), ]) install_requires = [ 'Django >= 1.4.2, < 1.7', 'django-nose', 'lizard-auth-server', 'gunicorn', 'raven', 'werkzeug', 'south', 'django-auth-ldap', 'django-mama-cas', ], setup(name='sso', version=version, description="Single sign on server (and more) for lizard", long_description=long_description, # Get strings from http://www.python.org/pypi?%3Aaction=list_classifiers classifiers=['Programming Language :: Python', 'Framework :: Django', ], keywords=[], author='Do not blame Reinout', author_email='reinout.vanrees@nelen-schuurmans.nl', url='', license='GPL', packages=['sso'], zip_safe=False, install_requires=install_requires, entry_points={ 'console_scripts': [ ]}, ) Conclusion for me: something somewhere in setuptools is reading my whole project folder. It does it after "writing entry points to sso.egg-info/entry_points.txt" and before "reading manifest file 'sso.egg-info/SOURCES.txt'". The SOURCES.txt itself is small: CHANGES.rst CREDITS.rst LICENSE.rst MANIFEST.in README.rst setup.cfg setup.py sso/__init__.py sso/__init__.pyc sso/admin.py sso/developmentsettings.py sso/developmentsettings.pyc sso/models.py sso/models.pyc sso/settings.py sso/settings.pyc sso/stagingsettings.py sso/tests.py sso/urls.py sso/views.py sso.egg-info/PKG-INFO sso.egg-info/SOURCES.txt sso.egg-info/dependency_links.txt sso.egg-info/entry_points.txt sso.egg-info/not-zip-safe sso.egg-info/requires.txt Hm. I see some .pyc files in there. Something that needs fixing, but not the cause, I think. Is there something I'm missing? Where should I look? I cannot find "writing entry points to..." in the setuptools source code right away. Reinout -- Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org http://www.nelen-schuurmans.nl/ "Learning history by destroying artifacts is a time-honored atrocity"

On Wed, Apr 15, 2015 at 10:01 AM, Reinout van Rees <reinout@vanrees.org> wrote:
Hi,
In some of my projects, buildout takes a looooooong time to complete. Sometimes the same problem occurs on the server or on another developer's laptop, sometimes not. The difference is something like "15 seconds if there's no problem" and "5 minutes if the problem occurs". ...
I wonder if the culprit is _dir_hash in zc.buildout.buildlout. Buildout reinstalls a part if it has changed. It considers a part to have changed if it's arguments have changed or it's recipe has changed. If a recipe is a develop egg, then it computes a hash for the recipe based on it's package contents. A quick thing to try might be to hack _dir_hash to be a noop (e.g. add ``return 42`` at the top and see if it makes the delay go away.) Jim -- Jim Fulton http://www.linkedin.com/in/jimfulton

Jim Fulton schreef op 15-04-15 om 16:14:
I wonder if the culprit is _dir_hash in zc.buildout.buildlout.
Buildout reinstalls a part if it has changed. It considers a part to have changed if it's arguments have changed or it's recipe has changed. If a recipe is a develop egg, then it computes a hash for the recipe based on it's package contents.
A quick thing to try might be to hack _dir_hash to be a noop (e.g. add ``return 42`` at the top and see if it makes the delay go away.) No, that's not it. I tried the quick hack and it is still slow.
It should be somewhere in setuptools itself, as buildout executes a temp copy of the setup.py by calling /usr/bin/python /tmp/tmpEMi6Xe -v develop -mxN -d /vagrant/sso/develop-eggs/tmpfmeAuobuild The "/tmp/tmpEMi6Xe" is a copy of the setup.py, though I must check, and the directory at the end is presumably a symlink to the project directory. Setuptools seems to run all "egg_info.writers" entry points it can find in the piece of code where the slowness occurs. So any of the entry points could be the culprit. And perhaps even an entry point outside of setuptools, due to the way entry points work. I'll try to debug further. Reinout -- Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org http://www.nelen-schuurmans.nl/ "Learning history by destroying artifacts is a time-honored atrocity"

Reinout van Rees schreef op 15-04-15 om 16:26:
Setuptools seems to run all "egg_info.writers" entry points it can find in the piece of code where the slowness occurs. So any of the entry points could be the culprit. And perhaps even an entry point outside of setuptools, due to the way entry points work.
I'll try to debug further. Ok, the egg_info.writers entry points are innocent.
A couple of debug "print" statements later I found that setuptools/commands/egg_info.py's manifest_maker class calls "findall()", which is the monkeypatched version from setuptools/__init__.py. This does an "os.walk" through the entire project directory to create a full and complete list of files. Inside the os.walk() loop, it calls os.path.isfile() on every item. This takes a long time! 90%+ of the time is spend in parts/omelette/. And most of that in parts/omelette/django's localization folders... Of course it also goes through the var/ folder, which might store all sorts of files. Now... this is before it actually tries to include/exclude/graft stuff per the MANIFEST.in instructions. It just goes through the entire project. Why is it so slow? Probably because I'm running it on OSX inside a vmware VM (running ubuntu). Probably the disk access via the virtualization layer is slow. I'm not completely satisfied yet, as sometimes it also is slow on the server. (Also on VMs, but normally the performance hit isn't noticable). So.... it almost seems as if there's no solution to this? Or can someone give a hint on os.walk performance relative to VMs? Reinout -- Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org http://www.nelen-schuurmans.nl/ "Learning history by destroying artifacts is a time-honored atrocity"

Reinout van Rees schreef op 15-04-15 om 17:01:
So.... it almost seems as if there's no solution to this?
Or can someone give a hint on os.walk performance relative to VMs?
Bingo! I think I've found the real problem: symlinks between filesystems. The project is on the host file system and is mounted via vmware/vagrant inside the VM as /vagrant. So my project is on /vagrant/projectname. I have buildout set up with a download and egg cache, in ~/.buildout/downloads and ~/.buildout/eggs respectively. The omelette recipy makes symlinks in /vagrant/projectname/parts/omelette/ for all the eggs, pointing at ~/.buildout/eggs/eggname-version/ So: the symlink is FROM the mounted host filesystem at /vagrant to the VM's own ubuntu filesystem. And apparently symlinks from such a filesystem to another aren't effective. As a test, I've set the egg-cache directory to /vagrant/.buildout/eggs/ And yes: buildout performance is great again! I'll try to write it up a bit more eloquently and clearer later on. Dinner is calling now. (Let me indulge in a quote by my brother Maurits, not unknown on this mailinglist: "When dinner is calling, you need a rifle.") Reinout -- Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org http://www.nelen-schuurmans.nl/ "Learning history by destroying artifacts is a time-honored atrocity"

Could this be the same problem https://bitbucket.org/pypa/setuptools/issue/249/have-a-way-to-ingore-specifi... ? I frequently have this issue (.tox dir can sometimes get quite large) and I resorted to monkeypatching setuptools (I know, right?) in ~/.local/lib/python2.7/site-packages/usercustomize.py: import setuptools from distutils import filelist import os def findall(dir=os.curdir, original=filelist.findall, listdir=os.listdir, basename=os.path.basename): os.listdir = lambda path: [i for i in listdir(path) if '/.tox/' not in i and not i.startswith('.tox/')] try: return original(dir) finally: os.listdir = listdir filelist.findall = findall Thanks, -- Ionel Cristian Mărieș, http://blog.ionelmc.ro On Wed, Apr 15, 2015 at 5:01 PM, Reinout van Rees <reinout@vanrees.org> wrote:
Hi,
In some of my projects, buildout takes a looooooong time to complete. Sometimes the same problem occurs on the server or on another developer's laptop, sometimes not. The difference is something like "15 seconds if there's no problem" and "5 minutes if the problem occurs".
It is terribly hard to pinpoint the exact problem and/or cause. An error in the setup.py or or MANIFEST.in is unlikely, as the very same project might take 5 minutes locally and 15 seconds on the server or vice versa...
Anyway, if I run "bin/buildout" as "strace -f bin/buildout", I see a lot of "stat" calls. Setuptools walks through all the files inside my project dir. Including parts/omelette/* and, if available, a bower_compontents/ directory full of thousands of javascript files. Removing some of these directories (which running the buildout re-creates) fixes the speed issue for one run.
I modified my local buildout copy to run "setup.py develop" with a -v instead of a -q option. This way I found out where it approximately happens:
/usr/bin/python /tmp/tmp6UdsMl -v develop -mxN -d /vagrant/sso/develop-eggs/tmpfioc1Ibuild running develop running egg_info writing requirements to sso.egg-info/requires.txt writing sso.egg-info/PKG-INFO writing top-level names to sso.egg-info/top_level.txt writing dependency_links to sso.egg-info/dependency_links.txt writing entry points to sso.egg-info/entry_points.txt
### This is where the process seems to stop a couple of minutes to scan all the files.
reading manifest file 'sso.egg-info/SOURCES.txt' reading manifest template 'MANIFEST.in' writing manifest file 'sso.egg-info/SOURCES.txt' running build_ext Creating /vagrant/sso/develop-eggs/tmpfioc1Ibuild/sso.egg-link (link to .)
The MANIFEST.in loooks like this:
# Include docs in the root. include *.rst # Include everything in our project directory (sso/views.py, sso/static/some.js, etc) graft sso
It is a git project. The setup.py looks like this:
from setuptools import setup
version = '1.1.dev0'
long_description = '\n\n'.join([ open('README.rst').read(), open('CREDITS.rst').read(), open('CHANGES.rst').read(), ])
install_requires = [ 'Django >= 1.4.2, < 1.7', 'django-nose', 'lizard-auth-server', 'gunicorn', 'raven', 'werkzeug', 'south', 'django-auth-ldap', 'django-mama-cas', ],
setup(name='sso', version=version, description="Single sign on server (and more) for lizard", long_description=long_description, # Get strings from http://www.python.org/pypi?% 3Aaction=list_classifiers classifiers=['Programming Language :: Python', 'Framework :: Django', ], keywords=[], author='Do not blame Reinout', author_email='reinout.vanrees@nelen-schuurmans.nl', url='', license='GPL', packages=['sso'], zip_safe=False, install_requires=install_requires, entry_points={ 'console_scripts': [ ]}, )
Conclusion for me: something somewhere in setuptools is reading my whole project folder. It does it after "writing entry points to sso.egg-info/entry_points.txt" and before "reading manifest file 'sso.egg-info/SOURCES.txt'". The SOURCES.txt itself is small:
CHANGES.rst CREDITS.rst LICENSE.rst MANIFEST.in README.rst setup.cfg setup.py sso/__init__.py sso/__init__.pyc sso/admin.py sso/developmentsettings.py sso/developmentsettings.pyc sso/models.py sso/models.pyc sso/settings.py sso/settings.pyc sso/stagingsettings.py sso/tests.py sso/urls.py sso/views.py sso.egg-info/PKG-INFO sso.egg-info/SOURCES.txt sso.egg-info/dependency_links.txt sso.egg-info/entry_points.txt sso.egg-info/not-zip-safe sso.egg-info/requires.txt
Hm. I see some .pyc files in there. Something that needs fixing, but not the cause, I think.
Is there something I'm missing? Where should I look? I cannot find "writing entry points to..." in the setuptools source code right away.
Reinout
-- Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org http://www.nelen-schuurmans.nl/ "Learning history by destroying artifacts is a time-honored atrocity"
_______________________________________________ Distutils-SIG maillist - Distutils-SIG@python.org https://mail.python.org/mailman/listinfo/distutils-sig

Ionel Cristian Mărieș schreef op 15-04-15 om 19:04:
Could this be the same problem https://bitbucket.org/pypa/setuptools/issue/249/have-a-way-to-ingore-specifi... ? Yes, that's the same.
Though: normally reading such a dir structure shouldn't take long. See my other answer, in my case the problem was symlinks going to a different filesystem.
I frequently have this issue (.tox dir can sometimes get quite large) and I resorted to monkeypatching setuptools (I know, right?) in ~/.local/lib/python2.7/site-packages/usercustomize.py:
I like it :-) Reinout -- Reinout van Rees http://reinout.vanrees.org/ reinout@vanrees.org http://www.nelen-schuurmans.nl/ "Learning history by destroying artifacts is a time-honored atrocity"
participants (3)
-
Ionel Cristian Mărieș
-
Jim Fulton
-
Reinout van Rees