[Twisted-Python] To those who are about to sprint

Last sprint the buildbot got overwhelmed with a giant backlog of runs and became essentially useless for the duration of the sprint. Jean-Paul had two suggestions:

1. Don't commit to trunk during the sprint, even if your branch was approved; wait until the sprint is over so trunk commits are spread out.
2. If you're just doing some initial cross-platform testing using force-builds.py, use the command line option that lets you run only some of the tests (e.g. "twisted.web" or "twisted.python.test.test_versions") instead of running the whole test suite.

-- 
Itamar Turner-Trauring, Future Foundries LLC
http://futurefoundries.com/ - Twisted consulting, training and support.

On Aug 18, 2012, at 5:59 AM, Itamar Turner-Trauring <itamar@futurefoundries.com> wrote:
> Last sprint the buildbot got overwhelmed with a giant backlog of runs and became essentially useless for the duration of the sprint. Jean-Paul had two suggestions:
>
> 1. Don't commit to trunk during the sprint, even if your branch was approved; wait until the sprint is over so trunk commits are spread out.
> 2. If you're just doing some initial cross-platform testing using force-builds.py, use the command line option that lets you run only some of the tests (e.g. "twisted.web" or "twisted.python.test.test_versions") instead of running the whole test suite.

In an attempt to mitigate this problem, since pre-commit testing is a (the?) major bottleneck that we have at sprints, I will be bringing a nice fast machine with virtual machines for at least one flavor of each of our major supported platforms (OS X, Linux, Windows, FreeBSD). I still need to install some dependencies, but hopefully I can do some whole-suite pre-commit verification for other reviewers who are there. Hopefully the keyboard won't get too hot for me to type on :-).

For the next sprint, our BuildBot maintainer has assured me that he'll be moving the build master infrastructure to a faster machine before then, so that hopefully it will cope better with the increased load of a sprint.

-glyph

On 03:48 am, glyph@twistedmatrix.com wrote:
> For the next sprint, our BuildBot maintainer has assured me that he'll be moving the build master infrastructure to a faster machine before then, so that hopefully it will cope better with the increased load of a sprint.

Slave capacity really needs to be increased as well to handle sprint load.

Jean-Paul

On Aug 19, 2012, at 3:49 PM, exarkun@twistedmatrix.com wrote:
> On 03:48 am, glyph@twistedmatrix.com wrote:
>> For the next sprint, our BuildBot maintainer has assured me that he'll be moving the build master infrastructure to a faster machine before then, so that hopefully it will cope better with the increased load of a sprint.
>
> Slave capacity really needs to be increased as well to handle sprint load.

The major problem with buildbot overload at sprints is the fact that the website becomes so unresponsive that unrelated activity can't take place, e.g. in the bugtracker, for long periods of time.

Correct me if I'm wrong, but if the buildmaster and website are separated, won't that address that problem?

Also, won't the buildmaster evenly work through the backlog of submitted builds, finishing one every few minutes? Or does some parallel stuff happen that makes subsequent builds progressively worse?

-glyph

On 12:05 am, glyph@twistedmatrix.com wrote:
> On Aug 19, 2012, at 3:49 PM, exarkun@twistedmatrix.com wrote:
>> On 03:48 am, glyph@twistedmatrix.com wrote:
>>> For the next sprint, our BuildBot maintainer has assured me that he'll be moving the build master infrastructure to a faster machine before then, so that hopefully it will cope better with the increased load of a sprint.
>>
>> Slave capacity really needs to be increased as well to handle sprint load.
>
> The major problem with buildbot overload at sprints is the fact that the website becomes so unresponsive that unrelated activity can't take place, e.g. in the bugtracker, for long periods of time.
>
> Correct me if I'm wrong, but if the buildmaster and website are separated, won't that address that problem?

This is entirely possible, I'm not sure. It may also be the case that trac by itself generates sufficient load at sprints to become unusable. We'll find out soon, I guess.

> Also, won't the buildmaster evenly work through the backlog of submitted builds, finishing one every few minutes? Or does some parallel stuff happen that makes subsequent builds progressively worse?

Generally not. Instead, the buildmaster will corrupt its local state, wedge itself into non-responsiveness, or enter an infinite loop. Additionally, it will direct one or more slaves to disconnect, crash, or at least corrupt (slave local) kernel state such that a sysadmin needs to restart the machine.

Jean-Paul

participants (3)
- exarkun@twistedmatrix.com
- Glyph
- Itamar Turner-Trauring