
On 12:16 pm, solipsis@pitrou.net wrote:
For a), I think we can solve this only by redundancy, i.e. create more build slaves, hoping that a sufficient number would be up at any point in time.
We are already doing this, aren't we? http://www.python.org/dev/buildbot/3.x/
It doesn't seem to work very well, it's a bit like a Danaides vessel.
The source of the problem is that such a system can degrade without anybody taking action. If the web server's hard disk breaks down, people panic and look for a solution quickly. If the source control is down, somebody *will* "volunteer" to fix it. If the automated build system produces less useful results, people will worry, but not take action.
Well, to be fair, buildbots breaking also happens much more frequently (perhaps one or two orders of magnitude) than the SVN server or the Web site going down. Maintaining them looks like a Sisyphean task, and nobody wants that.
Perhaps this is a significant portion of the problem. Maintaining a build slave is remarkably simple and easy. I maintain about half a dozen slaves and spend at most a few minutes a month operating them. Actually setting one up in the first place might take a bit longer, since it involves installing the necessary software and making sure everything's set up right, but the actual slave configuration itself is one command:

    buildbot create-slave <path> <master address> <slave name> <slave password>

Perhaps this will help dispel the idea that it is a serious undertaking to operate a slave. The real requirement which some people may find challenging is that the slave needs to operate on a host which is actually online almost all of the time. If you don't have such a machine, then there's little point offering to host a slave.
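For anyone who would rather script that step, here is a minimal sketch in Python that only assembles the four-argument command quoted above. The helper name create_slave_command, the base directory, the master address (including its host:port form), the slave name and the password are all made-up placeholders, not values from the CPython buildbot setup; the real ones come from whoever runs the master.

```python
import shlex


def create_slave_command(basedir, master, name, password):
    """Return the argv list for the one-line slave setup command.

    The argument order follows the command quoted in this message:
    buildbot create-slave <path> <master address> <slave name> <slave password>
    """
    return ["buildbot", "create-slave", basedir, master, name, password]


if __name__ == "__main__":
    # Every value below is a hypothetical placeholder.
    argv = create_slave_command(
        "/home/buildslave/cpython",      # assumed base directory
        "buildmaster.example.org:9989",  # assumed master address
        "example-slave",                 # assumed slave name
        "not-a-real-password",           # assumed slave password
    )
    # Print the shell-quoted command instead of running it, so the sketch
    # works on a machine without buildbot installed.
    print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the returned list to subprocess.check_call would actually run it, assuming the buildbot executable is on the PATH.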
I don't know what kind of machines the current slaves are, but if they are 24/7 servers, isn't it a bit surprising that the slaves would go down so often? Is the buildbot software fragile? Does it require a lot of (maintenance, repair) work from the slave owners?
As I have no specific experience maintaining any of the CPython build slaves, I can't speak to any maintenance issues which these slaves have encountered. I would expect that they are as minimal as the issues I have encountered maintaining slaves for other projects, but perhaps this is wrong. I do recall that there were some win32 issues (discussed on this list, I think) quite a while back, but I think those were resolved. I haven't heard of any other issues since then. If there are some, perhaps the people who know about them could raise them and we could try to figure out how to resolve them.

Jean-Paul