Possible language summit topic: buildbots

Would it be worth spending some time discussing the buildbot situation at the PyCon 2010 language summit? In the past, I've found the buildbots to be an incredibly valuable resource, especially when working with aspects of Python or C that tend to vary significantly from platform to platform (for me, this usually means floating-point and platform math libraries, but there are surely many other things it applies to). But more recently there seem to have been some difficulties keeping a reasonable number of buildbots up and running. A secondary problem is that it can be awkward to debug some of the more obscure test failures on buildbots without having direct access to the machine. From conversations on IRC, I don't think I'm alone in wanting to find ways to make the buildbots more useful.

So the question is: how best to invest time and possibly money to improve the buildbot situation (and as a result, I hope, improve the quality of Python)? What could be done to make maintenance of build slaves easier? Or to encourage interested third parties to donate hardware and time? Are there good alternatives to Buildbot that might make a difference? What do other projects do?

These are probably the wrong questions; I'm hoping that a discussion would help produce the right questions, and possibly some answers.

Mark

Mark Dickinson wrote:
Would it be worth spending some time discussing the buildbot situation at the PyCon 2010 language summit? In the past, I've found the buildbots to be an incredibly valuable resource; especially when working with aspects of Python or C that tend to vary significantly from platform to platform (for me, this usually means floating-point, and platform math libraries, but there are surely many other things it applies to). But more recently there seem to have been some difficulties keeping a reasonable number of buildbots up and running. A secondary problem is that it can be awkward to debug some of the more obscure test failures on buildbots without having direct access to the machine. From conversations on IRC, I don't think I'm alone in wanting to find ways to make the buildbots more useful.
These are actually two issues:

a) where do we get buildbot hardware and operators?
b) how can we reasonably debug problems occurring on buildbots?

For a), I think we can solve this only by redundancy, i.e. create more build slaves, hoping that a sufficient number would be up at any point in time. So: what specific kinds of buildbots do you think are currently lacking? A call for volunteers will likely be answered quickly.
So the question is: how best to invest time and possibly money to improve the buildbot situation (and as a result, I hope, improve the quality of Python)?
I don't think money will really help (I'm skeptical in general that money helps in open source projects). As for time: "buildbot scales", meaning that the buildbot slave admins will all share the load, being responsible only for their own slaves. On the master side: would you be interested in tracking slave admins?
What could be done to make maintenance of build slaves easier?
This is something that only the slave admins can answer. I don't think it's difficult - it's just that people are really unlikely to contribute to the same thing over a period of five years at a steady rate. So we need to make sure to find replacements when people drop out.
Or to encourage interested third parties to donate hardware and time?
Again: I think a call for volunteers would do (Steve, if you are reading this, please hold back just a few days before actually making such a call :-)
Are there good alternatives to Buildbot that might make a difference?
I think people have started working on such a thing. There are certainly alternatives; I'm fairly skeptical that they are *good* alternatives (but then, I'm the one who set up the buildbot installation in the first place).
What do other projects do?
I think that's really difficult to compare, since their testing often has a very different scope. I think CruiseControl is widely used.
These are probably the wrong questions; I'm hoping that a discussion would help produce the right questions, and possibly some answers.
I think these are good questions - just not for the summit. Setting up such a system is, conceptually, easy. It's also just a little work to set it up initially; the difficult part then is to keep it running (and no, a system where anybody can just post test results at any time without prior registration is *still* difficult to keep running). The source of the problem is that such a system can degrade without anybody taking action. If the web server's hard disk breaks down, people panic and look for a solution quickly. If the source control is down, somebody *will* "volunteer" to fix it. If the automated build system produces results less useful, people will worry, but not take action. Regards, Martin

For a), I think we can solve this only by redundancy, i.e. create more build slaves, hoping that a sufficient number would be up at any point in time.
We are already doing this, aren't we? http://www.python.org/dev/buildbot/3.x/ It doesn't seem to work very well, it's a bit like a Danaides vessel.
The source of the problem is that such a system can degrade without anybody taking action. If the web server's hard disk breaks down, people panic and look for a solution quickly. If the source control is down, somebody *will* "volunteer" to fix it. If the automated build system produces results less useful, people will worry, but not take action.
Well, to be fair, buildbots breaking also happens much more frequently (perhaps one or two orders of magnitude) than the SVN server or the Web site going down. Maintaining them looks like a Sisyphean task, and nobody wants that. I don't know what kind of machines are the current slaves, but if they are 24/7 servers, isn't it a bit surprising that the slaves would go down so often? Is the buildbot software fragile? Does it require a lot of (maintenance, repair) work from the slave owners?

On 12:16 pm, solipsis@pitrou.net wrote:
For a), I think we can solve this only by redundancy, i.e. create more build slaves, hoping that a sufficient number would be up at any point in time.
We are already doing this, aren't we? http://www.python.org/dev/buildbot/3.x/
It doesn't seem to work very well, it's a bit like a Danaides vessel.
The source of the problem is that such a system can degrade without anybody taking action. If the web server's hard disk breaks down, people panic and look for a solution quickly. If the source control is down, somebody *will* "volunteer" to fix it. If the automated build system produces results less useful, people will worry, but not take action.
Well, to be fair, buildbots breaking also happens much more frequently (perhaps one or two orders of magnitude) than the SVN server or the Web site going down. Maintaining them looks like a Sisyphean task, and nobody wants that.
Perhaps this is a significant portion of the problem. Maintaining a build slave is remarkably simple and easy. I maintain about half a dozen slaves and spend at most a few minutes a month operating them. Actually setting one up in the first place might take a bit longer, since it involves installing the necessary software and making sure everything's set up right, but the actual slave configuration itself is one command:

buildbot create-slave <path> <master address> <slave name> <slave password>

Perhaps this will help dispel the idea that it is a serious undertaking to operate a slave. The real requirement which some people may find challenging is that the slave needs to operate on a host which is actually online almost all of the time. If you don't have such a machine, then there's little point offering to host a slave.
I don't know what kind of machines are the current slaves, but if they are 24/7 servers, isn't it a bit surprising that the slaves would go down so often? Is the buildbot software fragile? Does it require a lot of (maintenance, repair) work from the slave owners?
As I have no specific experience maintaining any of the CPython build slaves, I can't speak to any maintenance issues which these slaves have encountered. I would expect that they are as minimal as the issues I have encountered maintaining slaves for other projects, but perhaps this is wrong. I do recall that there were some win32 issues (discussed on this list, I think) quite a while back, but I think those were resolved. I haven't heard of any other issues since then. If there are some, perhaps the people who know about them could raise them and we could try to figure out how to resolve them. Jean-Paul

On Oct 25, 2009, at 9:50 AM, exarkun@twistedmatrix.com wrote:
Actually setting one up in the first place might take a bit longer, since it involves installing the necessary software and making sure everything's set up right, but the actual slave configuration itself is one command:
buildbot create-slave <path> <master address> <slave name> <slave password>
I have written a Fabric script for the distutils-buildbot project (on bitbucket, under Tarek) that puts everything necessary up onto an Ubuntu server, runs all the build steps, and fires up the buildbot. Obviously it will have to be modified to correctly configure other types of servers but the implementation should be fairly trivial for someone who could have done it by hand in the first place. Once it's done, it's in the script and may require an occasional tweak but not much more. The next step is to have the slaves themselves created in the cloud, fired up and then report to the mother ship that they're available. This last step is the one that doesn't seem to be supported by the current system. Thanks, S
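For concreteness, a fabfile for such a setup might look roughly like the sketch below; the package list, paths, master address and credentials are illustrative assumptions, not the actual distutils-buildbot script.

    # fabfile.py -- hypothetical sketch of provisioning an Ubuntu build slave
    from fabric.api import run, sudo

    def setup_slave(master="buildbot.example.org:9020",
                    name="my-slave", password="secret"):
        # Install the toolchain needed to build CPython, plus buildbot itself.
        sudo("apt-get update")
        sudo("apt-get install -y build-essential subversion python-dev buildbot")
        # The slave configuration itself is still a single command.
        run("buildbot create-slave ~/pyslave %s %s %s" % (master, name, password))
        run("buildbot start ~/pyslave")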

2009/10/25 <exarkun@twistedmatrix.com>:
Perhaps this is a significant portion of the problem. Maintaining a build slave is remarkably simple and easy. I maintain about half a dozen slaves and spend at most a few minutes a month operating them. Actually setting one up in the first place might take a bit longer, since it involves installing the necessary software and making sure everything's set up right, but the actual slave configuration itself is one command:
buildbot create-slave <path> <master address> <slave name> <slave password>
Perhaps this will help dispel the idea that it is a serious undertaking to operate a slave.
The real requirement which some people may find challenging is that the slave needs to operate on a host which is actually online almost all of the time. If you don't have such a machine, then there's little point offering to host a slave.
I have been seriously considering setting up one or more buildslaves for a while now. However, my biggest issue is that they would be running as VMs on my normal PC, which means that it's the issue of keeping them continually online that hurts me. If I could (say) just fire the slaves up for a set period, or fire them up, have them do a build and report back, and then shut down, that would make my life easier (regular activities rather than ongoing sysadmin works better for me). It sounds like a buildslave isn't really what I should be looking at. Maybe Titus' push model pony-build project would make more sense for me. Paul.

On 05:47 pm, p.f.moore@gmail.com wrote:
2009/10/25 <exarkun@twistedmatrix.com>:
Perhaps this is a significant portion of the problem. Maintaining a build slave is remarkably simple and easy. I maintain about half a dozen slaves and spend at most a few minutes a month operating them. Actually setting one up in the first place might take a bit longer, since it involves installing the necessary software and making sure everything's set up right, but the actual slave configuration itself is one command:
buildbot create-slave <path> <master address> <slave name> <slave password>
Perhaps this will help dispel the idea that it is a serious undertaking to operate a slave.
The real requirement which some people may find challenging is that the slave needs to operate on a host which is actually online almost all of the time. If you don't have such a machine, then there's little point offering to host a slave.
I have been seriously considering setting up one or more buildslaves for a while now. However, my biggest issue is that they would be running as VMs on my normal PC, which means that it's the issue of keeping them continually online that hurts me.
If I could (say) just fire the slaves up for a set period, or fire them up, have them do a build and report back, and then shut down, that would make my life easier (regular activities rather than ongoing sysadmin works better for me).
It sounds like a buildslave isn't really what I should be looking at. Maybe Titus' push model pony-build project would make more sense for me.
Maybe. I wonder if Titus' "push model" (I don't really understand this term in this context) makes sense for continuous integration at all, though. As a developer, I don't want to have access to build results across multiple platforms when someone else feels like it. I want access when *I* feel like it. Anyway, BuildBot is actually perfectly capable of dealing with this. I failed to separate my assumptions about how everyone would want to use the system from what the system is actually capable of. If you run a build slave and it's offline when a build is requested, the build will be queued and run when the slave comes back online. So if the CPython developers want to work this way (I wouldn't), then we don't need pony-build; BuildBot will do just fine. Jean-Paul

2009/10/25 <exarkun@twistedmatrix.com>:
If you run a build slave and it's offline when a build is requested, the build will be queued and run when the slave comes back online. So if the CPython developers want to work this way (I wouldn't), then we don't need pony-build; BuildBot will do just fine.
OK, sounds useful. If I'm offline for a while, do multiple builds get queued, or only the "last" one? If the former, I can imagine coming back to a pretty huge load if the slave breaks while I'm on holiday :-( I should look all of this up somewhere. Is there a reference to buildbot for slave maintainers? Are there any specifics for Python slaves that I should refer to? (From what I've been able to find, it seems to me that setting up a slave first requires getting things sorted with the admins, which sadly precludes experimenting to find things out - I can understand why the python admins don't want people "playing" on the live buildbot infrastructure, though :-)) Paul.

OK, sounds useful. If I'm offline for a while, do multiple builds get queued, or only the "last" one?
IIRC, it will only build the last one, then with a huge blame list.
If the former, I can imagine coming back to a pretty huge load if the slave breaks while I'm on holiday :-(
If it's offline too often, I'm skeptical that it would be useful. If you report breakage after a day, then it will be difficult to attribute this to a specific commit. It is most useful to have continuous integration if error reports are instantaneous.
I should look all of this up somewhere. Is there a reference to buildbot for slave maintainers? Are there any specifics for Python slaves that I should refer to?
Hmm. I thought I had sent you this before: http://wiki.python.org/moin/BuildbotOnWindows
(From what I've been able to find, it seems to me that setting up a slave first requires getting things sorted with the admins, which sadly precludes experimenting to find things out - I can understand why the python admins don't want people "playing" on the live buildbot infrastructure, though :-))
That's really not an issue. Feel free to play as much as you want, with the live infrastructure. You can't break anything doing so (perhaps except for spamming the mailing list with faulty reports). If you then decide to withdraw your offer, that's fine as well (just make sure to notify us instead of just silently taking the slave down). Regards, Martin

Martin v. Löwis <martin <at> v.loewis.de> writes:
If it's offline too often, I'm skeptical that it would be useful. If you report breakage after a day, then it will be difficult to attribute this to a specific commit. It is most useful to have continuous integration if error reports are instantaneous.
Not only, but it's useful to have a stable set of buildbots, so that when a test fails, you know whether it's a new problem caused by a recent revision, or rather an already existing problem on this platform. Regards Antoine.

As I have no specific experience maintaining any of the CPython build slaves, I can't speak to any maintenance issues which these slaves have encountered. I would expect that they are as minimal as the issues I have encountered maintaining slaves for other projects, but perhaps this is wrong. I do recall that there were some win32 issues (discussed on this list, I think) quite a while back, but I think those were resolved. I haven't heard of any other issues since then.
Only partially. One old issue was that previous builds would not complete, keeping the executable files open and preventing further runs. Buildbot is supposed to kill a build, but only kills the parent process (as it is really difficult to kill the entire process tree (*)). We work around this by explicitly killing any stale Python processes at the beginning of a new build.

The remaining issue is the popups; if a process still has a popup, you can't even terminate it properly. There are two kinds of popups: system-generated ones, and CRT-generated ones. For the CRT ones, we once had a way to turn them off, but I'm not sure whether that mechanism might have been removed. For the system messages, there is a way to turn them off in the parent process. David Bolen (IIRC) had developed a patch, but I think this patch only runs on his system(s).

Regards, Martin

(*) it may help if Buildbot would create a Win32 job object, and then use TerminateJobObject. Contributions are welcome.

On Oct 25, 2009, at 3:06 PM, Martin v. Löwis wrote:
(*) it may help if Buildbot would create a Win32 job object, and then use TerminateJobObject. Contributions are welcome.
Some work has already been done on this, but it needs help. At the root it's a Twisted issue: http://twistedmatrix.com/trac/ticket/2726
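For reference, a minimal pywin32 sketch of the job-object approach described above; this is not the Buildbot or Twisted code, the helper names are made up, and a real implementation would also have to worry about children spawned before the job assignment.

    # Run a build inside a Win32 job object so the whole tree can be killed.
    import subprocess
    import win32api, win32con, win32job

    def start_in_job(cmd):
        job = win32job.CreateJobObject(None, "")
        proc = subprocess.Popen(cmd)
        handle = win32api.OpenProcess(win32con.PROCESS_ALL_ACCESS, False, proc.pid)
        # Processes spawned by the build from now on inherit job membership.
        win32job.AssignProcessToJobObject(job, handle)
        return job, proc

    def kill_job(job):
        # Terminates every process in the job, including grandchildren.
        win32job.TerminateJobObject(job, 1)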

"Martin v. Löwis" <martin@v.loewis.de> writes:
The remaining issue is the popups; if a process still has a popup, you can't even terminate it properly. There are two kinds of popups: system-generated ones, and CRT-generated ones. For the CRT ones, we once had a way to turn them off, but I'm not sure whether that mechanism might have been removed. For the system messages, there is a way to turn them off in the parent process. David Bolen (IIRC) had developed a patch, but I think this patch only runs on his system(s).
Yes, process-stopping dialogs have probably been the single most annoying issue over time running a Windows build slave - certainly from my perspective they have taken up the largest amount of time in terms of maintenance and detection. I believe the CRT disabling is still active in the 3.x branches (the "-n" flag to regrtest in the buildbot's test.bat), after you restored it this past March (it had regressed during an earlier py3k branch set of patches and caused a bunch of problems for a bit), but not in the 2.x branches or trunk. So there's still a bit of exposure there, and I'd certainly be in favor of porting the regrtest -n support over to all current development branches.

I think the other issue most likely to cause a perceived "downtime" with the Windows build slave is that I've had a handful of cases over the past two years where the build slave appears to be operating properly, but the master seems to just queue up jobs as if it were down. The slave still shows an established TCP link to the master, so I generally only catch this when I happen to peek at the status web page, or catch a remark here on python-dev, so that can reduce availability.

My build slave (based on 0.7.5 I think) runs with local patches to:

1. Protect against Win32 pop-up error boxes in child processes.
2. Fix a recursive chaining of Twisted Deferreds during uploads which could break for large file transfers. This only came up when my build slave was generating daily MSI builds and uploading them to the master.
3. Handle clock jumps. It's a general flaw in the presence of system clock adjustments, but I only encountered it with my FreeBSD build slave under VMWare with a Linux host.

(2) and (3) are useful, but not likely to be an issue with most build slaves in normal operation. (2) probably isn't needed on my end any more now that the daily MSI builds aren't run, and it's possible that it got corrected in later buildbot updates, since I did report it on the development list at the time.

(1) is a pretty trivial patch, but it added a dependency on pywin32, so passing it back up to the buildbot maintainers (back in 2007) stalled while determining if that was ok, and I don't think I ever closed the loop. I did later make it fall back to ctypes if pywin32 was missing, but I think buildbot was using a minimum of Python 2.3 at the time, so even ctypes was a new dependency. Anyway, it became less crucial when Python's regrtest started executing similar code, though the buildbot patch covers anything run under it and not just the python process. I'd of course be happy to pass along the patch to anyone interested. I believe that Thomas Heller had run his Windows buildbot with some similar local code, but implemented with a modified buildbot script for building Python, rather than by tweaking buildbot itself.

Of course, the patch only protects against system pop-ups - it can't control the CRT assertion dialogs when Python is built in debug mode, which is why I've argued in the past that the test process for Python should ensure those are disabled. The CRT errors themselves are still important, but they can be redirected to stderr rather than a blocking GUI dialog.

-- David
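For reference, the dialog suppression discussed here amounts to roughly the following (a standalone sketch of what regrtest's -n option does on Windows; the Crt* calls only exist on a debug build of Python).

    import msvcrt

    # Keep Windows from showing critical-error / GPF dialog boxes.
    msvcrt.SetErrorMode(msvcrt.SEM_FAILCRITICALERRORS |
                        msvcrt.SEM_NOALIGNMENTFAULTEXCEPT |
                        msvcrt.SEM_NOGPFAULTERRORBOX |
                        msvcrt.SEM_NOOPENFILEERRORBOX)

    # Send CRT assertion/error reports to stderr instead of a blocking dialog;
    # these functions are only available when Python is built in debug mode.
    if hasattr(msvcrt, "CrtSetReportMode"):
        for kind in (msvcrt.CRT_WARN, msvcrt.CRT_ERROR, msvcrt.CRT_ASSERT):
            msvcrt.CrtSetReportMode(kind, msvcrt.CRTDBG_MODE_FILE)
            msvcrt.CrtSetReportFile(kind, msvcrt.CRTDBG_FILE_STDERR)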

David Bolen wrote: [snip]
I think the other issue most likely to cause a perceived "downtime" with the Windows build slave is that I've had a handful of cases over the past two years where the build slave appears to be operating properly, but the master seems to just queue up jobs as if it were down. The slave still shows an established TCP link to the master, so I generally only catch this when I happen to peek at the status web page, or catch a remark here on python-dev, so that can reduce availability.
[snip] Couldn't you write a script to check the status periodically?

MRAB <python@mrabarnett.plus.com> writes:
Couldn't you write a script to check the status periodically?
Sure, I suppose scraping the web status page would work. If it happened frequently I'd probably be forced to do something like that, but its relatively low frequency (though I guess it does have a big impact in terms of availability) makes it hard to dedicate time to that compared to my "real world" work :-) -- David
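A minimal sketch of such a periodic check is shown below; the status URL and the string matched against are assumptions about the master's web status pages, not verified details.

    # Poll the buildbot web status and warn if the slave looks disconnected.
    import urllib2

    STATUS_URL = "http://www.python.org/dev/buildbot/all/buildslaves/my-slave"

    def slave_looks_connected():
        page = urllib2.urlopen(STATUS_URL).read()
        # Assumes the web status page marks offline slaves as "NOT connected".
        return "NOT connected" not in page

    if not slave_looks_connected():
        print "build slave appears offline -- check the slave/master link"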

David Bolen wrote:
MRAB <python@mrabarnett.plus.com> writes:
Couldn't you write a script to check the status periodically?
Sure, I suppose scraping the web status page would work. If it happened frequently I'd probably be forced to do something like that, but its relatively low frequency (though I guess it does have a big impact in terms of availability) makes it hard to dedicate time to that compared to my "real world" work :-)
In addition, if it was happening frequently, we would rather investigate the problem and fix it than work around it. Regards, Martin

On 25 Oct, 09:36 pm, db3l.net@gmail.com wrote:
I think the other issue most likely to cause a perceived "downtime" with the Windows build slave is that I've had a handful of cases over the past two years where the build slave appears to be operating properly, but the master seems to just queue up jobs as if it were down. The slave still shows an established TCP link to the master, so I generally only catch this when I happen to peek at the status web page, or catch a remark here on python-dev, so that can reduce availability.
This sounds like something that should be reported upstream. Particularly if you know how to reproduce it. Has it been? Jean-Paul

exarkun@twistedmatrix.com writes:
This sounds like something that should be reported upstream. Particularly if you know how to reproduce it. Has it been?
No, largely because I can't reproduce it at all. It's happened maybe 4-5 times in the past 2 years or so. All that I see is that my end looks good yet the master end seems not to be dispatching jobs (it never shows an explicit disconnect for my slave though). My best guess is that something disrupted the TCP connection, and that the slave isn't doing anything that would let it know its connection was dropped. Although I thought there were periodic pings even from the slave side. Given the frequency, it's not quite high priority to me, though having the master let the owner of a slave know when it's down would help cut down on lost availability due to this case, so I suppose I could suggest that feature to the buildbot developers. -- David

On 01:28 am, db3l.net@gmail.com wrote:
exarkun@twistedmatrix.com writes:
This sounds like something that should be reported upstream. Particularly if you know how to reproduce it. Has it been?
No, largely because I can't reproduce it at all. It's happened maybe 4-5 times in the past 2 years or so. All that I see is that my end looks good yet the master end seems not to be dispatching jobs (it never shows an explicit disconnect for my slave though).
My best guess is that something disrupted the TCP connection, and that the slave isn't doing anything that would let it know its connection was dropped. Although I thought there were periodic pings even from the slave side.
Given the frequency, it's not quite high priority to me, though having the master let the owner of a slave know when it's down would help cut down on lost availability due to this case, so I suppose I could suggest that feature to the buildbot developers.
This feature exists, at least. BuildBot can email people when slaves are offline for more than some configured time limit. I'm not sure if the CPython master is configured to do this or not. It's easy to set up if not: the BuildSlave initializer accepts a list of email addresses that will be notified when that slave goes offline, notify_on_missing: http://buildbot.net/apidocs/buildbot.buildslave.AbstractBuildSlave-class.html#__init__ Jean-Paul
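As a hypothetical master.cfg fragment (the slave name, password and address below are invented), that configuration looks roughly like:

    # master.cfg fragment -- a sketch, not the actual CPython configuration.
    from buildbot.buildslave import BuildSlave

    c = BuildmasterConfig = {}
    c['slaves'] = [
        BuildSlave("some-windows-slave", "password",
                   notify_on_missing=["slave-admin@example.com"],
                   missing_timeout=600),   # seconds offline before mail is sent
    ]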

This sounds like something that should be reported upstream. Particularly if you know how to reproduce it. Has it been?
No, largely because I can't reproduce it at all. It's happened maybe 4-5 times in the past 2 years or so. All that I see is that my end looks good yet the master end seems not to be dispatching jobs (it never shows an explicit disconnect for my slave though).
It's not really reproducible. I think it sometimes happens when I restart the master; sometimes, some clients fail to reconnect (properly).
It's easy to set up if not: the BuildSlave initializer accepts a list of email addresses that will be notified when that slave goes offline, notify_on_missing:
http://buildbot.net/apidocs/buildbot.buildslave.AbstractBuildSlave-class.html#__init__
I tried that out a couple of weeks ago, but never received any email. I didn't have the time to look into this further since. Regards, Martin

Martin v. Löwis <martin <at> v.loewis.de> writes:
It's not really reproducible. I think it sometimes happens when I restart the master; sometimes, some clients fail to reconnect (properly).
Another common problem is that some buildbot fails in the middle of the test suite, with the following kind of message:

command timed out: 1800 seconds without output, killing pid 12325
process killed by signal 9
program finished with exit code -1
elapsedTime=10910.362981

See for example: http://www.python.org/dev/buildbot/trunk.stable/builders/ia64%20Ubuntu%20tru... (notice, by the way, the elapsed time: 10910s, that is, close to 3 hours...)

Regards

Antoine.

Antoine Pitrou wrote:
Martin v. Löwis <martin <at> v.loewis.de> writes:
It's not really reproducible. I think it sometimes happens when I restart the master; sometimes, some clients fail to reconnect (properly).
Another common problem is that some buildbot fails in the middle of the test suite, with the following kind of message:
command timed out: 1800 seconds without output, killing pid 12325
process killed by signal 9
program finished with exit code -1
elapsedTime=10910.362981
See for example : http://www.python.org/dev/buildbot/trunk.stable/builders/ia64%20Ubuntu%20tru...
(notice, by the way, the elapsed time (10910s, that is, close to 3 hours...))
That's not really a challenge to the buildbot operator, though - the buildbot will continue just fine afterwards, right? For some reason, the test suite stopped producing output, and eventually, the buildbot decided to kill the build process. Most likely, the machine ran severely out of memory, so everything stopped working. Regards, Martin

Antoine Pitrou wrote:
For a), I think we can solve this only by redundancy, i.e. create more build slaves, hoping that a sufficient number would be up at any point in time.
We are already doing this, aren't we? http://www.python.org/dev/buildbot/3.x/
It doesn't seem to work very well, it's a bit like a Danaides vessel.
Both true. However, it seems that Mark is unhappy with the current set of systems, so we probably need to do it again.
Well, to be fair, buildbots breaking also happens much more frequently (perhaps one or two orders of magnitude) than the SVN server or the Web site going down. Maintaining them looks like a Sisyphean task, and nobody wants that.
It only looks so. It is like any server management task - it takes constant effort. However, it is not Sisyphean (feeling Greek today, ain't you :-), since you actually achieve something. It's not hard to restart a buildbot when it has crashed, and it gives a warm feeling of having achieved something.
I don't know what kind of machines are the current slaves, but if they are 24/7 servers, isn't it a bit surprising that the slaves would go down so often? Is the buildbot software fragile?
Not really. It sometimes happens that the slaves don't reconnect after a master restart, but more often, it is just a change on the slave side that breaks it (such as a reboot done to the machine, and not having the machine configured to restart the slave after the reboot).
Does it require a lot of (maintenance, repair) work from the slave owners?
On Unix, not really. On Windows, there is still the issue that sometimes, some error message pops up which you need to click away. Over several builds, you may find that you have to click away dozens of such messages. This could use some improvement. Regards, Martin

On 09:47 am, martin@v.loewis.de wrote:
Mark Dickinson wrote:
Would it be worth spending some time discussing the buildbot situation at the PyCon 2010 language summit? In the past, I've found the buildbots to be an incredibly valuable resource; especially when working with aspects of Python or C that tend to vary significantly from platform to platform (for me, this usually means floating-point, and platform math libraries, but there are surely many other things it applies to). But more recently there seem to have been some difficulties keeping a reasonable number of buildbots up and running. A secondary problem is that it can be awkward to debug some of the more obscure test failures on buildbots without having direct access to the machine. From conversations on IRC, I don't think I'm alone in wanting to find ways to make the buildbots more useful.
These are actually two issues: a) where do we get buildbot hardware and operators? b) how can we reasonably debug problems occurring on buildbots
For a), I think we can solve this only by redundancy, i.e. create more build slaves, hoping that a sufficient number would be up at any point in time.
So: what specific kinds of buildbots do you think are currently lacking? A call for volunteers will likely be answered quickly.
So the question is: how best to invest time and possibly money to improve the buildbot situation (and as a result, I hope, improve the quality of Python)?
I don't think money will really help (I'm skeptical in general that money helps in open source projects). As for time: "buildbot scales", meaning that the buildbot slave admins will all share the load, being responsible only for their own slaves.
I think that money can help in two ways in this case.

First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.

Second, there are a number of active BuildBot developers. One of them has even recently taken a contract from Mozilla to implement some non-trivial BuildBot enhancements. I think it very likely that he would consider taking such a contract from the PSF for whatever enhancements would help out the CPython buildbot.
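For illustration, the on-demand EC2 support mentioned here is configured on the master roughly as follows (module path as in the 0.7.10-era releases, if memory serves; the AMI id and credentials are placeholders):

    # Sketch of a latent EC2 slave definition; all values are placeholders.
    from buildbot.ec2buildslave import EC2LatentBuildSlave

    slave = EC2LatentBuildSlave("ec2-slave", "password", "m1.small",
                                ami="ami-00000000",
                                identifier="AWS_ACCESS_KEY_ID",
                                secret_identifier="AWS_SECRET_ACCESS_KEY")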
On the master side: would you be interested in tracking slave admins?
What could be done to make maintenance of build slaves easier?
This is something that only the slave admins can answer. I don't think it's difficult - it's just that people are really unlikely to contribute to the same thing over a period of five years at a steady rate. So we need to make sure to find replacements when people drop out.
This is a good argument for VMs. It's certainly *possible* to chase an ever changing set of platforms, but it strikes me as something of a waste of time.
The source of the problem is that such a system can degrade without anybody taking action. If the web server's hard disk breaks down, people panic and look for a solution quickly. If the source control is down, somebody *will* "volunteer" to fix it. If the automated build system produces results less useful, people will worry, but not take action.
To me, that raises the question of why people aren't more concerned with the status of the build system. Shouldn't developers care if the code they're writing works or not? Jean-Paul

<exarkun <at> twistedmatrix.com> writes:
To me, that raises the question of why people aren't more concerned with the status of the build system. Shouldn't developers care if the code they're writing works or not?
The fact that we ask questions and publicly express worries should hint that we /are/ concerned :-) However, being mostly developers rather than system admins, and not knowing anything about the details of how buildbot does its work (not to mention the details of this or that particular buildslave and slave owner), makes us (at least me) quite clueless when faced with a buildbot-not-working-as-expected problem. Regards Antoine.

On Oct 25, 2009, at 10:05 AM, exarkun@twistedmatrix.com wrote:
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I have been working to expand this support to Rackspace's Cloud Servers as well. S

I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
Here I'm skeptical. I think we can find people donating always-online machines still; no need to throw donated money to Amazon.
Second, there are a number of active BuildBot developers. One of them has even recently taken a contract from Mozilla to implement some non- trivial BuildBot enhancements. I think it very likely that he would consider taking such a contract from the PSF for whatever enhancements would help out the CPython buildbot.
That could indeed be interesting, assuming we had a clear requirement. But then, most of us can "easily" fix things in buildbot ourselves - this is python-dev, after all.
This is something that only the slave admins can answer. I don't think it's difficult - it's just that people are really unlikely to contribute to the same thing over a period of five years at a steady rate. So we need to make sure to find replacements when people drop out.
This is a good argument for VMs.
Not really - you still would need somebody to manage them.
It's certainly *possible* to chase an ever changing set of platforms, but it strikes me as something of a waste of time.
Hmm - can you really get "strange" operating systems "in the cloud"? Some of the operating systems that we would like to test don't even support VMs.
To me, that raises the question of why people aren't more concerned with the status of the build system. Shouldn't developers care if the code they're writing works or not?
I think there are two issues here:

1. Some developers actually *don't* care too much whether their code works in all cases. If it fails on some strange platform they never heard of (such as "Solaris", or "Windows"), they are not bothered by the failure. Or, if they care, they still don't know what to do about the failure.

2. The buildbots sometimes report false positives. Some tests fail in a non-repeatable fashion, only on selected systems. So when you change something, the tests break, and you cannot really see how this could possibly be related to your change. Then you start ignoring these reports - both the bogus ones, and the real ones.

Regards, Martin

Right, how do developers benefit from a buildbot?
From my experience (five large buildbots with many developers plus two with only a couple of developers), a buildbot does little good unless the tests are reliable and not too noisy. "Reliable" is best achieved by having tests be deterministic and reproducible. "Not too noisy" means that the builders are all green all the time (at least for a "supported" subset of the buildslaves).

Beyond that, I think there has to be a culture change where the development team decides that it is really, really not okay to leave a builder red after you turned it red, and that instead you need to revert the patch that made it go from green to red before you do anything else. It has taken me a long time to acculturate to that and I wouldn't expect most people to do it quickly or easily. (It is interesting to think of what would happen if that policy were automated -- any patch which caused any "supported" builder to go from green to red would automatically be reverted.)

Also, of course, this is mostly meaningless unless the code that is being changed by the patches is well-covered by tests.

Regards, Zooko

Hello,

Sorry for the little redundancy, but I would like to underline Jean-Paul's suggestion here.

On Sun, 25 Oct 2009 14:05:12 +0000, exarkun wrote:
I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I'm not a PSF member, but it seems to me that the PSF could ask Amazon (or any other virtual machine business anyway) to donate a small number of permanent EC2 instances in order to run buildslaves on. After all, big companies often like sponsoring open-source projects, especially when the project is well-known and the donation is cheap for them.

This would have several advantages:

- the machines are administered by the provider: we don't have to worry about failed hardware, connectivity loss etc.
- any Python core developer could get ssh access to the VMs to run tests directly, since they would be dedicated buildbot instances
- they are not tied to a particular owner when it comes to fixing system problems, which means we eliminate a single point of failure: if a volunteer gets demotivated/bored/missing in action, someone can replace him/her easily
- there are a number of various OS images available (of course, we still need competent people to install the required software -- buildbot, etc.)

Since I've never used any such service ("cloud"-based VMs), I'm not sure what the downsides would be. But it seems to me that it would be at least worth trying. Right now we have around 15 buildbots, but two thirds of them are down, the others sometimes fail or disconnect in erratic ways, and it's difficult for "regular" core developers to be aware of what's precisely going on.

Of course this could also be a broken idea, for whatever reason I'm not aware of.

Regards

Antoine.

On Fri, Oct 30, 2009 at 04:21:06PM +0000, Antoine Pitrou wrote:
Hello,
Sorry for the little redundancy, I would like to underline Jean-Paul's suggestion here:
On Sun, 25 Oct 2009 14:05:12 +0000, exarkun wrote:
I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I'm not a PSF member, but it seems to me that the PSF could ask Amazon (or any other virtual machine business anyway) to donate a small number of permanent EC2 instances in order to run buildslaves on.
[ ... ] I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X. Others have made similar offers. The architectures supported by the cloud services don't really add anything (and generally don't have Mac OS X support, AFAIK). What we really need (IMO) is someone to dig into the tests to figure out which tests fail randomly and why, and to fix them on specific architectures that most of us don't personally use. This is hard work that is neither glamorous nor popular. I think the idea of paying a dedicated developer to make the CPython+buildbot tests reliable is better, although I would still be -0 on it (I don't think the PSF should be paying for this kind of thing at all). cheers, --titus -- C. Titus Brown, ctb@msu.edu

On Friday, 30 October 2009 at 09:31 -0700, C. Titus Brown wrote:
[ ... ]
I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X. Others have made similar offers. The architectures supported by the cloud services don't really add anything (and generally don't have Mac OS X support, AFAIK).
Well these VMs would have to run buildslaves on them, then. Are you ready to host some and connect them to the current buildbot infrastructure? (VMs without buildslaves are less interesting IMO)
What we really need (IMO) is someone to dig into the tests to figure out which tests fail randomly and why, and to fix them on specific architectures that most of us don't personally use. This is hard work that is neither glamorous nor popular.
I'm sure some of us are ready to do so (*). The situation has already improved quite a lot in the recent times. But fixing platform- or, worse, setup-specific issues often requires shell access to the target system, otherwise you spend too much time trying fixes on the SVN and waiting for the buildbot to react. (*) After all, if we weren't, we wouldn't even care about buildbots, we'd be content with running the test suite on our own machines
I think the idea of paying a dedicated developer to make the CPython+buildbot tests reliable is better, although I would still be -0 on it (I don't think the PSF should be paying for this kind of thing at all).
Paying developers in volunteer communities is always more contentious than paying for other kinds of resources. (It's generally more expensive too)

On Fri, Oct 30, 2009 at 05:41:39PM +0100, Antoine Pitrou wrote:
On Friday, 30 October 2009 at 09:31 -0700, C. Titus Brown wrote:
[ ... ]
I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X. Others have made similar offers. The architectures supported by the cloud services don't really add anything (and generally don't have Mac OS X support, AFAIK).
Well these VMs would have to run buildslaves on them, then. Are you ready to host some and connect them to the current buildbot infrastructure? (VMs without buildslaves are less interesting IMO)
No, I'm not willing to spend the time to install and maintain buildbot. But I'm happy to give the necessary access to those who are interested and willing. (...and let me tell you, getting these !#%!#$! Windows VMs up and running already took an immense amount of effort ;)
What we really need (IMO) is someone to dig into the tests to figure out which tests fail randomly and why, and to fix them on specific architectures that most of us don't personally use. This is hard work that is neither glamorous nor popular.
I'm sure some of us are ready to do so (*). The situation has already improved quite a lot in the recent times. But fixing platform- or, worse, setup-specific issues often requires shell access to the target system, otherwise you spend too much time trying fixes on the SVN and waiting for the buildbot to react.
(*) After all, if we weren't, we wouldn't even care about buildbots, we'd be content with running the test suite on our own machines
I look forward to it! cheers, --titus -- C. Titus Brown, ctb@msu.edu

2009/10/30 C. Titus Brown <ctb@msu.edu>:
On Fri, Oct 30, 2009 at 04:21:06PM +0000, Antoine Pitrou wrote:
Hello,
Sorry for the little redundancy, I would like to underline Jean-Paul's suggestion here:
On Sun, 25 Oct 2009 14:05:12 +0000, exarkun wrote:
I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I'm not a PSF member, but it seems to me that the PSF could ask Amazon (or any other virtual machine business anyway) to donate a small number of permanent EC2 instances in order to run buildslaves on.
[ ... ]
I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X. Others have made similar offers. The architectures supported by the cloud services don't really add anything (and generally don't have Mac OS X support, AFAIK).
As Antoine pointed out, it's not clear (at least, it isn't to me) what that leaves to be done. As a counter-offer: Given remote access to however many Windows VMs you want to provide, I'll get them up and running with buildslaves on them. If that requires software such as Visual Studio, I have copies via the MSDN licenses that I am happy to provide. Once things are up and running, I'll be prepared to do basic care and feeding of the buildslave, but as my time is limited, it would be nice if others would pitch in to help. In other words, if it's setup effort that's lacking, I'll provide it. As long as someone else can cover systems admin, and we get some level of volunteers to cover ongoing support, that should give us better Windows coverage on the buildbots. Paul.

On Fri, Oct 30, 2009 at 04:49:51PM +0000, Paul Moore wrote:
2009/10/30 C. Titus Brown <ctb@msu.edu>:
On Fri, Oct 30, 2009 at 04:21:06PM +0000, Antoine Pitrou wrote:
Hello,
Sorry for the little redundancy, I would like to underline Jean-Paul's suggestion here:
On Sun, 25 Oct 2009 14:05:12 +0000, exarkun wrote:
I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I'm not a PSF member, but it seems to me that the PSF could ask Amazon (or any other virtual machine business anyway) to donate a small number of permanent EC2 instances in order to run buildslaves on.
[ ... ]
I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X. Others have made similar offers. The architectures supported by the cloud services don't really add anything (and generally don't have Mac OS X support, AFAIK).
As Antoine pointed out, it's not clear (at least, it isn't to me) what that leaves to be done.
Great! We've solved the problem ;)
As a counter-offer: Given remote access to however many Windows VMs you want to provide, I'll get them up and running with buildslaves on them. If that requires software such as Visual Studio, I have copies via the MSDN licenses that I am happy to provide.
I, too, have MSDN licenses, and I have functioning build environments on all of the VMs (I think -- I've only tested Win XP currently: http://lyorn.idyll.org/ctb/pb-dev/python/detail?result_key=8276 ) I also have an OS X 10.5 machine that I can let you into through a firewall; it's building Python 2.7 quite nicely: http://lyorn.idyll.org/ctb/pb-dev/python/detail?result_key=8229
Once things are up and running, I'll be prepared to do basic care and feeding of the buildslave, but as my time is limited, it would be nice if others would pitch in to help.
I would be somewhat unhappy about giving more than three or four people admin access, but am prepared to lie back and think of England. --titus -- C. Titus Brown, ctb@msu.edu

2009/10/30 C. Titus Brown <ctb@msu.edu>:
As a counter-offer: Given remote access to however many Windows VMs you want to provide, I'll get them up and running with buildslaves on them. If that requires software such as Visual Studio, I have copies via the MSDN licenses that I am happy to provide.
I, too, have MSDN licenses, and I have functioning build environments on all of the VMs (I think -- I've only tested Win XP currently:
http://lyorn.idyll.org/ctb/pb-dev/python/detail?result_key=8276
OK, so I guess it's just setting the buildbot stuff up.
I also have an OS X 10.5 machine that I can let you into through a firewall; it's building Python 2.7 quite nicely:
http://lyorn.idyll.org/ctb/pb-dev/python/detail?result_key=8229
Sorry, I've no experience with OS X at all.
Once things are up and running, I'll be prepared to do basic care and feeding of the buildslave, but as my time is limited, it would be nice if others would pitch in to help.
I would be somewhat unhappy about giving more than three or four people admin access, but am prepared to lie back and think of England.
Greetings from England... :-) I doubt it'll be a huge issue, I just didn't want to end up doing nothing more than delivering 5 more red boxes on the buildbot status page. We can see how it goes. I just think that maintaining the buildbots as more of a community effort means that there's a better chance of issues being fixed quickly. Paul.

On Fri, 30 Oct 2009 at 19:46, Paul Moore wrote:
2009/10/30 C. Titus Brown <ctb@msu.edu>:
Once things are up and running, I'll be prepared to do basic care and feeding of the buildslave, but as my time is limited, it would be nice if others would pitch in to help.
I would be somewhat unhappy about giving more than three or four people admin access, but am prepared to lie back and think of England.
Greetings from England... :-)
I doubt it'll be a huge issue, I just didn't want to end up doing nothing more than delivering 5 more red boxes on the buildbot status page. We can see how it goes. I just think that maintaining the buildbots as more of a community effort means that there's a better chance of issues being fixed quickly.
I'd be happy to help with keeping these (or any other) buildbots running, but I'm not much of a Windows geek (I can work with it, but I know a _lot_ more about Linux). Same goes for OS X, though since that is Unix-based I'm a little more comfortable with it. I guess what I'm saying is, if you don't get responses from more Windows-savvy developers, then let me know and I'll be glad to help. That said, the idea of EC2 buildslaves seems pretty attractive... --David

On 04:31 pm, ctb@msu.edu wrote:
On Fri, Oct 30, 2009 at 04:21:06PM +0000, Antoine Pitrou wrote:
Hello,
Sorry for the little redundancy, I would like to underline Jean-Paul's suggestion here:
On Sun, 25 Oct 2009 14:05:12 +0000, exarkun wrote:
I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I'm not a PSF member, but it seems to me that the PSF could ask Amazon (or any other virtual machine business anyway) to donate a small number of permanent EC2 instances in order to run buildslaves on.
[ ... ]
I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X.
Okay, let's move on this. Martin has, I believe, said that potential slave operators only need to contact him to get credentials for new slaves. Can you make sure to follow up with him to get slaves running on these machines? Or would you rather give out access to someone else and have them do the build slave setup?
Others have made similar offers.
I'll similarly encourage them to take action, then. Do you happen to remember who?
The architectures supported by the cloud services don't really add anything (and generally don't have Mac OS X support, AFAIK).
That's not entirely accurate. Currently, CPython has slaves on these platforms:

- x86
  - FreeBSD
  - Windows XP
  - Gentoo Linux
  - OS X
- ia64
  - Ubuntu Linux
- Alpha
  - Debian Linux

So, assuming we don't want to introduce any new OS, Amazon could fill in the following holes:

- x86
  - Ubuntu Linux
- ia64
  - FreeBSD
  - Windows XP
  - Gentoo Linux

So very modestly, that's 4 currently missing slaves which Amazon's cloud service *does* add. It's easy to imagine further additions it could make as well.
What we really need (IMO) is someone to dig into the tests to figure out which tests fail randomly and why, and to fix them on specific architectures that most of us don't personally use. This is hard work that is neither glamorous nor popular.
Sure. That's certainly necessary. I don't think anyone is suggesting that it's not. Fortunately, adding more build slaves is not mutually exclusive with a developer fixing bugs in CPython.
I think the idea of paying a dedicated developer to make the CPython+buildbot tests reliable is better, although I would still be -0 on it (I don't think the PSF should be paying for this kind of thing at all).
I hope everyone is on board with the idea of fixing bugs in CPython, either in the actual implementation of features or in the tests for those features. That being the case, the discussion of whether or not the PSF should try to fund such a task is perhaps best discussed on the PSF members list. Jean-Paul

On Fri, Oct 30, 2009 at 04:58:29PM -0000, exarkun@twistedmatrix.com wrote:
On 04:31 pm, ctb@msu.edu wrote:
On Fri, Oct 30, 2009 at 04:21:06PM +0000, Antoine Pitrou wrote:
Hello,
Sorry for the little redundancy, I would like to underline Jean-Paul's suggestion here:
On Sun, 25 Oct 2009 14:05:12 +0000, exarkun wrote:
I think that money can help in two ways in this case.
First, there are now a multitude of cloud hosting providers which will operate a slave machine for you. BuildBot has even begun to support this deployment use-case by allowing you to start up and shut down vms on demand to save on costs. Amazon's EC2 service is supported out of the box in the latest release.
I'm not a PSF member, but it seems to me that the PSF could ask Amazon (or any other virtual machine business anyway) to donate a small number of permanent EC2 instances in order to run buildslaves on.
[ ... ]
I'm happy to provide VMs or shell access for Windows (XP, Vista, 7); Linux ia64; Linux x86; and Mac OS X.
Okay, let's move on this. Martin has, I believe, said that potential slave operators only need to contact him to get credentials for new slaves. Can you make sure to follow up with him to get slaves running on these machines? Or would you rather give out access to someone else and have them do the build slave setup?
I think we crossed threads here; I can provide the VMs, and access to them, but I won't (empirically, don't have the regular time available to ;) maintain buildbot buildslaves. You or Antoine or others are welcome to contact me off-list. Just give me an account name and ssh key, and I'll give you login access via tunneled Remote Desktop to the Windows machines.
Others have made similar offers.
I'll similarly encourage them to take action, then. Do you happen to remember who?
Every few months this thread seems to pop up and then fizzles when people realize the level of work and attention involved (<- he says cynically) in exploiting the offer of resources; I hope that anyone interested in offering resources will pop their head up again to look around.
I hope everyone is on board with the idea of fixing bugs in CPython, either in the actual implementation of features or in the tests for those features. That being the case, the discussion of whether or not the PSF should try to fund such a task is perhaps best discussed on the PSF members list.
Sure. --titus -- C. Titus Brown, ctb@msu.edu

Since I've never used any such service ("cloud"-based VMs), I'm not sure what the downsides would be. But it seems to me that it would at least be worth trying.
Not sure whether it's still relevant after the offers of individually donated hardware. However, if you want to look into this, feel free to set up EC2 slaves. When it comes to the point of actually having to pay money, please send a request to psf-board@python.org (make sure you don't pay anything until the request is approved). Exact processing would have to be decided then; traditionally, it has been simplest if you could invoice the PSF (IIUC). Giving the PSF as the billing address would probably also work (check with the treasurer). Regards, Martin

Martin v. Löwis <martin <at> v.loewis.de> writes:
Not sure whether it's still relevant after the offers of individually donated hardware.
We'll see, indeed.
However, if you want to look into this, feel free to set up EC2 slaves.
I only know how to set up mainstream Linux distros, though (Debian- or Redhat-lookalikes :-)). I've just played a bit and, after the hassle of juggling with a bunch of different keys and credentials, setting up an instance and saving an image for future use is quite easy. Regards Antoine.

On 31 Oct, 08:13 pm, solipsis@pitrou.net wrote:
Martin v. Löwis <martin <at> v.loewis.de> writes:
Not sure whether it's still relevant after the offers of individually donated hardware.
We'll see, indeed.
However, if you want to look into this, feel free to set up EC2 slaves.
I only know how to set up mainstream Linux distros, though (Debian- or Redhat-lookalikes :-)). I've just played a bit and, after the hassle of juggling with a bunch of different keys and credentials, setting up an instance and saving an image for future use is quite easy.
Starting with a mainstream distro doesn't seem like a bad idea. For example, there isn't currently a 32bit Ubuntu (any version) slave. That would be a nice gap to fill in, right? Jean-Paul

<exarkun <at> twistedmatrix.com> writes:
Starting with a mainstream distro doesn't seem like a bad idea. For example, there isn't currently a 32bit Ubuntu (any version) slave. That would be a nice gap to fill in, right?
I've set up a buildslave on an EC2 Ubuntu Karmic instance here: http://www.python.org/dev/buildbot/all/buildslaves/pitrou-ubuntu However, since it's right now billed on my credit card, I'll shut it down in a couple of days. I wonder how we can get the PSF to be billed instead of me, if the PSF agrees to fund such an instance (which, given EC2 prices, is perhaps not the best use of money?). Regards Antoine.

I've set up a buildslave on an EC2 Ubuntu Karmic instance here: http://www.python.org/dev/buildbot/all/buildslaves/pitrou-ubuntu
However, since it's right now billed on my credit card, I'll shut it down in a couple of days. I wonder how we can get the PSF to be billed instead of me, if the PSF agrees to fund such an instance (which, given EC2 prices, is perhaps not the best use of money?).
Send a request to psf-board@python.org. Such a request should include:
- contact information (who'll be running this project)
- project purpose/description
- estimated expenses (in case of doubt, round up rather than down)
- a proposal of how payment should proceed
I'm not quite sure whether it could be billed on the PSF credit card (you may ask Kurt Kaiser in advance); traditionally, it worked best when we received invoices. There will be a board meeting next Monday, so it might be useful to have a proposal by then. As for whether that's good use of the money, I'm skeptical as well. I don't actually know what EC2 prices *are*, or what the current pricing for a root server is (plus there had been various offers from people donating hardware - from people who would be unable to also donate time). There was discussion that an EC2 instance can be turned on only when needed, so we could try to set up something like that (the build master could then trigger activation of the machine, IIUC). However, it might be that the machine would have to be up most of the day, as there would be one build going on always, anyway. Regards, Martin

On Monday, 2 November 2009 at 13:31 +0100, "Martin v. Löwis" wrote:
There was discussion that an EC2 instance can be turned on only when needed, so we could try to set up something like that (the build master could then trigger activation of the machine, IIUC). However, it might be that the machine would have to be up most of the day, as there would be one build going on always, anyway.
Yes, I think that would be the case. We have frequent commits on each of the 4 branches under test, with a test suite that takes quite a bit of time to run in debug mode with -uall. Moreover, a standard ("small", which also means cheapest) EC2 instance apparently provides (based on a quick test) between 25% and 50% of the power of a full CPU core, which makes builds longer. I thought a full CPU core was provided, but it is not. An always-on "small" EC2 instance is at least $500 a year, with a small storage cost added to that. Therefore, I think EC2 buildslaves would be interesting under the condition that they are donated rather than paid for. I don't know whether anyone has contacts at Amazon. (but then any donated piece of hardware would be good, not necessarily an EC2 instance) I assume Jean-Paul made his original suggestion under the premise that the EC2 instances would only be run when necessary, which is probably very economical with Twisted's development model (few commits on trunk) but not with ours. Regards Antoine.
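For context on Antoine's "at least $500 a year" figure, here is a back-of-the-envelope check. The hourly and storage rates below are assumptions chosen to be in the right ballpark for on-demand pricing at the time, not figures quoted anywhere in this thread:

    # Rough yearly cost of an always-on EC2 "small" instance.
    # The rates are assumptions for illustration, not official prices.
    HOURS_PER_YEAR = 24 * 365        # 8760 instance-hours

    hourly_rate = 0.085              # USD/hour, assumed on-demand "small" rate
    compute_cost = HOURS_PER_YEAR * hourly_rate
    print("compute only: $%.0f/year" % compute_cost)      # roughly $745/year

    # Storage (e.g. an EBS volume) is billed separately; a 10 GB volume at an
    # assumed $0.10/GB-month adds only about $12/year on top.
    storage_cost = 10 * 0.10 * 12
    print("with storage: $%.0f/year" % (compute_cost + storage_cost))

Even at a noticeably lower hourly rate, an always-on instance clears the $500/year mark, which is consistent with the estimate above.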

An always-on "small" EC2 instance is at least $500 a year, with a small storage cost added to that.
OTOH, that isn't that expensive (compared to the other PSF expenses), plus people keep donating money, so when we say what we use it for, there may be a larger return than just the test results. OTTH, the same could be achieved by buying a hosted server elsewhere. Regards, Martin P.S. Perhaps this *is* the time for Steve to ask for "always-on" machines.

OTOH, that isn't that expensive (compared to the other PSF expenses), plus people keep donating money, so when we say what we use it for, there may be a larger return than just the test results.
OTTH, the same could be achieved by buying a hosted server elsewhere.
One advantage of a real hosted server is that we could have a bunch of our own VMs on it, which is probably not possible (and perhaps not allowed) on an EC2 instance (not to mention it could really be slow). (I'm not volunteering to install and manage VMs, however; I don't think I'm competent to do that)

On Mon, Nov 2, 2009 at 8:06 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
OTOH, that isn't that expensive (compared to the other PSF expenses), plus people keep donating money, so when we say what we use it for, there may be a larger return than just the test results.
OTTH, the same could be achieved by buying a hosted server elsewhere.
One advantage of a real hosted server is that we could have a bunch of our own VMs on it, which is probably not possible (and perhaps not allowed) on an EC2 instance (not to mention it could really be slow).
(I'm not volunteering to install and manage VMs, however; I don't think I'm competent to do that)
Now that the bulk of my pycon work is "completed" - I'll thread swap to my VM host proposal and the moratorium PEP. So don't worry about the VM host. jesse

On Nov 2, 2009, at 6:30 AM, Antoine Pitrou wrote:
<exarkun <at> twistedmatrix.com> writes:
Starting with a mainstream distro doesn't seem like a bad idea. For example, there isn't currently a 32bit Ubuntu (any version) slave. That would be a nice gap to fill in, right?
I've set up a buildslave on an EC2 Ubuntu Karmic instance here: http://www.python.org/dev/buildbot/all/buildslaves/pitrou-ubuntu
If you could send me the script that you used to set it up, I could give it a shot on RackSpace where it's cheaper (and I have a temporary developer account). S

On Monday, 2 November 2009 at 07:42 -0500, ssteinerX@gmail.com wrote:
If you could send me the script that you used to set it up, I could give it a shot on RackSpace where it's cheaper (and I have a temporary developer account).
There's no need for a special script, really. Install Python, buildbot, subversion (all packaged by your Linux provider), then a couple of development libraries so that enough extension modules get built: dev headers for zlib, bz2, openssl and sqlite3 would be enough IMO. When you have done that, go to http://wiki.python.org/moin/BuildBot and follow the instructions at the end. Skip "install buildbot from source" if you've installed it from your distro's package repo. Also, you may have to replace "buildbot slave" with "buildbot create-slave".

On Oct 25, 2009, at 5:47 AM, Martin v. Löwis wrote:
These are actually two issues: a) where do we get buildbot hardware and operators?
I've been trying to get some feedback about firing up buildbots on Cloud Servers for a while now and haven't had much luck. I'd love to find a way of having buildbots come to life, report to the mother ship, do the build, then go away 'till next time they're required. S

I've been trying to get some feedback about firing up buildbots on Cloud Servers for a while now and haven't had much luck. I'd love to find a way of having buildbots come to life, report to the mother ship, do the build, then go away 'till next time they're required.
I'm not quite sure whom you have been trying to get feedback from, and can't quite picture your proposed setup from above description. In any case, ISTM that your approach isn't compatible with how buildbot works today (not sure whether you are aware of that): a build slave needs to stay connected all the time, so that the build master can trigger a build when necessary. So if your setup requires the slaves to shut down after a build, I don't think this can possibly work. Regards, Martin

On Sun, Oct 25, 2009 at 07:32:52PM +0100, "Martin v. Löwis" wrote:
I've been trying to get some feedback about firing up buildbots on Cloud Servers for a while now and haven't had much luck. I'd love to find a way of having buildbots come to life, report to the mother ship, do the build, then go away 'till next time they're required.
I'm not quite sure whom you have been trying to get feedback from, and can't quite picture your proposed setup from above description.
In any case, ISTM that your approach isn't compatible with how buildbot works today (not sure whether you are aware of that): a build slave needs to stay connected all the time, so that the build master can trigger a build when necessary.
Hi Martin, it shouldn't be difficult to cobble together a build script that spins up a buildslave on EC2 and runs the tests there; I wrote something similar a few years ago for an infrequently connected home machine. --titus -- C. Titus Brown, ctb@msu.edu

it shouldn't be difficult to cobble together a build script that spins up a buildslave on EC2 and runs the tests there; I wrote something similar a few years ago for an infrequently connected home machine.
Ok - so it would be the master running this script? Sounds reasonable to me. As for EC2 (and other cloud providers), I'm somewhat skeptical about platform coverage, too. How many different processors and operating systems can they possibly support? Regards, Martin
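As a concrete illustration of the kind of script Titus describes, something along these lines could be run from the master's side using boto, the Python EC2 bindings of that era. This is only a sketch: the AMI id, key pair name, credentials, login user and slave directory are placeholder assumptions, not anything that actually exists in the CPython buildbot setup.

    # Sketch: boot an EC2 instance, start a pre-installed buildslave on it,
    # and shut the instance down again afterwards.  All ids/names are
    # hypothetical placeholders.
    import time
    import subprocess

    import boto  # classic boto EC2 API

    conn = boto.connect_ec2()  # reads AWS credentials from the environment
    reservation = conn.run_instances('ami-00000000',
                                     key_name='buildslave-key',
                                     instance_type='m1.small')
    instance = reservation.instances[0]

    while instance.state != 'running':   # wait for the VM to boot
        time.sleep(15)
        instance.update()

    # Start the slave; it connects back to the master, which then drives
    # the actual build.  Path and user are placeholders.
    subprocess.call(['ssh', 'ubuntu@' + instance.public_dns_name,
                     'buildbot start /home/ubuntu/pyslave'])

    # ... once the master reports the build finished ...
    instance.terminate()

Whether such a script lives on the master or somewhere else is mostly a question of where the AWS credentials can be kept safely.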

On 06:32 pm, martin@v.loewis.de wrote:
I've been trying to get some feedback about firing up buildbots on Cloud Servers for a while now and haven't had much luck. I'd love to find a way of having buildbots come to life, report to the mother ship, do the build, then go away 'till next time they're required.
I'm not quite sure whom you have been trying to get feedback from, and can't quite picture your proposed setup from above description.
In any case, ISTM that your approach isn't compatible with how buildbot works today (not sure whether you are aware of that): a build slave needs to stay connected all the time, so that the build master can trigger a build when necessary.
So if your setup requires the slaves to shut down after a build, I don't think this can possibly work.
This is supported in recent versions of BuildBot with a special kind of slave: http://djmitche.github.com/buildbot/docs/0.7.11/#On_002dDemand-_0028_0022Latent_0022_0029-Buildslaves Jean-Paul
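For reference, a latent EC2 slave of that kind is declared in the master configuration roughly as below. This is a sketch from memory of the 0.7.11 documentation linked above: the constructor arguments should be double-checked against it, and the slave name, password, AMI id and credential values are placeholders.

    # master.cfg fragment (sketch): an on-demand ("latent") EC2 build slave.
    from buildbot.ec2buildslave import EC2LatentBuildSlave

    c['slaves'] = [
        EC2LatentBuildSlave(
            'ec2-ubuntu-slave', 'slavepassword',   # placeholder name/password
            'm1.small',                            # instance type
            ami='ami-00000000',                    # image with the slave pre-installed
            identifier='AWS_ACCESS_KEY_ID',        # placeholder credentials
            secret_identifier='AWS_SECRET_ACCESS_KEY',
            build_wait_timeout=60 * 10,            # shut down after 10 idle minutes
        ),
    ]

The master then starts the instance when a build is pending and stops it again once the slave has been idle for the configured timeout.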

This is supported in recent versions of BuildBot with a special kind of slave:
http://djmitche.github.com/buildbot/docs/0.7.11/#On_002dDemand-_0028_0022Latent_0022_0029-Buildslaves
Interesting. Coming back to "PSF may spend money", let me say this: If somebody would volunteer to set up slaves on EC2, and operate them, I'm fairly certain that the PSF would pay the bill. It might be useful to have two people operating them, so that the knowledge and the load are shared. Regards, Martin

On Oct 25, 2009, at 2:32 PM, Martin v. Löwis wrote:
I've been trying to get some feedback about firing up buildbots on Cloud Servers for a while now and haven't had much luck. I'd love to find a way of having buildbots come to life, report to the mother ship, do the build, then go away 'till next time they're required.
I'm not quite sure whom you have been trying to get feedback from, and can't quite picture your proposed setup from above description.
I posted a couple of messages on testing-in-python, and have sent around some queries to others that I know are using buildbot type setups with various tools/platforms, not necessarily Python.
In any case, ISTM that your approach isn't compatible with how buildbot works today (not sure whether you are aware of that): a build slave needs to stay connected all the time, so that the build master can trigger a build when necessary.
So if your setup requires the slaves to shut down after a build, I don't think this can possibly work.
It can't possibly work within the way the Python buildbot structure currently works, as I understand it so far. What I'm implementing is less of a 'continuous integration' tool like you would use for something like Python itself, and more of a testing tool for things that have to be installed on multiple versions of Python, on multiple platforms. I don't need to know that it works on every checkin; just every once in a while I'd like to start from scratch and make sure everything still works on all supported versions of Python on all the platforms I test on. Cloud servers are great for this, since I'll usually only need them for an hour or so. S

On Oct 25, 2009, at 3:35 PM, Martin v. Löwis wrote:
I don't need to know that it works on every checkin
For us, that is a fairly important requirement, though. Reports get more and more useless if they aren't instantaneous. Sometimes, people check something in just to see how the build slaves react.
Understood -- that's why I mentioned it. This is a different use-case. It may still have some use for Python itself, but my idea is more for testing things like libraries where the developer may only work on or have one platform and may want to test installing on other platforms and Python versions during development and/or before release. S

2009/10/25 "Martin v. Löwis" <martin@v.loewis.de>:
I've been trying to get some feedback about firing up buildbots on Cloud Servers for a while now and haven't had much luck. I'd love to find a way of having buildbots come to life, report to the mother ship, do the build, then go away 'till next time they're required.
I'm not quite sure whom you have been trying to get feedback from, and can't quite picture your proposed setup from above description.
Sorry, feedback was the wrong word. I've been digging round the documentation I've been able to find and looking into what's needed to set up a slave.
In any case, ISTM that your approach isn't compatible with how buildbot works today (not sure whether you are aware of that): a build slave needs to stay connected all the time, so that the build master can trigger a build when necessary.
So if your setup requires the slaves to shut down after a build, I don't think this can possibly work.
It's not so much that I *require* the slave to shut down, more that I'm not sure how well I'll be able to ensure that it's up all the time, and I'm trying to understand the implications of that. My basic impression is that it's not really going to work, unfortunately. Paul.

It's not so much that I *require* the slave to shut down, more that I'm not sure how well I'll be able to ensure that it's up all the time, and I'm trying to understand the implications of that. My basic impression is that it's not really going to work, unfortunately.
There is a significant difference between "not able to ensure that it is up all the time", and "it is down most of the time, and only up once a day for a short period of time". Regular and irregular short downtimes are no problem at all. When the slave comes back, it will pick up pending work. Longer scheduled downtimes (e.g. for vacations) are acceptable. Only turning on the slave occasionally makes it useless. Regards, Martin

ssteinerX <at> gmail.com <ssteinerx <at> gmail.com> writes:
On Oct 25, 2009, at 5:43 PM, Martin v. Löwis wrote:
Only turning on the slave occasionally makes it useless.
For certain use cases; not mine.
Let's say that for the use case we are talking about here (this is python-dev), Martin's statement holds true.

On Sun, Oct 25, 2009 at 08:54:46AM +0000, Mark Dickinson wrote:
Would it be worth spending some time discussing the buildbot situation at the PyCon 2010 language summit? In the past, I've found the buildbots to be an incredibly valuable resource; especially when working with aspects of Python or C that tend to vary significantly from platform to platform (for me, this usually means floating-point, and platform math libraries, but there are surely many other things it applies to). But more recently there seem to have been some difficulties keeping a reasonable number of buildbots up and running. A secondary problem is that it can be awkward to debug some of the more obscure test failures on buildbots without having direct access to the machine. From conversations on IRC, I don't think I'm alone in wanting to find ways to make the buildbots more useful.
So the question is: how best to invest time and possibly money to improve the buildbot situation (and as a result, I hope, improve the quality of Python)? What could be done to make maintenance of build slaves easier? Or to encourage interested third parties to donate hardware and time? Are there good alternatives to Buildbot that might make a difference? What do other projects do?
These are probably the wrong questions; I'm hoping that a discussion would help produce the right questions, and possibly some answers.
[ x-posting to testing-in-python; please redirect followups to one list or the other! ]

Hi Mark,

a few bits of information...

---

I have a set of VM machines running some "core" build archs -- Linux, Mac OS X, Win XP, Win Vista, and Win 7 -- that I am going to dedicate to this purpose. I am happy to give out remote admin access to a few people. This infrastructure is also going to increase slowly as I build up my lab's internal network. I'm giving Tarek an account on my Linux box later today to serve as a build slave for Distribute.

--

More machines, and more esoteric machines, will be coming online as Snakebite unfolds. Trent Nelson (Snakepit master) has been finishing up some consulting work and is going to dedicate his time to it starting in November. This means more 64 bit, bigmem, and "weird" archs, with full login access.

---

I've also been working on a buildbot alternative that I'm calling pony-build. pony-build is based on a client-push architecture in which client machines do builds and push results to the master, which then acts as a record-keeper rather than a slave driver. The result is a less coordinated but (AFAICT) much less fragile continuous integration system. I'm hoping to give a talk at PyCon on the differences, and there will be a sprint on pony-build + python-dev at PyCon, regardless.

The current status of pony-build is "functional but ugly inside". In particular, the data model is horrible, and the internal API needs much more fleshing out. Nonetheless, my server has been running for two months or so, and you can look at the results here: http://lyorn.idyll.org/ctb/pb-dev/

The most fully-fleshed out set of build clients is for pygr, http://lyorn.idyll.org/ctb/pb-dev/pygr/ but you can view daily build results for Python 2.7 at http://lyorn.idyll.org/ctb/pb-dev/python/ with an exhaustive list here http://lyorn.idyll.org/ctb/pb-dev/python/show_all (and why the heck am I sorting in reverse date order, anyway?!)

The most interesting (?) part of pony-build right now is the client config, which I'm struggling to make simple and potentially universal enough to serve under buildbot as well: http://github.com/ctb/pony-build/blob/master/client/build-cpython (see 'commands' list).

The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)

--

If you're interested in bashing on, contributing to, or helping figure out what color the pony-build main screen should be, contact me off-list; I'm reluctant to spam up python-dev or testing-in-python. That having been said, the results of taking it and trying it out -- you can post results to my own recording server at http://lyorn.idyll.org/ctb/pb-dev/xmlrpc -- would be most welcome. Once I fix the data model, code collaboration will be much more feasible, too.

---

cheers,
--titus
--
C. Titus Brown, ctb@msu.edu
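To make the "client-push" idea concrete: in such a system the client runs the build steps itself and then pushes a result record to the recording server, instead of staying connected and waiting for a master to drive it. The snippet below is only a generic illustration of that shape using the stdlib xmlrpclib; the method name and the layout of the result dictionary are invented for illustration and are not pony-build's actual interface.

    # Generic client-push reporter sketch (NOT pony-build's real API).
    import socket
    import subprocess
    import xmlrpclib  # Python 2 stdlib; xmlrpc.client on Python 3

    SERVER = 'http://lyorn.idyll.org/ctb/pb-dev/xmlrpc'

    def run(cmd):
        """Run one build step locally, capturing output and exit status."""
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
        out, _ = p.communicate()
        return {'command': ' '.join(cmd), 'status': p.returncode, 'output': out}

    results = [run(['./configure']), run(['make']), run(['make', 'test'])]

    proxy = xmlrpclib.ServerProxy(SERVER)
    # 'add_results' is a made-up method name standing in for whatever the
    # recording server actually exposes.
    proxy.add_results({'host': socket.gethostname(),
                       'package': 'cpython',
                       'results': results})

The coordination trade-off is visible here: the server never has to reach the client, so flaky or intermittently connected machines can still report, at the cost of the server not being able to trigger builds itself.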

On Sun, Oct 25, 2009 at 8:48 AM, C. Titus Brown <ctb@msu.edu> wrote:
[ x-posting to testing-in-python; please redirect followups to one list or the other! ]
Hi Mark,
a few bits of information...
---
I have a set of VM machines running some "core" build archs -- Linux, Mac OS X, Win XP, Win Vista, and Win 7 -- that I am going to dedicate to this purpose. I am happy to give out remote admin access to a few people. This infrastructure is also going to increase slowly as I build up my lab's internal network.
I'm giving Tarek an account on my Linux box later today to serve as a build slave for Distribute.
Just to add to what titus said; I'm trying to price out a decent Desktop machine with enough ram/disk/cpu to run VMWare ESXi and serve a variety of virtual machines. I had planned on (ab)using the free MSDN license Microsoft provided to get a variety of platforms up and running. The end goal (since I have excess bandwidth and cooling where I live) would be to maintain this box as a series of buildslaves python-dev would have near unlimited shell/remote desktop access to. The nice thing about this would be that, once the initial cost of the machine itself and of creating the virtual machines was sunk, the machine could in theory run a set of "common" virtual machines all the time, with a set of virtual machines on standby if someone needed to debug a less common problem. Yes, this is a mini-snakebite concept. Right now the main blocker is funding and time - that and I need to spec something that doesn't sound like a jet engine ;) jesse

On 12:48 pm, ctb@msu.edu wrote:
[snip]
The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
BuildBot has an XML-RPC interface. So Brett can probably do what he wants with BuildBot right now. Jean-Paul

On Sun, Oct 25, 2009 at 07:13, <exarkun@twistedmatrix.com> wrote:
On 12:48 pm, ctb@msu.edu wrote:
[snip]
The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
BuildBot has an XML-RPC interface. So Brett can probably do what he wants with BuildBot right now.
Brett actually wants web hooks so pony-build will ping an App Engine web app when there is more data, à la PubSubHubbub. Or hell, just have pony-build have an Atom feed with updates and simply use PuSH. In other words I want to be told when there is an update, not have to poll to find out. -Brett

Brett actually wants web hooks so pony-build will ping an App Engine web app when there is more data, ala PubSubHubbub. Or hell, just have pony-build have an Atom feed with updates and simply use PuSH. In other words I want to be told when there is an update, not have to poll to find out.
Not sure what exactly it is that Brett wants to do, but perhaps Brett could take a look at http://www.python.org/dev/buildbot/all/atom. As JP says, there is also XML-RPC (at all/xmlrpc). For true push notifications, buildbot sends messages into an IRC channel - not sure whether an App Engine app could listen to that. Regards, Martin
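For the polling variant Martin suggests, the Atom feed can be consumed from a short Python script; the third-party feedparser package used below is an assumption for the sketch, not something prescribed in the thread (and Brett's preference is push, so this is only the fallback).

    # Sketch: poll the buildbot Atom feed and print entries not seen before.
    import time
    import feedparser  # third-party package, assumed to be installed

    FEED_URL = 'http://www.python.org/dev/buildbot/all/atom'

    seen = set()
    while True:
        feed = feedparser.parse(FEED_URL)
        for entry in feed.entries:
            key = entry.get('id') or entry.get('link')
            if key and key not in seen:
                seen.add(key)
                print('%s  %s' % (entry.get('updated', '?'),
                                  entry.get('title', '')))
        time.sleep(300)  # poll every five minutes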

On Sun, Oct 25, 2009 at 9:13 AM, <exarkun@twistedmatrix.com> wrote:
On 12:48 pm, ctb@msu.edu wrote:
[snip]
The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
BuildBot has an XML-RPC interface. So Brett can probably do what he wants with BuildBot right now.
... but pony-build follows a different model ;o) -- Regards, Olemis. Blog ES: http://simelo-es.blogspot.com/ Blog EN: http://simelo-en.blogspot.com/

On Fri, Oct 30, 2009 at 11:42:30AM -0500, Olemis Lang wrote:
On Sun, Oct 25, 2009 at 9:13 AM, <exarkun@twistedmatrix.com> wrote:
On 12:48 pm, ctb@msu.edu wrote:
[snip]
The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
BuildBot has an XML-RPC interface. So Brett can probably do what he wants with BuildBot right now.
... but pony-build follows a different model
I'd rather not get into discussions of why my vaporware is going to be way, way better than anything anyone else could possibly do -- that's flamebait and not very friendly, in the end. Let's just say that I'm wasting my own time on it to scratch my own itch and leave it at that! thanks, --titus -- C. Titus Brown, ctb@msu.edu

On Fri, Oct 30, 2009 at 11:45 AM, C. Titus Brown <ctb@msu.edu> wrote:
On Fri, Oct 30, 2009 at 11:42:30AM -0500, Olemis Lang wrote:
On Sun, Oct 25, 2009 at 9:13 AM, <exarkun@twistedmatrix.com> wrote:
On 12:48 pm, ctb@msu.edu wrote:
[snip]
The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
BuildBot has an XML-RPC interface. So Brett can probably do what he wants with BuildBot right now.
... but pony-build follows a different model
that was just a brief comment to mention that even if both (buildbot + pony-build) support RPC, they are not just «the same».
I'd rather not get into discussions of why my vaporware is going to be way, way better than anything anyone else could possibly do
+1 ... I'll be the first one that won't follow it since I have no time for that and my intention was not to suggest that «pb is better than bb» (but if you follow, please do it in a separate thread ;o) @exarkun@twistedmatrix.com
But BuildBot exists, is deployed, and can be used now, without waiting.
+1 as I mentioned before I was not talking about eliminating buildbots
(Sorry, I don't really understand what point you were hoping to make with your message, so I just thought I'd follow up in the same style and hope that you'll understand my message even if I don't understand yours :).
well, I understand that you don't understand, since I barely understand that I will never be able to understand myself ... :) The only thing I can say so far is that if pb is seriously considered as an option ... then I could give a hand (... and I'll possibly do it anyway, once I have time :-/ ) ... turning myself off ... -- Regards, Olemis. Blog ES: http://simelo-es.blogspot.com/ Blog EN: http://simelo-en.blogspot.com/

On 04:42 pm, olemis@gmail.com wrote:
On Sun, Oct 25, 2009 at 9:13 AM, <exarkun@twistedmatrix.com> wrote:
On 12:48 pm, ctb@msu.edu wrote:
[snip]
The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
BuildBot has an XML-RPC interface. So Brett can probably do what he wants with BuildBot right now.
... but pony-build follows a different model
But BuildBot exists, is deployed, and can be used now, without waiting. (Sorry, I don't really understand what point you were hoping to make with your message, so I just thought I'd follow up in the same style and hope that you'll understand my message even if I don't understand yours :). Jean-Paul

The most *exciting* part of pony-build, apart from the always-riveting spectacle of "titus rediscovering problems that buildbot solved 5 years ago", is the loose coupling of recording server to the build slaves and build reporters. My plan is to enable a simple and lightweight XML-RPC and/or REST-ish interface for querying the recording server from scripts or other Web sites. This has Brett aquiver with anticipation, I gather -- no more visual inspection of buildbot waterfall pages ;)
If that's something that people want to have, then buildbot could also provide it, of course. Do you have a spec of the interface somewhere? Regards, Martin
participants (15)
- "Martin v. Löwis"
- Antoine Pitrou
- Brett Cannon
- C. Titus Brown
- David Bolen
- exarkun@twistedmatrix.com
- Glyph Lefkowitz
- Jesse Noller
- Mark Dickinson
- MRAB
- Olemis Lang
- Paul Moore
- R. David Murray
- ssteinerX@gmail.com
- Zooko O'Whielacronx