[Twisted-Python] Distributed Trial Test Runner

Hello,

For my Summer of Code project, I am implementing a distributed trial test runner (http://code.google.com/soc/psf/appinfo.html?csaid=770D0FCD95DAFF9E). The following is basically what needs to be done to get distributed trial working, given a master machine and slaves:

1. Transfer the relevant code to all slave machines (with conch).
2. Do test loading at least on the master (possibly on each of the slaves as well).
3. Find a good test-to-unique-id mapping (probably using FQPNs + something else for test_suite-generated tests?).
4. Implement a visitor which will run through the tests on the master and split them among the slaves.
5. Figure out how to get SSH command shells open to the slaves and get individual tests to run.
6. Gather results from slaves and give them to a reporter.

I think that is all. Any thoughts? Comments?

Thanks,
Alex

Alex Lang wrote:
1. Transfer the relevant code to all slave machines (with conch).

This part is a bit short on detail. "Relevant code" isn't possible to determine automatically, in general: you never know what dependencies code might have, so the user probably needs to be given some control over this, e.g. a setting for the directory to copy (or rsync?). I guess the trick will be to make sure that, once a slave is set up, giving it the current code to test with is quick; copying the whole source for a large project every time would suck, especially if you want to use a slave that's on a relatively slow link. rsync is definitely tempting here! Another possible way to transfer the changes is as a branch URL, revision ID, and diff, i.e. the output of "svn info | grep ^URL:", "svn info | grep ^Revision:", and "svn diff" for SVN checkouts (but keep in mind not all projects use SVN, so if you go this route you'd want to make that part extensible).
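The branch-URL-plus-diff idea could look something like this sketch. It assumes `svn` is on the PATH and that an SVN-specific backend is just one of potentially several; the function names here are illustrative, not an existing API.

```python
import subprocess

def parse_svn_info(text):
    """Extract (URL, Revision) from the output of `svn info`."""
    fields = dict(line.split(": ", 1)
                  for line in text.splitlines() if ": " in line)
    return fields["URL"], fields["Revision"]

def checkout_state(path="."):
    """Return (branch URL, revision, diff) for an SVN working copy,
    enough for a slave to reconstruct the tree from the repository."""
    info = subprocess.run(["svn", "info", path], capture_output=True,
                          text=True, check=True).stdout
    diff = subprocess.run(["svn", "diff", path], capture_output=True,
                          text=True, check=True).stdout
    url, rev = parse_svn_info(info)
    return url, rev, diff
```

A slave would then check out the given URL at the given revision (or update a cached checkout) and apply the diff, which moves far less data than copying the whole tree.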
3. Find a good test-to-unique-id mapping (probably using FQPNs + something else for test_suite-generated tests?)
The scheme discussed on #twisted along these lines sounded ok to me.
I think that is all. Any thoughts? Comments?
Make sure that trial options that don't work when distributed give graceful errors, rather than tracebacks or wrong results. Options like "--debug" and "--coverage" spring to mind, but there may be others. Similarly, you should also make sure that "--until-failure" does something sensible. Basically, look at each of the options described by "trial --help" and figure out what distributed trial should do with it. This is a relatively small thing, but this sort of polish makes a big difference to the end-user experience.

I also agree with L. Daniel Burr that there could be some overlap with Buildbot in this project. I guess you probably weren't thinking of running a master daemon perpetually for slaves to connect to, but maybe that's not a crazy idea. Buildbot already deals with reporting results (in a minimal way), updating source trees according to instructions from a master, and running commands on those trees. It also deals with managing slaves. Because running tests requires a much more elaborate and carefully configured environment than compiling a preprocessed C file, thinking about disttrial as being more like buildbot than distcc might make good sense.

-Andrew.
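The graceful-errors point could be handled by a pre-flight check before any slaves are contacted. A minimal sketch, assuming the unsupported-option set below (drawn from the options named in this thread, not a definitive list):

```python
# Options assumed (for illustration) not to work in distributed mode.
UNSUPPORTED_DISTRIBUTED = {"--debug", "--coverage"}

def check_options(argv):
    """Reject options known not to work distributed, with a clear
    message up front instead of a traceback mid-run."""
    bad = sorted(UNSUPPORTED_DISTRIBUTED.intersection(argv))
    if bad:
        raise SystemExit(
            "disttrial: %s not supported when running distributed"
            % ", ".join(bad))
    return argv
```

Options like "--until-failure" would instead need distributed-aware behaviour (e.g. re-dispatching the whole suite after each clean pass) rather than an outright rejection.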

participants (3)
- Alex Lang
- Andrew Bennetts
- L. Daniel Burr