[Twisted-Python] Using Twisted for distributed computation / experiment running?
Hello all,

I'm looking to use Twisted for distributing computation over a small number (~10) of PCs. I'm wondering if anyone else has some experience with this -- particularly if there is already a solution out there that I can use, so that I'm not reinventing the wheel. Here's a rough outline of what I'd like:

Setup phase: given a config file containing a list of machines and the # of CPUs on each machine, update the source code on each machine and start an appropriate number of experiment runners.

Run phase: a "master" process assigns an experiment to each runner. When we get a result back, log the result to a file and send a new experiment to that runner. Repeat until all experiments are done.

Here are my constraints:

1) The high-level code (at least) is all in Python, so the experiment runners can collect their results by just calling Python functions.
2) I can set up ssh keys on each machine such that logging in remotely can happen without a password.
3) I don't really have to worry about authentication: I can assume that all machines are either on a non-internet-connected LAN or that firewall rules are set up so that the ports aren't accessible except from the "master" machine.
4) I need to be able to add and remove compute nodes at runtime, so I need some sort of admin shell. However, I can wait for currently-processing experiments to finish, so I don't have to worry about the complexity of restarting experiments or migrating them to other machines.
5) It'd be nice (but not required) if the experiment runners could all log some critical messages to the master process.

This seems like it would only take a few hours to implement in Twisted (probably with PB), but I wanted to make sure I'm not reinventing the wheel, because it seems likely that someone has done this before.
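[Editor's sketch] The run phase and constraint 4 above can be illustrated with a small, transport-agnostic scheduler. This is only a sketch under stated assumptions, not code from the thread: the class name Master, its method names, and the runner ids are all hypothetical, and the network layer (PB or anything else) is deliberately left out. A real master would call next_for() and result() from whatever remote-call handlers it uses.

```python
import csv
import io
from collections import deque


class Master:
    """Hypothetical scheduler: one experiment per runner, results logged as CSV."""

    def __init__(self, experiments, log_file):
        self.pending = deque(experiments)  # experiments not yet handed out
        self.busy = {}                     # runner id -> experiment in progress
        self.draining = set()              # runners to retire after their current job
        self.log = csv.writer(log_file)

    def next_for(self, runner_id):
        """Return the next experiment for an idle runner, or None if it should stop."""
        if runner_id in self.draining or not self.pending:
            self.draining.discard(runner_id)
            return None
        experiment = self.pending.popleft()
        self.busy[runner_id] = experiment
        return experiment

    def result(self, runner_id, value):
        """Log a finished experiment, then hand the runner its next one (if any)."""
        experiment = self.busy.pop(runner_id)
        self.log.writerow([runner_id, experiment, value])
        return self.next_for(runner_id)

    def remove(self, runner_id):
        """Retire a runner once its current experiment finishes (no migration)."""
        if runner_id in self.busy:
            self.draining.add(runner_id)

    def done(self):
        return not self.pending and not self.busy


# Usage with made-up runner ids and experiment names.
log_file = io.StringIO()
master = Master(["exp-a", "exp-b", "exp-c"], log_file)
first = master.next_for("node1")
second = master.next_for("node2")
master.remove("node2")                      # admin asks to drop node2
after_node2 = master.result("node2", 0.5)   # node2 finishes, gets no new work
third = master.result("node1", -0.25)       # node1 picks up the remaining experiment
final = master.result("node1", 1.0)
```

Because a removed runner is only retired after its current result arrives, nothing ever has to be restarted or migrated, which matches the relaxation in constraint 4.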
robomancer <robomancer@gmail.com> writes:
I'm looking to use Twisted for distributing computation over a small number (~10) of PCs. I'm wondering if anyone else has some experience with this -- particularly if there is already a solution out there that I can use, so that I'm not reinventing the wheel.
[OT: non-twisted discussion follows]

It seems you are wanting a hybrid between a batch queueing system and a data management system.

Any reason not to use something like Torque (nee' OpenPBS) for the batch part?

http://www.clusterresources.com/pages/products/torque-resource-manager.php

Or, if your nodes are also interactively used (i.e., workstations by day, batch nodes by night) you might look at condor:

http://www.cs.wisc.edu/condor/

Both are free-ish.

More info on what your data is like is probably needed for ideas on the second part.

-Brett.
It seems you are wanting a hybrid between a batch queueing system and a data management system.
Any reason not to use something like Torque (nee' OpenPBS) for the batch part?
http://www.clusterresources.com/pages/products/torque-resource-manager.php
Thanks for the references. Torque seems like a good possibility, but a bit heavyweight for my tastes; I'd rather have something small and flexible that I can easily edit to suit my needs. That's why I'd prefer a solution in Python (whether or not it involves Twisted). I really am looking only at small-scale stuff -- I have no need for fault tolerance or scalability beyond maybe 5-10 nodes.
More info on what your data is like is probably needed for ideas on the second part.
All I need is the ability to test several different algorithms on several different input files. Each algorithm has a variety of parameter settings, so for every experiment I need to record which algorithm was used, the parameters, which input file was used, and the quality of the result (from -1 to 1). I don't think a data management system is necessary here; I was basically planning on using .csv files to store experiment settings and results. Again, I'm aiming for lightweight, not enterprise-grade :)
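[Editor's sketch] The bookkeeping described above (algorithm, parameters, input file, quality between -1 and 1, stored in .csv) could look roughly like the following. The algorithm names, parameter grids, file names, and column names are all invented for illustration; only the four recorded quantities come from the message.

```python
import csv
import io
import itertools

# Hypothetical algorithms, parameter grids and input files, for illustration only.
ALGORITHMS = {
    "algo_a": {"threshold": [0.1, 0.5]},
    "algo_b": {"depth": [2, 4], "rate": [0.01]},
}
INPUT_FILES = ["input1.dat", "input2.dat"]
FIELDS = ["algorithm", "parameters", "input_file", "quality"]


def enumerate_experiments():
    """Yield one dict per (algorithm, parameter setting, input file)."""
    for name, grid in ALGORITHMS.items():
        keys = sorted(grid)
        for values in itertools.product(*(grid[k] for k in keys)):
            params = ";".join(f"{k}={v}" for k, v in zip(keys, values))
            for path in INPUT_FILES:
                yield {"algorithm": name, "parameters": params, "input_file": path}


def record(writer, experiment, quality):
    """Append one result row, rejecting qualities outside [-1, 1]."""
    if not -1.0 <= quality <= 1.0:
        raise ValueError("quality must lie in [-1, 1]")
    writer.writerow({**experiment, "quality": quality})


experiments = list(enumerate_experiments())
out = io.StringIO()  # stands in for the results .csv file
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
record(writer, experiments[0], 0.75)
```

Packing the parameters into a single "key=value;..." column keeps one fixed CSV header even though each algorithm has different parameters; a column per parameter would be an equally reasonable choice.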
On 4/3/07, robomancer <robomancer@gmail.com> wrote:
I'm looking to use Twisted for distributing computation over a small number (~10) of PCs. I'm wondering if anyone else has some experience with this -- particularly if there is already a solution out there that I can use, so that I'm not reinventing the wheel. Here's a rough outline of what I'd like:
[...]

http://ipython.scipy.org/moin/Parallel_Computing

If you give us until next week, things will be cleaner. We're in the middle of transitioning from our first dev branch ('chainsaw') into the one that will become the stable development line ('saw'). Both can be checked out, but saw will, in a few days, be "released" for regular work (albeit still considered to be a development system).

It's all Twisted-based, and help/contributions from other devs will obviously be welcome.

Cheers,
f
Hi Fernando,

On 3 Apr 2007, at 17:48, Fernando Perez wrote:
Are you using the PB, a custom protocol or a combination of the two to make your remote calls?

Regards,
Matt

Matthew Glubb
Z Group PLC
Are you using the PB, a custom protocol or a combination of the two to make your remote calls?
Our older version (called chainsaw) uses our own custom protocol by default. The newer version uses both PB and xmlrpc/rawhttp in different places. But we have been very careful to design everything using interfaces and adapters, so all the network protocols can be swapped out for new ones by:

1) writing a few adapter classes that adapt a given protocol to our interfaces
2) changing a single line in a config file to have the new protocol used.

With that said, we are moving more in the raw http direction because it is so good at streaming large things around. PB is nice but not good at that. Also, using http allows us to develop nice browser-based apps that use all this stuff.
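[Editor's sketch] The interface-plus-adapter arrangement described above can be illustrated in a few lines. This is not IPython's code: the stdlib abc module stands in for whatever interface machinery the project uses, and every name here (ITransport, the fake clients, the "transport" config key) is hypothetical. The fake clients return values directly, whereas a real PB callRemote returns a Deferred.

```python
from abc import ABC, abstractmethod


class ITransport(ABC):
    """Hypothetical interface: all the rest of the system is allowed to assume."""

    @abstractmethod
    def call(self, method, *args):
        ...


class FakeHTTPClient:
    """Stand-in for a raw HTTP client; just echoes what it would send."""

    def post(self, path, payload):
        return {"path": path, "payload": payload}


class FakePBReference:
    """Stand-in for a PB remote reference; a real one returns a Deferred."""

    def callRemote(self, name, *args):
        return (name, args)


class HTTPAdapter(ITransport):
    """Adapts the HTTP client's post() to the ITransport interface."""

    def __init__(self):
        self.client = FakeHTTPClient()

    def call(self, method, *args):
        return self.client.post("/" + method, list(args))


class PBAdapter(ITransport):
    """Adapts the PB reference's callRemote() to the ITransport interface."""

    def __init__(self):
        self.ref = FakePBReference()

    def call(self, method, *args):
        return self.ref.callRemote(method, *args)


ADAPTERS = {"http": HTTPAdapter, "pb": PBAdapter}


def make_transport(config):
    """Pick the adapter named in the config."""
    return ADAPTERS[config["transport"]]()


config = {"transport": "http"}  # the single line to change when swapping protocols
transport = make_transport(config)
reply = transport.call("run_experiment", 42)
```

Callers only ever see ITransport.call(), so switching config["transport"] to "pb" changes the wire protocol without touching them.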
_______________________________________________ Twisted-Python mailing list Twisted-Python@twistedmatrix.com http://twistedmatrix.com/cgi-bin/mailman/listinfo/twisted-python
http://ipython.scipy.org/moin/Parallel_Computing
If you give us until next week, things will be cleaner. We're in the middle of transitioning from our first dev branch ('chainsaw') into the one that will become the stable development line ('saw'). Both can be checked out, but saw will, in a few days, be "released" for regular work (albeit still considered to be a development system).
It's all Twisted-based, and help/contributions from other devs will be obviously welcome.
Thanks! This looks really promising. Is there a place I can sign up to be notified when saw is ready?
Sure, we will announce saw on both the ipython-users and ipython-dev lists:

http://projects.scipy.org/mailman/listinfo/ipython-user
http://projects.scipy.org/mailman/listinfo/ipython-dev

In the meantime, the best source of info about saw is the talk that I gave at pycon:

http://ipython.scipy.org/talks/0702_pycon/ipython1/

Cheers,
Brian
On Apr 3, 2007, at 12:13 PM, robomancer wrote:
Run phase: a "master" process assigns an experiment to each runner. When we get a result back, log the result to a file and send a new experiment to that runner. Repeat until all experiments are done.
As something quick & dirty... couldn't you just:

a) have a postgres db on 1 machine and run a master on that
b) run slave nodes on all the other machines in reactor loops
c) master installs commands / file data into postgres
d) children poll postgres for commands, execute & log to pg as necessary

It's not elegant at all, but you could do that really, really fast. You don't have to worry about nodes talking to one another, and they can be specifically assigned tasks.

// Jonathan Vanasco
participants (6)
- Brett Viren
- Brian Granger
- Fernando Perez
- Jonathan Vanasco
- Matthew Glubb
- robomancer