[IPython-dev] iPython cluster on multiple Windows servers, without Windows HPC Server

Jason Roberts jason.roberts at duke.edu
Tue May 6 16:51:54 EDT 2014


Thank you, MinRK. I have no problem configuring and starting the controller and engines manually. I will look into that.

 

Jason

 

From: ipython-dev-bounces at scipy.org [mailto:ipython-dev-bounces at scipy.org] On Behalf Of MinRK
Sent: Tuesday, May 06, 2014 3:37 PM
To: IPython developers list
Subject: Re: [IPython-dev] iPython cluster on multiple Windows servers, without Windows HPC Server

 

An important thing to note about ipcluster is that it’s a very complicated way to do something that’s not very complicated. All it sets out to do is:

1. start a controller with ipcontroller

2. start zero or more engines with ipengine

All of the complexity comes from abstracting how processes actually start, including where machines are, batch systems, etc. But in the end, it’s just doing: 

$> ipcontroller
$> for i in {1..n}; do ipengine; done

ipcluster makes some simple cases easier, but if it doesn’t do what you want, you can always start the controller and engines yourself, with no loss of functionality. Plus, a tool that only deploys a cluster on your own system is much simpler than one that, like ipcluster, tries to work in a wide variety of contexts.
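
For a single machine, the controller-plus-engines pattern described above could be sketched in Python roughly as follows. This is only a minimal sketch: it assumes ipcontroller and ipengine are on the PATH, and the function names and the fixed startup delay are illustrative, not part of IPython.

```python
import subprocess
import time


def local_cluster_commands(n_engines):
    """Return the argv lists for one controller followed by n_engines engines."""
    if n_engines < 0:
        raise ValueError("n_engines must be non-negative")
    return [["ipcontroller"]] + [["ipengine"] for _ in range(n_engines)]


def start_local_cluster(n_engines, startup_delay=5.0):
    """Launch the controller, pause briefly, then launch the engines.

    startup_delay is an assumed, crude pause that gives the controller time
    to write its connection files before the engines look for them.
    """
    commands = local_cluster_commands(n_engines)
    procs = [subprocess.Popen(commands[0])]
    time.sleep(startup_delay)
    procs.extend(subprocess.Popen(cmd) for cmd in commands[1:])
    return procs
```

Calling start_local_cluster(4) would start one controller and four engines as child processes and return their Popen handles, so they can be terminated later.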

The basic steps in getting a cluster up and running:

1. configure the controller to listen on an IP address visible to the other machines (c.HubFactory.ip = '1.2.3.4' in ipcontroller_config.py on the controller machine).

2. start the controller with ipcontroller

3. copy .ipython/profile_default/security/ipcontroller-*.json to all of the machines on which you plan to start engines.

4. start ipengine as many times as is appropriate on each machine.

Step 3 is unnecessary if your systems share a filesystem.
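
Where there is no shared filesystem, step 3 might be scripted along these lines. This is a sketch under the assumption that each engine machine's security directory is reachable as an ordinary filesystem path (for example through a network share); the function name is hypothetical.

```python
import glob
import os
import shutil


def copy_connection_files(src_security_dir, dest_security_dir):
    """Copy ipcontroller-*.json from src to dest; return the copied file names."""
    os.makedirs(dest_security_dir, exist_ok=True)
    copied = []
    pattern = os.path.join(src_security_dir, "ipcontroller-*.json")
    for path in sorted(glob.glob(pattern)):
        # copy2 keeps the file's timestamps along with its contents
        shutil.copy2(path, dest_security_dir)
        copied.append(os.path.basename(path))
    return copied
```

The controller must already have been started once, since it writes these files at startup. The function would be called once per engine machine, with dest_security_dir pointing at that machine's .ipython/profile_default/security directory.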

For instance, here is a simple version that starts a controller and engines with ssh on Linux or OS X machines, putting processes in the background with screen: 

$> ssh controller_host screen -dmS ipcontroller ipcontroller
$> for host in host1 host2; do
> scp ~/.ipython/profile_default/security/ipcontroller-*.json $host:.ipython/profile_default/security/
> ssh $host 'for n in {1..3}; do screen -dmS ipengine ipengine; done'
> done

That is a lot simpler than the hundreds of lines of ipcluster and, frankly, better behaved than the SSH launchers that ship with IPython.

If you have Windows analogues for ‘tell machine X to run command Y,’ you can make a similar script, tailored to your use.
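
One way such a tailored script could be organised in Python is sketched below. The remote_prefix argument stands in for whatever "run command Y on machine X" tool is available; it and the function names are hypothetical, and nothing here is specific to any particular Windows tool.

```python
import subprocess


def engine_launch_plan(hosts, engines_per_host, remote_prefix):
    """Build one argv list per engine to start.

    remote_prefix is a list of strings in which "{host}" is replaced by the
    target host name; it represents the remote-execution command of your choice.
    """
    plan = []
    for host in hosts:
        prefix = [part.replace("{host}", host) for part in remote_prefix]
        for _ in range(engines_per_host):
            plan.append(prefix + ["ipengine"])
    return plan


def launch(plan):
    """Start every command in the plan without waiting for it to finish."""
    return [subprocess.Popen(argv) for argv in plan]
```

With ssh, as in the example above, the plan would be engine_launch_plan(["host1", "host2"], 3, ["ssh", "{host}"]). A Windows remote-execution tool would need its own prefix, which should be taken from that tool's documentation.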

-MinRK

On Tue, May 6, 2014 at 11:35 AM, Jason Roberts <jason.roberts at duke.edu> wrote:

I have a situation where I have to use MS Windows for a big parallel processing job, due to Windows dependencies on some steps in the job. I have successfully used iPython on a single 16-processor machine for this purpose. Thank you very much for making this so easy to use! It has saved me a huge amount of time.

 

Now, if possible, I would like to set up a cluster that has multiple Windows servers (Windows Server 2008 R2 Standard). The iPython documentation (http://ipython.org/ipython-doc/dev/parallel/parallel_process.html) describes several options. The one that seems best oriented for Windows, at least under the assumption that Microsoft technologies are the best choice for Windows, is to use Microsoft HPC Pack 2008 (http://ipython.org/ipython-doc/dev/parallel/parallel_winhpc.html). I tried this. Unfortunately HPC Pack appears to require Active Directory to be deployed. My shop runs a mixture of different operating systems, and while we have LDAP, we do not have a full-blown deployment of Active Directory. This appears to rule out the HPC Pack option.

 

Are there other alternatives for running an iPython cluster composed of multiple Windows servers, and which is best? Should I look at mpiexec with Open MPI? Is there some way to do it with SSH, despite the iPython documentation saying otherwise?

 

Thanks for any advice you can provide, and thanks again for iPython’s parallel processing infrastructure. It truly is a time saver.

 

Jason

 


_______________________________________________
IPython-dev mailing list
IPython-dev at scipy.org
http://mail.scipy.org/mailman/listinfo/ipython-dev
