[IPython-dev] ipcluster (LSF) timing (check if all engines are running)

Florian M. Wagner wagnerfl at student.ethz.ch
Tue Aug 20 09:20:55 EDT 2013


Hey Min,

Thanks for the example. The first while loop waits for the JSON file
as expected, but once I start the cluster and the file is found, a
ZeroMQ error occurs: Too many open files (signaler.cpp:330).
Do you have any idea what causes this?
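
One common way to hit "Too many open files" with this kind of polling is a retry loop that keeps constructing clients after one has already connected, since each client holds sockets open. The following is only a sketch of a retry loop that stops at the first success. The helper name connect_with_retry and its parameters are made up for illustration, and `connect` is a hypothetical callable standing in for the real client constructor:

```python
import time


def connect_with_retry(connect, retry_on=(IOError,), delay=10.0, max_tries=None):
    """Call connect() until it succeeds, then return its result.

    `connect` is a hypothetical zero-argument callable, standing in for
    something like `lambda: parallel.Client(profile="cluster")`.
    """
    tries = 0
    while True:
        tries += 1
        try:
            # Returning here leaves the loop, so only one client is kept.
            return connect()
        except retry_on as exc:
            # Give up once the (optional) retry budget is exhausted.
            if max_tries is not None and tries >= max_tries:
                raise
            print("not ready (%s), waiting..." % exc)
            time.sleep(delay)
```

Assuming IPython.parallel is installed, it might be called as connect_with_retry(lambda: parallel.Client(profile="cluster")), with retry_on extended to whatever timeout exception the client raises.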

On 19.08.2013 16:28, MinRK wrote:
> Something like this should work:
>
> import sys
> import time
>
> from IPython import parallel
>
> def wait_for_cluster(engines=1, **kwargs):
>     """Wait for an IPython cluster to start up and register a minimum
>     number of engines."""
>     # wait for the controller to come up
>     while True:
>         try:
>             client = parallel.Client(**kwargs)
>         except IOError:
>             print("No ipcontroller-client.json, waiting...")
>             time.sleep(10)
>         except parallel.TimeoutError:
>             print("No controller, waiting...")
>             time.sleep(10)
>         else:
>             # connected: stop retrying so only one client is created
>             break
>     if not engines:
>         return client
>     # wait for engines to register, printing one dot per engine
>     sys.stdout.write("waiting for %i engines" % engines)
>     running = len(client)
>     sys.stdout.write('.' * running)
>     while running < engines:
>         time.sleep(1)
>         previous = running
>         running = len(client)
>         sys.stdout.write('.' * (running - previous))
>         sys.stdout.flush()
>     sys.stdout.write('\n')
>     return client
>
>
>
> On Mon, Aug 19, 2013 at 6:34 AM, Florian M. Wagner 
> <wagnerfl at student.ethz.ch> wrote:
>
>     Hey all,
>
>     I am using IPython.parallel on a large cluster, where controller
>     and engines are launched via LSF. My current workflow is as follows:
>
>         #!/bin/bash
>         python pre_processing.py
>         ipcluster start --profile=cluster --n=128 > ipcluster.log 2>&1
>         sleep 120
>         python main_computation.py
>         python post_processing.py
>
>
>     I am not entirely happy with this, since the 2 minutes are not
>     always enough, depending on the load of the cluster. I believe
>     there is a much more elegant way to launch the cluster and check
>     whether all the engines are running before proceeding with the main
>     computation. I would highly appreciate any help.
>
>     Best regards
>     Florian
>
>
>     _______________________________________________
>     IPython-dev mailing list
>     IPython-dev at scipy.org
>     http://mail.scipy.org/mailman/listinfo/ipython-dev
>
>
>
>
