[IPython-dev] Do I need to write a new parallel engine launcher class?

Florian M. Wagner wagnerfl at student.ethz.ch
Wed Jan 15 02:42:05 EST 2014


Hey Theo,

I start the cluster with ipcluster and use the following function to 
wait for it, in case one or more engines fail to start:

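    import sys
    import time

    # IPython 1.x import path; later releases moved Client to the
    # separate ipyparallel package.
    from IPython.parallel import Client


    def wait_for_cluster(engines=1, maxtime=5):
        """Wait for an IPython cluster to start up and register a minimum
        number of engines. `maxtime` is a timeout in minutes, applied
        separately to the controller wait and the engine wait."""
        client = None
        start = time.time()
        sys.stdout.write("Waiting for controller...")
        sys.stdout.flush()
        while client is None:
            try:
                client = Client(profile='cluster')
                sys.stdout.write(" found.\n")
            except Exception:
                # Controller not reachable yet; give up once maxtime is
                # exceeded instead of falling through without a client.
                if time.time() - start > maxtime * 60:
                    raise RuntimeError("No controller after %d min" % maxtime)
                time.sleep(5)
                sys.stdout.write('.')
                sys.stdout.flush()
        # wait for engines to register
        running = len(client.ids)
        start = time.time()
        while running < engines:
            if time.time() - start > maxtime * 60:
                break  # timed out; return with the engines we have
            time.sleep(1)
            running = len(client.ids)
            sys.stdout.write(
                'Waiting for engines... (%d / %d) \r' % (running, engines))
            sys.stdout.flush()
        print("\nConnected to %d engines..." % len(client.ids))
        return client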
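
Both loops in that function follow the same pattern: poll a condition until it holds or a deadline passes. As a rough sketch only, that pattern could be factored into a small helper. The names `wait_until` and `enough_engines` are made up for illustration and are not part of IPython, and the "cluster" here is a plain list standing in for real engines so the snippet runs without a controller.

```python
import time


def wait_until(check, timeout, interval=1.0):
    """Poll check() until it returns a truthy value or `timeout` seconds pass.

    Returns the last value from check(), so the caller can tell success
    (truthy) from a timeout (falsy).
    """
    deadline = time.time() + timeout
    result = check()
    while not result and time.time() < deadline:
        time.sleep(interval)
        result = check()
    return result


# Stand-in for a cluster (assumption for the demo): pretend that one more
# engine has registered every time we poll.
registered = []


def enough_engines(wanted=3):
    registered.append(len(registered))
    return len(registered) >= wanted


ok = wait_until(enough_engines, timeout=5, interval=0.01)
print("engines registered: %d, ok=%s" % (len(registered), ok))
```

With a real client, the check would presumably be something like `lambda: len(client.ids) >= engines`, and a falsy return value tells the caller that the timeout was hit with too few engines.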

I don't know if this helps.
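
For the reporting side of the question quoted below, one possible approach is to start the engines one at a time and record failures instead of aborting. The sketch below only shows that control flow. `start_engine` is a hypothetical callable (for example a thin wrapper around whatever launches a single engine over SSH) that is assumed to raise an exception on failure; `fake_start` exists only to make the example runnable. Neither is an IPython API.

```python
def start_engines(hosts, start_engine):
    """Try to start one engine per entry in `hosts`; keep going on failure.

    `start_engine(host)` is a hypothetical callable that raises on failure.
    Returns (started, failed), where `failed` holds (index, host, error).
    """
    started, failed = [], []
    for index, host in enumerate(hosts):
        try:
            start_engine(host)
        except Exception as err:
            # Record the failure and move on to the next host.
            failed.append((index, host, err))
            print("Starting engine %2d on %s - FAILED" % (index, host))
            print("    %s" % err)
        else:
            started.append((index, host))
            print("Starting engine %2d on %s" % (index, host))
    print("Finished")
    print("    %d engines available" % len(started))
    return started, failed


def fake_start(host):
    # Stand-in launcher for the demo: host2 is pretended to be unreachable.
    if host == "host2":
        raise RuntimeError("could not reach host2")


started, failed = start_engines(
    ["host1", "host1", "host2", "host3", "host4"], fake_start)
```

The list of failures can then be used to point the user at the relevant log file, while the engines that did start stay available.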



On 14.01.2014 19:31, Drain, Theodore R (392P) wrote:
> I'm providing software to a group of users to make it easier for them to do simple parallel jobs.  The environment is a cluster of machines with a network storage system (shared file system).  I can use SSHEngineSetLauncher to launch a collection of engines on the cluster nodes and everything works fine, but I'd like to handle the case where an engine can't be started for whatever reason.  Currently, the caller is given an error message that's hard to understand, and the engine spawning stops, leaving a partial set of engines running.
>
> What I'd like to provide them is a command line tool (pengines) that does something like this:
>
> host0> pengines start --profile=cluster
> Starting controller on host0
> Starting engine  0 on host1
> Starting engine  1 on host1
> Starting engine  2 on host2 - FAILED
>     See log /somepath/tolog/file
> Starting engine  3 on host3
> Starting engine  4 on host4
> Finished
>     5 engines available
>
> host0> pengines status
> Controller running on host0
> 5 engines available
>
> host0> pengines stop
> Stopping 5 engines...
>
> The problem I'm running into is that the existing launchers dump a lot of text to the screen and the engine set launcher stops running if a single engine can't be started.  I'm looking for suggestions as to the best way to "fix" this.  I think I need to write a new launcher that's similar to SSHEngineSetLauncher which handles errors and provides much simpler output.  Any suggestions would be appreciated.
>
> Thanks,
> Ted
