[IPython-dev] newparallel

MinRK benjaminrk at gmail.com
Wed Feb 9 16:56:15 EST 2011


On Wed, Feb 9, 2011 at 12:15, Fernando Perez <fperez.net at gmail.com> wrote:

> Hey Satra,
>
> On Sun, Feb 6, 2011 at 12:47 PM, Satrajit Ghosh <satra at mit.edu> wrote:
> >
> > 3. fernando: is your workflow branch mergeable with this? the ability to
> > spawn and shutdown engines as needed (especially in the context of
> clusters
> > such as SGE) has come up for discussion on the nipype side a fair bit in
> the
> > last few weeks.
>
> the workflow branch was really just a trivial amount of code to show
> Soizic how the basic idea should work, so there isn't much in there to
> really merge.
>

That may be, but I did merge and update it anyway ;)

The newparallel examples are now in docs/examples/newparallel


>
> The more important question seems to be that we need to have a good
> solution for the (valid) use case of engine creation and use outside
> of our central scheduler, when another scheduler is in control of job
> creation (say SGE or anything else).
>

Since starting an engine amounts to running a shell script, it's not difficult
to start engines under any *ix job system.
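
For example, under SGE a single binary-mode submission is enough to bring up
one engine (a rough sketch, assuming `ipenginez` is on the compute nodes' PATH
and that the engine can find its connection files, e.g. on a shared filesystem):

import os
# -b y: treat ipenginez as a binary rather than a job script;
# -cwd: run in the current working directory so the engine finds its files there
os.system('qsub -b y -cwd ipenginez')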

I am in the process of updating the Launchers from the kernel code, in order
to make this trivial on common systems.

Linking engine startup to jobs can probably (if not so nicely) be done with
the current setup in three steps:

1. Submit an IPython task that will start an engine (or just run this in the
client):

import os

def newjob(jobID):
    # write a submission script that starts an engine tagged with this jobID
    jobfile = "job-%s.sh" % jobID
    with open(jobfile, 'w') as f:
        f.write(<pbs boilerplate>)
        f.write('ipenginez -c "jobID=%r"\n' % jobID)
    os.system('qsub %s' % jobfile)

c[0].apply(newjob, my_jobid)

2. Submit the real task, with a dependency that will keep it on that new engine:

def check_jobid(jobid):
    # functional dependency: only run on the engine whose global jobID
    # matches the one we submitted
    engine_jobid = globals().get('jobID', None)
    return engine_jobid == jobid

# `depend` is the functional-dependency decorator from the newparallel code
@depend(check_jobid, my_jobid)
def mytask(args):
    dostuff()

ar = client.apply(mytask, myargs)

3. Submit another task, with a `follow` dependency on the first, that calls
sys.exit to shut the engine down:

client.apply(lambda : sys.exit(), follow=ar)
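
Pulling those three steps together, something along these lines should work
(untested sketch; `client` is the connected Client from above, `newjob` and
`check_jobid` are the helpers defined in steps 1 and 2, and the jobid is just
a unique tag):

import sys

def run_on_fresh_engine(client, jobid, task, *args):
    # 1. ask an existing engine to qsub a new engine tagged with jobid
    client[0].apply(newjob, jobid)
    # 2. the task waits in the scheduler until an engine whose jobID global
    #    matches registers, thanks to the functional dependency
    ar = client.apply(depend(check_jobid, jobid)(task), *args)
    # 3. once it has run, follow it with a shutdown of that same engine
    client.apply(lambda: sys.exit(), follow=ar)
    return ar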


>
> I've been thinking about this a bit, and there seem to be two valid
> scenarios to consider:
>
> 1. nipype wants to parallelize part of a pipeline where each node is
> nothing but a command-line call, long enough that startup overhead is
> irrelevant and with no other information to be transferred from the
> 'head node' (the instance running nipype itself) to the workers.
>
> 2. startup time is relevant compared to execution (higher frequency
> execution) or there's information to be passed that would be
> cumbersome/impossible as command-line arguments but is available to
> nipy as python objects.
>
>
> In situation 1, it's probably not of much value to have ipython
> around, except for the case where you might want to debug a
> problematic execution. In that scenario, having the engine not
> terminate its execution after a problem so that you could connect to
> it and play with the data/variables to understand the problem could be
> very useful.  But that could be implemented in a special debug mode,
> that basically runs something like
>
> namespace = {}
> try:
>  exec code in namespace
> except:
>  start_ipython_engine(namespace)
>  engine_wait()
>
> The engine would only run if there's a problem with the execution, and
> in the normal case isn't even started.
>
>
> Back to #2, I think in that case there is genuine value added by
> having ipython around, as it gives nipype the ability to do much more
> flexible execution control and to be efficient in scenarios where
> sge/similar likely wouldn't.
>
> The trick is for us to make it very, very easy to incorporate the
> 'controller' parts into nipype in a way that engines can be started by
> the queuing system instead of by our own cluster scripts, and that
> those engines participate of the rest of the nipype execution as
> needed, coming and going as dictated by the scheduler.
>

It sounds like what is requested is, rather than an IPython scheduler, an
SGE scheduler.  That is, for every task the Scheduler starts a new engine
(via qsub, etc.), runs the task on it, and shuts the engine down.  This would
obviously be extraordinarily slow compared to the regular Scheduler.
Perhaps the new Cluster model Brian and I have been discussing will help
with this - it will allow clients to make requests like 'start_engine' and
'stop_engine', which would be executed via the Launchers.
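
In terms of the client API, I am imagining something along these lines
(purely illustrative - none of these names exist yet):

# hypothetical Cluster-model calls; names and signatures are still under discussion
eid = cluster.start_engine()       # controller asks a Launcher to qsub a new engine
ar = client.apply(mytask, myargs)  # work gets scheduled onto it as usual
ar.get()
cluster.stop_engine(eid)           # the Launcher tears the engine's job down again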

-MinRK


>
>
> Cheers,
>
> f

