[IPython-dev] newparallel

Wed Feb 9 15:15:04 EST 2011

Hey Satra,

On Sun, Feb 6, 2011 at 12:47 PM, Satrajit Ghosh <satra at mit.edu> wrote:
>
> 3. fernando: is your workflow branch mergeable with this? the ability to
> spawn and shutdown engines as needed (especially in the context of clusters
> such as SGE) has come up for discussion on the nipype side a fair bit in the
> last few weeks.

The workflow branch was really just a trivial amount of code to show
Soizic how the basic idea should work, so there isn't much in there to
really merge.

The more important question seems to be that we need to have a good
solution for the (valid) use case of engine creation and use outside
of our central scheduler, when another scheduler is in control of job
creation (say SGE or anything else).

I've been thinking about this a bit, and there seem to be two valid
scenarios to consider:

1. nipype wants to parallelize part of a pipeline where each node is
nothing but a command-line call, long enough that startup overhead is
irrelevant and with no other information to be transferred from the
'head node' (the instance running nipype itself) to the workers.

2. startup time is relevant compared to execution (higher frequency
execution) or there's information to be passed that would be
cumbersome/impossible as command-line arguments but is available to
nipype as python objects.

In situation 1, it's probably not of much value to have ipython
around, except for the case where you might want to debug a
problematic execution. In that scenario, having the engine not
terminate its execution after a problem so that you could connect to
it and play with the data/variables to understand the problem could be
very useful.  But that could be implemented in a special debug mode
that basically runs something like:

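namespace = {}
try:
  # run the node's code in its own namespace
  exec(code, namespace)
except Exception:
  # only on failure: start an engine that holds the failed namespace
  # (start_ipython_engine/engine_wait are placeholder names)
  start_ipython_engine(namespace)
  engine_wait()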

The engine would only be started if there's a problem with the
execution; in the normal case it never runs at all.
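
To make the debug-mode idea concrete, here is a rough, self-contained
sketch of the same pattern. It is only an illustration, not an existing
IPython API: run_with_debug_fallback and the on_failure hook are
made-up names, and the hook stands in for whatever would actually start
an engine and wait for a connection.

```python
import traceback


def run_with_debug_fallback(code, on_failure):
    """Run `code` in a fresh namespace; call `on_failure` only on error.

    `on_failure` is a hypothetical hook standing in for "start an
    engine with this namespace and wait for someone to connect".
    Returns (namespace, succeeded).
    """
    namespace = {}
    try:
        exec(code, namespace)
    except Exception:
        # Report the error, then hand the partially filled namespace
        # to the hook so it can be inspected after the failure.
        traceback.print_exc()
        on_failure(namespace)
        return namespace, False
    return namespace, True


# Stand-in for the engine: just keep the failed namespaces around.
failed_namespaces = []

# Normal case: the hook is never called.
ns_ok, ok = run_with_debug_fallback("z = 2", failed_namespaces.append)

# Failing case: x is assigned before the division raises, so it is
# still there to look at afterwards.
ns_bad, bad_ok = run_with_debug_fallback(
    "x = 1\ny = x / 0", failed_namespaces.append
)
```

The point of passing the namespace to the hook is that whatever the
code managed to define before failing (x above) stays available for
inspection, while a successful run pays no engine startup cost.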

Back to #2, I think in that case there is genuine value added by
having ipython around, as it gives nipype the ability to do much more
flexible execution control and to be efficient in scenarios where
sge/similar likely wouldn't.

The trick is for us to make it very, very easy to incorporate the
'controller' parts into nipype in a way that engines can be started by
the queuing system instead of by our own cluster scripts, and that
those engines participate in the rest of the nipype execution as
needed, coming and going as dictated by the scheduler.

Cheers,

f