[IPython-dev] Parallel programming dependencies
Andrew Jaffe
a.h.jaffe at gmail.com
Tue Apr 29 03:57:28 EDT 2014
Hi all,
I posted a version of this to StackOverflow at
http://stackoverflow.com/questions/23290086/ipython-parallel-programming-dependencies
but there hasn't been a response, so I thought I'd try this list.
Apologies if the question is inappropriate here.
I am using IPython for some relatively heavy numerical tasks, subsets of
which are more or less embarrassingly parallel. The tasks have very
simple dependencies, but I'm struggling to work out the best way to
implement them. The basic problem is that the result of a previous
computation must be used in the following one, and I would like to
submit those tasks to the engines separately.
Basically I've got
in0a = ....
in0b = ....
res1a = f1(in0a) ## expensive, would like to run on engine 0
res1b = f1(in0b) ## expensive, would like to run on engine 1
### and the same for c, d, ... on engines 2, 3, ... (mod the number of engines)
res2a = f2(res1a) ### depends on res1a = f1(in0a) being computed
res2b = f2(res1b) ### depends on res1b = f1(in0b) being computed
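For reference, here is a rough sketch of the setup I have in mind: the
usual IPython.parallel client with a load-balanced view. The f1, f2,
in0a and in0b below are just trivial stand-ins for my real (expensive)
functions and inputs:

from IPython.parallel import Client

rc = Client()                      # connect to the running cluster
lview = rc.load_balanced_view()    # load-balanced scheduler over all engines

# trivial stand-ins for the real, expensive functions and inputs
def f1(x):
    return x + 1

def f2(y):
    return 2 * y

in0a, in0b = 1.0, 2.0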
I could restructure things into an f_12() function that calls f1 and f2
in sequence and returns both outputs as a tuple (I'd like the calling
process to have access to all the results), and just submit those
asynchronously; or I could use a parallel map of f1 on [in0a, in0b, ...].
But I would strongly prefer not to do either of those refactorings.
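For concreteness, the composed-function version I'd rather avoid would
look roughly like this (same stand-in setup as above; f1 and f2 would
also need to be available on the engines):

def f_12(x):
    # run both stages back-to-back on the engine, return both results
    # (assumes f1 and f2 are importable or otherwise defined on the engine)
    r1 = f1(x)
    return r1, f2(r1)

ar_a = lview.apply_async(f_12, in0a)
ar_b = lview.apply_async(f_12, in0b)
res1a, res2a = ar_a.get()
res1b, res2b = ar_b.get()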
I could also add a `wait()` between the f1 and f2 calls, but this would
wait on *all* of the f1 calls, even though they take different amounts
of time, so I couldn't proceed with an f2 call as soon as its particular
dependency became available.
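In other words, the coarse-grained version would look something like the
following (again with the stand-in setup above), with the wait covering
the whole batch of f1 tasks:

inputs = [in0a, in0b]
ars1 = [lview.apply_async(f1, x) for x in inputs]
for ar in ars1:
    ar.wait()                              # blocks until *every* f1 task is done
ars2 = [lview.apply_async(f2, ar.get()) for ar in ars1]
res2 = [ar.get() for ar in ars2]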
So what I really want to know is how I can use view.apply_async() so
that running res2a=f2(res1a) will only happen once res1a=f1(in0a) has
run (and similarly for the b, c, d, ... tasks).
Basically, what I want behaves like a blocking apply_async, where each
f2 submission waits only on its own f1 result. With load-balancing it
would be something like
res1a = v.apply_async(f1, in0a)
res1b = v.apply_async(f1, in0b)
res2a = v.apply_async(f2, res1a.get())
res2b = v.apply_async(f2, res1b.get())
But this blocks the submission of res2b until res1a has finished, even
if res1b becomes ready first. The same problem applies to a direct view,
manually sending the 'a' tasks to one engine, the 'b' tasks to another, etc.
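For example, with one DirectView per engine (assuming at least two
engines, and the stand-in setup above), the shape is the same:

e0, e1 = rc[0], rc[1]                      # one DirectView per engine
ar1a = e0.apply_async(f1, in0a)
ar1b = e1.apply_async(f1, in0b)
# each result still has to come back to the client before f2 can be submitted
ar2a = e0.apply_async(f2, ar1a.get())      # blocks here even if ar1b finished first
ar2b = e1.apply_async(f2, ar1b.get())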
Alternatively, I thought I could use lview.temp_flags() to set up the
dependencies, but the .get() needed to pass the result into apply_async
still blocks.
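I.e. something along these lines (a sketch, with the stand-in setup
above): the `after` flag handles the ordering on the scheduler side, but
the client still has to block on .get() just to build the argument:

ar1a = lview.apply_async(f1, in0a)
with lview.temp_flags(after=[ar1a]):
    # the scheduler won't run this task until ar1a has completed, but the
    # client has already blocked on ar1a.get() to construct the argument
    ar2a = lview.apply_async(f2, ar1a.get())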
What I think we ideally want is something that allows apply_async to
take AsyncResult objects as arguments and work out the dependency graph
from them automatically. But is there any workaround at this point?
Given the requirement to send the actual result from one computation to
the next (through the calling IPython process), I'm not sure there's any
way to set this up at the moment.
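To spell out what I'm wishing for, purely hypothetically (this is not,
as far as I know, the current API):

ar1a = lview.apply_async(f1, in0a)
ar1b = lview.apply_async(f1, in0b)
# hypothetical: the scheduler would notice the AsyncResult arguments,
# record the dependency, and substitute the finished results on the engines
ar2a = lview.apply_async(f2, ar1a)
ar2b = lview.apply_async(f2, ar1b)

If there is any existing way to approximate this, I'd be very glad to
hear about it.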
Yours,
Andrew