[IPython-dev] Parallel programming dependencies

John Gill jgill at tokiomillennium.com
Tue Apr 29 08:57:48 EDT 2014


I do something similar, submitting tasks for directed acyclic graphs.


I get around the problem of passing data from one task to a subsequent task by persisting the results of each task to disk -- might that work for you? Each task runs in its own folder, but all the tasks know where to find data from previous tasks. This is actually more powerful than passing in only the data for the direct dependencies: a task can get at the data for any task further up the dependency graph.
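A minimal sketch of that idea, using only the standard library. The folder layout (<root>/<task>/result.pkl) and the helper names result_path, load_result and run_task are hypothetical, chosen for illustration; they are not from my actual code. On a real cluster this also assumes every engine can see the same filesystem.

```python
import pickle
import tempfile
from pathlib import Path


def result_path(root, task_name):
    """Where a task's pickled result lives (hypothetical layout)."""
    return Path(root) / task_name / "result.pkl"


def load_result(root, task_name):
    """Read the stored result of any earlier task, not only a direct dependency."""
    with open(result_path(root, task_name), "rb") as fh:
        return pickle.load(fh)


def run_task(root, task_name, func, input_names=()):
    """Call func on the stored results of input_names, then persist its result."""
    inputs = [load_result(root, name) for name in input_names]
    result = func(*inputs)
    path = result_path(root, task_name)
    path.parent.mkdir(parents=True, exist_ok=True)  # each task gets its own folder
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return str(path)  # only a short path string goes back to the caller


# Small in-process demonstration with stand-in functions.
demo_root = tempfile.mkdtemp()
run_task(demo_root, "in0a", lambda: 3)
run_task(demo_root, "res1a", lambda x: x * 2, ["in0a"])
run_task(demo_root, "res2a", lambda y: y + 1, ["res1a"])
print(load_result(demo_root, "res2a"))  # 7
```

Because tasks exchange task names rather than data, the submitting process never has to hold an intermediate result, so ordering is the only thing the scheduler needs to enforce.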


-----Original Message-----
From: ipython-dev-bounces at scipy.org [mailto:ipython-dev-bounces at scipy.org] On Behalf Of Andrew Jaffe
Sent: Tuesday, April 29, 2014 4:57 AM
To: ipython-dev at scipy.org
Subject: [IPython-dev] Parallel programming dependencies

Hi all,

I posted a version of this to StackOverflow at http://stackoverflow.com/questions/23290086/ipython-parallel-programming-dependencies
but there hasn't been a response, so I thought I'd try here. Apologies if this is inappropriate here.

I am using IPython for some relatively heavy numerical tasks, subsets of which are more or less embarrassingly parallel. The tasks have very simple dependencies, but I'm struggling to work out the best way to implement them. The basic problem is that the result of a previous computation must be used in the following one, and I would like to submit those tasks to the engines separately.

Basically I've got

     in0a = ....
     in0b = ....

     res1a = f1(in0a)   ## expensive, would like to run on engine 0
     res1b = f1(in0b)   ## expensive, would like to run on engine 1
     ### and same for c, d, ... on engines 2, 3, ... (mod the number of engines)

     res2a = f2(res1a)  ### depends on res1a = f1(in0a) being computed
     res2b = f2(res1b)  ### depends on res1b = f1(in0b) being computed

I could restructure things into some f_12() functions which call f1 and f2 in sequence and return both outputs as a tuple (I'd like the main engine to have access to all the results), and just submit those asynchronously. Alternatively, I could use a parallel map of f1 on [in0a, in0b, ...], but I would strongly prefer not to do either of those refactorings.

I could also add a `wait()` between the f1 and f2 calls, but this would wait on all of the f1 calls, even though they take different amounts of time, whereas I would like to proceed with each f2 call as soon as its own dependency becomes available.

So what I really want to know is how I can use view.apply_async() so that running res2a=f2(res1a) will only happen once res1a=f1(in0a) has run (and similarly for the b, c, d, ... tasks).

Basically, I want to use a blocking apply_async. With load-balancing it should be something like

     res1a = v.apply_async(f1, in0a)
     res1b = v.apply_async(f1, in0b)
     res2a = v.apply_async(f2, res1a.get())
     res2b = v.apply_async(f2, res1b.get())

But res1a.get() blocks the calling process, so res2b cannot even be submitted until res1a is ready, even if res1b became ready much earlier.

The same problems would seem to apply to a direct view manually sending the 'a' tasks to one engine, the 'b' to another, etc.

Alternatively, I thought I could use lview.temp_flags() to set up the dependencies, but the necessary .get() in the apply_async still blocks.

What I think we ideally want is something which allows apply_async to take full AsyncResult objects and figure out the dependency graph from this automatically! But is there any workaround at this point? Given the requirement to send the actual result from one computation to the next -- through the "calling" IPython process -- I'm not sure there's any way to actually set this up.
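To make the desired behaviour concrete, here is a sketch of the scheduling pattern using the standard library's concurrent.futures rather than IPython's parallel API, so it is an analogy and not a solution for the engines themselves. f1 and f2 are trivial stand-ins, and run_chains is a hypothetical helper name. Results still pass through the calling process, but each f2 is submitted as soon as its own f1 finishes, in completion order rather than submission order.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def f1(x):
    return x * 2  # stand-in for the expensive first stage


def f2(y):
    return y + 1  # stand-in for the dependent second stage


def run_chains(inputs, max_workers=4):
    """Submit f2 for each key as soon as that key's f1 result is ready."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each stage-1 future back to the key ('a', 'b', ...) it belongs to.
        stage1 = {pool.submit(f1, value): key for key, value in inputs.items()}
        stage2 = {}
        # as_completed yields futures in the order they finish, so a slow
        # 'a' chain does not hold up the submission of f2 for 'b'.
        for fut in as_completed(stage1):
            key = stage1[fut]
            stage2[key] = pool.submit(f2, fut.result())
        return {key: fut.result() for key, fut in stage2.items()}


print(run_chains({"a": 1, "b": 2}))
```

The only blocking call in the loop is on whichever f1 finishes next, which is the property that a plain sequence of .get() calls lacks.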


