[IPython-dev] Parallel programming dependencies

Matthias Bussonnier bussonniermatthias at gmail.com
Tue Apr 29 13:48:04 EDT 2014


Hi, 

Not a parallel user myself, 
but there is the following in ipython/examples/parallel/dependencies.py:

...
from IPython.parallel import depend

def checkpid(pid):
    # evaluated on the engine: True only if this engine has the given pid
    import os
    return os.getpid() == pid

def getpid():
    import os
    return os.getpid()

pid0 = client[0].apply_sync(getpid)

# this will depend on the pid being that of target 0:
@depend(checkpid, pid0)
def getpid2():
    import os
    return os.getpid()
...

and lots of other stuff that looks *a lot* like what you are trying to do.

-- 
M

Le 29 avr. 2014 à 17:29, Andrew Jaffe a écrit :

> Hi,
> 
> On 29/04/2014 13:57, John Gill wrote:
>> Hi,
>> 
>> I do something similar, submitting tasks for directed acyclic graphs (DAGs):
>> 
>> http://ipython.org/ipython-doc/dev/parallel/dag_dependencies.html
>> 
>> I get round the problem of passing data from one task to a subsequent task by persisting the results of each task to disk -- might that work for you? Each task runs in its own folder, but all the tasks know where to find the data from previous tasks. This is actually more powerful than just passing in the data for the direct dependencies: you can get at the data for any task further up the dependency graph.
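
A plain-Python sketch of the persist-to-disk scheme described above (no IPython.parallel involved; the folder layout, the task names, and the `run_dag` helper are all invented for illustration):

```python
import json
import os
import tempfile

def run_dag(tasks, deps, root):
    """Run tasks in dependency order; each task writes its result to its
    own folder, and any downstream task can read any upstream result."""
    done = {}

    def load(name):
        # any task can read the persisted output of any earlier task
        with open(os.path.join(root, name, "result.json")) as f:
            return json.load(f)

    def run(name):
        if name in done:
            return
        for dep in deps.get(name, []):   # finish upstream tasks first
            run(dep)
        folder = os.path.join(root, name)
        os.makedirs(folder, exist_ok=True)
        result = tasks[name](load)       # the task pulls what it needs
        with open(os.path.join(folder, "result.json"), "w") as f:
            json.dump(result, f)
        done[name] = result

    for name in tasks:
        run(name)
    return done

# toy DAG: b depends on a; c depends on b, and also reads a directly
tasks = {
    "a": lambda load: 2,
    "b": lambda load: load("a") * 10,
    "c": lambda load: load("a") + load("b"),
}
deps = {"b": ["a"], "c": ["a", "b"]}

with tempfile.TemporaryDirectory() as root:
    results = run_dag(tasks, deps, root)
print(results)  # -> {'a': 2, 'b': 20, 'c': 22}
```

Note how "c" reads "a"'s result even though "a" is not a direct dependency: that is the extra power of persisting every task's output.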
> 
> Certainly something along these lines would work -- indeed there are 
> many possible workarounds.
> 
> But it's still irksome that the parallel-apply model *can't* explicitly 
> pass data forward to the following tasks. This seems like an obvious 
> request...
> 
> Andrew
> 
> 
> 
> 
>> 
>> John
> 
>> 
>> -----Original Message-----
>> From: ipython-dev-bounces at scipy.org [mailto:ipython-dev-bounces at scipy.org] On Behalf Of Andrew Jaffe
>> Sent: Tuesday, April 29, 2014 4:57 AM
>> To: ipython-dev at scipy.org
>> Subject: [IPython-dev] Parallel programming dependencies
>> 
>> 
>> Hi all,
>> 
>> I posted a version of this to StackOverflow at http://stackoverflow.com/questions/23290086/ipython-parallel-programming-dependencies
>> but there hasn't been a response, so I thought I'd try here. Apologies if this is inappropriate here.
>> 
>> I am using IPython for some relatively heavy numerical tasks, subsets of which are more or less embarrassingly parallel. The tasks have very simple dependencies, but I'm struggling to work out the best way to implement them. The basic problem is that the result of a previous computation must be used in the following one, and I would like to submit those tasks to the engines separately.
>> 
>> Basically I've got
>> 
>>      in0a = ....
>>      in0b = ....
>> 
>>      res1a = f1(in0a)   ## expensive, would like to run on engine 0
>>      res1b = f1(in0b)   ## expensive, would like to run on engine 1
>>      ### and same for c, d, ... on engines 2, 3, ... (mod the number of engines)
>> 
>>      res2a = f2(res1a)  ### depends on res1a = f1(in0a) being computed
>>      res2b = f2(res1b)  ### depends on res1b = f1(in0b) being computed
>> 
>> I could restructure things into some f_12() function which calls f1 and f2 in sequence and returns both outputs as a tuple (I'd like the main engine to have access to all the results) and just submit those asynchronously, or I could use a parallel map of f1 on [in0a, in0b, ...], but I would strongly prefer not to do either of those refactorings.
>> 
>> I could also add a `wait()` between the f1 and f2 calls, but this would wait on all of the f1 calls, even though they take different lengths of time, whereas I would like to proceed with each f2 call as soon as its own dependency becomes available.
>> 
>> So what I really want to know is how I can use view.apply_async() so that running res2a=f2(res1a) will only happen once res1a=f1(in0a) has run (and similarly for the b, c, d, ... tasks).
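
Outside IPython.parallel, the desired behaviour -- each f2 firing as soon as its own f1 finishes, with no blocking get() -- can be sketched with the standard library's futures (f1, f2, the inputs, and the `apply_after` helper are all stand-ins, not IPython API):

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

def f1(x):
    time.sleep((5 - x) * 0.02)  # chains take different lengths of time
    return x + 1

def f2(y):
    return y * 10

def apply_after(fn, fut):
    """Return a future for fn(fut.result()) without ever blocking the
    caller: fn runs from fut's completion callback, so each chain
    proceeds as soon as *its own* dependency is ready."""
    out = Future()

    def _fire(done):
        try:
            out.set_result(fn(done.result()))
        except Exception as exc:
            out.set_exception(exc)

    fut.add_done_callback(_fire)
    return out

with ThreadPoolExecutor(max_workers=4) as pool:
    res1 = [pool.submit(f1, x) for x in (1, 2, 3, 4)]
    res2 = [apply_after(f2, r) for r in res1]  # submitted immediately

results = [r.result() for r in res2]
print(results)  # [20, 30, 40, 50]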
>> 
>> Basically, the best I can do now is a blocking apply_async. With load-balancing it would be something like
>> 
>>      res1a = v.apply_async(f1, in0a)
>>      res1b = v.apply_async(f1, in0b)
>>      res2a = v.apply_async(f2, res1a.get())
>>      res2b = v.apply_async(f2, res1b.get())
>> 
>> But this blocks on res1a.get(), so res2b cannot even be submitted until res1a is done, even if res1b becomes ready first.
>> 
>> The same problem would seem to apply to a direct view, manually sending the 'a' tasks to one engine, the 'b' tasks to another, etc.
>> 
>> Alternatively, I thought I could use lview.temp_flags() to set up the dependencies, but the necessary .get() in the apply_async call still blocks.
>> 
>> What I think we ideally want is something which allows apply_async to take full AsyncResult objects and figure out the dependency graph automatically! But is there any workaround at this point? Given the requirement to send the actual result from one computation to the next -- through the "calling" IPython process -- I'm not sure there's any way to set this up.
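
For what it's worth, that wished-for API -- a submit call that accepts pending results as arguments and works out the dependencies itself -- can be approximated in a few lines on top of concurrent.futures (the `submit_when_ready` name and every detail here are invented for illustration, not an IPython.parallel feature):

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor

def submit_when_ready(pool, fn, *args):
    """Sketch: args may be plain values or Futures. fn is submitted to
    the pool only once every Future argument has completed, so the
    dependency graph is implied by what you pass in."""
    out = Future()
    pending = [a for a in args if isinstance(a, Future)]
    lock = threading.Lock()
    remaining = [len(pending)]

    def submit_now():
        # every Future in args is done, so .result() cannot block here
        resolved = [a.result() if isinstance(a, Future) else a
                    for a in args]
        inner = pool.submit(fn, *resolved)

        def finish(f):
            try:
                out.set_result(f.result())
            except Exception as exc:
                out.set_exception(exc)

        inner.add_done_callback(finish)

    def one_done(_):
        with lock:
            remaining[0] -= 1
            last = remaining[0] == 0
        if last:          # the final dependency just finished
            submit_now()

    if not pending:
        submit_now()
    else:
        for a in pending:
            a.add_done_callback(one_done)
    return out

with ThreadPoolExecutor(max_workers=4) as pool:
    a = pool.submit(lambda: 2)
    b = pool.submit(lambda: 3)
    c = submit_when_ready(pool, lambda x, y: x + y, a, b)  # fan-in on a and b
    total = c.result()
print(total)  # 5
```

The caller never blocks on intermediate results; the data still flows through the submitting process, which is the limitation the thread is discussing.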
>> 
>> Yours,
>> 
>> Andrew
>> 
>> _______________________________________________
>> IPython-dev mailing list
>> IPython-dev at scipy.org
>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>> 
>> 
> 
> 



