[IPython-dev] Parallel programming dependencies

Andrew Jaffe a.h.jaffe at gmail.com
Wed Apr 30 03:58:49 EDT 2014

Dear all,

Thanks for the many workarounds! In fact I've got it working, slightly 
less than perfectly efficiently, by just inserting a wait between all the 
f1 calls and all the f2 calls, and so on. This works as long as every 
task takes the same amount of time (it does, for this application) and 
the tasks divide evenly among the engines (not always true, but only a 
small waste).

On 30/04/2014 01:32, Cory Dolphin wrote:
> Andrea, one solution is to poll from the results of the two async operations:
> e.g.
> f1_queue = [apply_async(f1, args) for args in ...]
> f2_queue = []
> while True:
>     for each finished f1 call:
>         apply_async(f2, its result) and append the new result to f2_queue
>     for each finished element of f2_queue:
>         process its result
>     if f1_queue and f2_queue are both empty:
>         exit
> Not to derail the conversation, but what would you imagine the 'best case' scenario for support would look like?
> What would you like to see IPython.parallel provide? I could imagine AsyncResults having more support for future-like operations, e.g.  chaining function calls.
> e.g.,
> combined_async_result = apply_async(f1,args).then(lambda res:
> apply_async(f2,res))
> I wonder if something like this is possible to implement?
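Cory's polling scheme can be sketched as runnable code. The sketch below is an emulation using Python's standard-library concurrent.futures in place of IPython.parallel (no cluster required); f1, f2, and the inputs are illustrative stand-ins:

```python
# Emulation of the polling loop above, using concurrent.futures
# as a stand-in for IPython.parallel's apply_async.
import time
from concurrent.futures import ThreadPoolExecutor

def f1(x):       # toy first-stage task
    return x + 1

def f2(x):       # toy second-stage task, depends on an f1 result
    return x * 10

results = []
with ThreadPoolExecutor(max_workers=4) as ex:
    f1_queue = [ex.submit(f1, x) for x in range(4)]
    f2_queue = []
    while f1_queue or f2_queue:
        # any finished f1 calls? submit f2 on their results
        for fut in [f for f in f1_queue if f.done()]:
            f1_queue.remove(fut)
            f2_queue.append(ex.submit(f2, fut.result()))
        # any finished f2 calls? collect their results
        for fut in [f for f in f2_queue if f.done()]:
            f2_queue.remove(fut)
            results.append(fut.result())
        time.sleep(0.01)  # avoid a busy spin while polling

print(sorted(results))  # [10, 20, 30, 40]
```

The same loop shape works against AsyncResult objects via their ready() method, at the cost of the explicit polling that Cory describes.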

Chaining like that would be one possible implementation.

However, I think the best possible one would be a version of apply_async 
that accepted AsyncResult objects as arguments, in exactly the place 
where the result.get() value would go in the original apply.

      res1a = v.apply_async(f1,in0a)
      res1b = v.apply_async(f1,in0b)

      res2a = v.apply_async(f2,res1a)  ### depends on res1a
      res2b = v.apply_async(f2,res1b)  ### depends on res1b

You could then scan through the *args, look for any AsyncResult objects, 
and set up the appropriate dependencies based on them. There is no clash 
with the current apply_async, since AsyncResults *cannot* appear in the 
argument lists today (they are unpicklable, I believe).

This would be ideal since it is exactly the calling sequence you would 
use in a serial application (just change v.apply_async to Python's 
apply), which makes the code easy to understand and easy to refactor 
back for serial use.
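A minimal sketch of that idea, emulated with concurrent.futures so it runs without a cluster (Future stands in for AsyncResult, and apply_async_deps is a hypothetical name, not an existing API):

```python
# Hypothetical dependency-aware submit: any Future passed as an
# argument is resolved (the analogue of result.get()) before f runs.
from concurrent.futures import Future, ThreadPoolExecutor

def apply_async_deps(executor, f, *args):
    def run():
        # scan the args, waiting on any that are still pending results
        resolved = [a.result() if isinstance(a, Future) else a
                    for a in args]
        return f(*resolved)
    return executor.submit(run)

def f1(x):       # toy first-stage task
    return x + 1

def f2(x):       # toy second-stage task
    return x * 10

with ThreadPoolExecutor(max_workers=4) as ex:
    res1a = apply_async_deps(ex, f1, 3)
    res2a = apply_async_deps(ex, f2, res1a)   # depends on res1a
    print(res2a.result())  # 40
```

In a real IPython.parallel version the wrapper would presumably register the dependency with the task scheduler rather than block a worker while waiting, but the calling sequence, which mirrors serial code, is the point here.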



> On Tue, Apr 29, 2014 at 8:20 PM, MinRK <benjaminrk at gmail.com
> <mailto:benjaminrk at gmail.com>> wrote:
>     Passing output from one task to the input of another is not well
>     supported. As others have said, one approach is to persist the
>     results to the filesystem.
>     Another, assuming it’s okay to restrict dependent tasks to run on
>     the same engine as the dependency, is to persist the values in the
>     engine’s namespace and use parallel.Reference to get it as an
>     argument to subsequent tasks.
>     Here’s an example <http://nbviewer.ipython.org/gist/minrk/11415238>
>     of doing this.
>     -MinRK
>     On Tue, Apr 29, 2014 at 4:25 PM, Andrea Zonca <zonca at sdsc.edu
>     <mailto:zonca at sdsc.edu>> wrote:
>         Hi,
>         On Tue, Apr 29, 2014 at 3:56 PM, Andrew Jaffe
>         <a.h.jaffe at gmail.com <mailto:a.h.jaffe at gmail.com>> wrote:
>          > the particular difference is passing the output of the
>          > earlier tasks to the later ones -- this is a use-case that is
>          > very specifically not addressed in those examples or the docs --
>          > each of the view.apply calls there just uses no arguments,
>          > where I want to use the value calculated by a previous
>          > view.apply.
>         Not sure I understand your application,
>         but isn't it an option to write the results to a shared filesystem
>         from the early task and read them from the later task?
>         This would work for tasks running on different nodes.
