[IPython-dev] taskclient clear and hierarchical parallel processing

Brian Granger ellisonbg at gmail.com
Wed May 12 14:59:53 EDT 2010


Satra,

On Wed, May 12, 2010 at 6:39 AM, Satrajit Ghosh <satra at mit.edu> wrote:
> hi brian and fernando,
>
> Issue 1
> ---------
> i create an ipython parallel cluster using ipcluster.
>
> now shell 1:
> launch python, get taskclient, run tasks with block=False
>
> now shell 2:
> launch python, get taskclient, run tasks with block=False
> get my results using get_task_result
> call taskclient.clear()
>
> this clears all the tasks in shell 1.
>
> is this the intended mode of operation?

For now it is.  The issue is that the controller stores all the tasks
in memory (eventually it should store them on disk), so clear() is
used to get rid of tasks you are done with in order to save memory.
So yes, for now this is the intended behavior.
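To make the behavior concrete, here is a rough illustrative sketch (a
hypothetical mock, not the actual controller code) of a single shared
in-memory result store, where clear() from one client wipes tasks
submitted by every client:

```python
class MockController:
    """Hypothetical stand-in for the controller's in-memory task store."""

    def __init__(self):
        self._results = {}  # task_id -> result, shared by every client
        self._next_id = 0

    def run(self, func, *args):
        task_id = self._next_id
        self._next_id += 1
        self._results[task_id] = func(*args)
        return task_id

    def get_task_result(self, task_id):
        return self._results[task_id]

    def clear(self):
        # No per-task granularity: everything goes at once.
        self._results.clear()


controller = MockController()

# "Shell 1" submits a task...
t1 = controller.run(lambda x: x * 2, 21)

# ..."shell 2" submits its own task, reads its result, then clears.
t2 = controller.run(lambda x: x + 1, 1)
print(controller.get_task_result(t2))  # 2
controller.clear()

# Shell 1's result is gone too: get_task_result(t1) now raises KeyError.
```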

> alternatively, is there a way to clear a specific task based on its task id?
> basically i want to use the same pool of resources from multiple client
> connections.

We don't have a way of doing this currently.  Basically, you should
call clear() when you know all clients are done with a set of tasks
and you want to free up that memory.
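Until per-task clearing exists, one way to coordinate this on the
client side might be a small check-in helper like the following
(hypothetical, not part of the TaskClient API): each client signals
when it no longer needs any stored results, and clear() is only called
once everyone has checked in.

```python
class ClearCoordinator:
    """Hypothetical helper: track which clients still need results."""

    def __init__(self, client_names):
        self._pending = set(client_names)

    def done(self, client_name):
        """A client signals it no longer needs any stored results.

        Returns True once every client has checked in, i.e. when it is
        safe to call taskclient.clear().
        """
        self._pending.discard(client_name)
        return not self._pending


coord = ClearCoordinator({"shell1", "shell2"})
print(coord.done("shell2"))  # False -- shell1 still working, don't clear
print(coord.done("shell1"))  # True  -- everyone done, safe to clear()
```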

> Issue 2
> ---------
> a related second question is more of a design pattern issue. i have
> hierarchically complex DAGs such that each node of a DAG can be a DAG
> itself. running this DAG in parallel leads me to the following issue.
>
> let's assume i have 2 computational nodes on which i can send two nodes of
> the DAG (Task 1 and Task 2). now suppose these tasks themselves are DAGs and
> use the same mechanism for executing nodes. now when we push a task from
> within Task1, these go into a queue and will never run (essentially wait for
> the parent tasks to release the node).
>
> now i can flatten out all the DAGs and run them, but it would be neat if
> there was a pattern that would enable running these as concrete entities.

I have not really thought about this point before.  But I guess that
you can get into a deadlock situation if all the engines are busy and
the sub-DAGs can't be executed because their parents are using all the
engines.  Obviously with a large enough number of engines this problem
can be avoided, but it would be nice if the scheduler itself could
handle this.  Some questions:

* Could we re-write it so that the sub-DAGs were top-level tasks
rather than doing the recursive task-submits-a-task thing?

We might have to look into ways of allowing tasks to be paused so that
sub-tasks can run while the parent is paused.  Interesting things to
think about...
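The flattening approach suggested above can be sketched roughly like
this (the nested-node representation is hypothetical, chosen just for
illustration): expand every sub-DAG into top-level tasks before
submission, so no engine ever blocks waiting on a sub-task it spawned
itself.

```python
def flatten(node, out=None):
    """Depth-first expansion of a nested DAG into a flat task list.

    A node is either ('task', name) or ('dag', [children...]).
    """
    if out is None:
        out = []
    kind, payload = node
    if kind == "task":
        out.append(payload)
    else:
        # A sub-DAG: recurse here, instead of submitting its children
        # from inside a running task (which is what deadlocks).
        for child in payload:
            flatten(child, out)
    return out


dag = ("dag", [
    ("task", "A"),
    ("dag", [("task", "B1"), ("task", "B2")]),  # this node is itself a DAG
    ("task", "C"),
])

print(flatten(dag))  # ['A', 'B1', 'B2', 'C']
```

All four leaf tasks are now peers from the scheduler's point of view;
the inter-task dependencies would still have to be expressed separately.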

Cheers,

Brian



> cheers,
>
> satra
>
>
> _______________________________________________
> IPython-dev mailing list
> IPython-dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/ipython-dev
>
>



-- 
Brian E. Granger, Ph.D.
Assistant Professor of Physics
Cal Poly State University, San Luis Obispo
bgranger at calpoly.edu
ellisonbg at gmail.com


