[IPython-dev] Pushing data from engine to engine in IPython.parallel

Alessandro Gagliardi alessandro.gagliardi at glassdoor.com
Sun Jan 19 07:17:11 EST 2014


Is there a way to push data directly from one engine to another (i.e. without going through the controller)?

I was thinking of trying to make a very simple shuffle (a la Hadoop). I don’t really know how Hadoop does it, so I made a guess and came up with the following:

%%px
from IPython.parallel import Client
c = Client()

c[:]['partition_results'] = []

# map_results is the engine-local list of (key, value) pairs from the map step;
# each += pulls the remote list back, extends it, and pushes it again.
for k1, v1 in map_results:
    c[hash(k1) % len(c.ids)]['partition_results'] += [(k1, v1)]
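
(For reference, the same shuffle routed through the client — which is exactly the round trip I’m trying to avoid — would look something like the sketch below, assuming map_results on each engine is a list of (key, value) pairs:)

from collections import defaultdict
from IPython.parallel import Client

rc = Client()
dview = rc[:]

# Pull every engine's map output back to the client in one go.
all_pairs = dview.gather('map_results', block=True)

# Bucket the pairs by key hash, one bucket per engine.
buckets = defaultdict(list)
for k, v in all_pairs:
    buckets[hash(k) % len(rc.ids)].append((k, v))

# Push each bucket out to the engine that owns it.
for eid in rc.ids:
    rc[eid]['partition_results'] = buckets[eid]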

I’ve had some success using Client() from the engines in this way. For example:

%%px --targets 0
from IPython.parallel import Client
c = Client()
c[1]['foo'] = 1

%%px --targets 1
foo

Out[1:0]: 1

But when I try the partitioning code, it hangs. This is clearly not the right way to do it; maybe there is no right way, given how IPython.parallel is designed. But I figured that if it can process DAGs, it should be able to do this. I’m probably looking at it the wrong way.
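
One thing I’ve wondered about is dropping down to raw pyzmq sockets on the engines and only using the controller to exchange addresses. Something like the following (untested, and it again assumes map_results is the engine-local list of (key, value) pairs from the map step):

%%px
# Each engine binds a PULL socket on a random port and records its URL.
import zmq
import socket
ctx = zmq.Context.instance()
receiver = ctx.socket(zmq.PULL)
port = receiver.bind_to_random_port('tcp://*')
# gethostbyname may return 127.0.0.1; a multi-host cluster would need a
# real interface address here.
my_url = 'tcp://%s:%i' % (socket.gethostbyname(socket.gethostname()), port)

Then, from the client, one round trip through the controller swaps the addresses:

from IPython.parallel import Client
rc = Client()
peer_urls = rc[:]['my_url']      # collect every engine's address
rc[:]['peer_urls'] = peer_urls   # hand the full list back to every engine

After which the engines could, in principle, ship pairs straight to their peers:

%%px
# Connect a PUSH socket to every peer, then route each pair by key hash.
senders = [ctx.socket(zmq.PUSH) for _ in peer_urls]
for s, url in zip(senders, peer_urls):
    s.connect(url)
for k1, v1 in map_results:
    senders[hash(k1) % len(peer_urls)].send_pyobj((k1, v1))
# Each engine would then drain its own PULL socket into partition_results,
# with some end-of-stream convention (e.g. one sentinel per sender) to know
# when all peers are done.

But I don’t know whether that fights the design, hence the question.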

Thanks in advance,
-Alessandro