[IPython-dev] Pushing data from engine to engine in IPython.parallel

Alessandro Gagliardi alessandro.gagliardi at glassdoor.com
Sun Jan 19 07:17:11 EST 2014


Is there a way to push data directly from one engine to another (i.e. without going through the controller)?

I was thinking of trying to make a very simple shuffle (a la Hadoop). I don’t really know how Hadoop does it, so I made a guess and came up with the following:

%%px
from IPython.parallel import Client
c = Client()

c[:]['partition_results'] = []

# map_results is the engine-local list of (key, value) pairs from the map step;
# each += pulls the remote list back, extends it, and pushes it again.
for k1, v1 in map_results:
    c[hash(k1) % len(c.ids)]['partition_results'] += [(k1, v1)]
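
(For reference, the same shuffle routed through the client — which is exactly the round trip I’m trying to avoid — would look something like the sketch below, assuming map_results on each engine is a list of (key, value) pairs:)

from collections import defaultdict
from IPython.parallel import Client

rc = Client()
dview = rc[:]

# Pull every engine's map output back to the client in one go.
all_pairs = dview.gather('map_results', block=True)

# Bucket the pairs by key hash, one bucket per engine.
buckets = defaultdict(list)
for k, v in all_pairs:
    buckets[hash(k) % len(rc.ids)].append((k, v))

# Push each bucket out to the engine that owns it.
for eid in rc.ids:
    rc[eid]['partition_results'] = buckets[eid]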

I’ve had some success using Client() from the engines in this way. For example:

%%px --targets 0
from IPython.parallel import Client
c = Client()
c[1]['foo'] = 1

%%px --targets 1
foo

Out[1:0]: 1

But when I try the partitioning code, it hangs. This is clearly not the right way to do it; maybe there is no right way, given how IPython.parallel is designed. But I figured that if it can process DAGs, it should be able to do this. I’m probably looking at it the wrong way.
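
One thing I’ve wondered about is dropping down to raw pyzmq sockets on the engines and only using the controller to exchange addresses. Something like the following (untested, and it again assumes map_results is the engine-local list of (key, value) pairs from the map step):

%%px
# Each engine binds a PULL socket on a random port and records its URL.
import zmq
import socket
ctx = zmq.Context.instance()
receiver = ctx.socket(zmq.PULL)
port = receiver.bind_to_random_port('tcp://*')
# gethostbyname may return 127.0.0.1; a multi-host cluster would need a
# real interface address here.
my_url = 'tcp://%s:%i' % (socket.gethostbyname(socket.gethostname()), port)

Then, from the client, one round trip through the controller swaps the addresses:

from IPython.parallel import Client
rc = Client()
peer_urls = rc[:]['my_url']      # collect every engine's address
rc[:]['peer_urls'] = peer_urls   # hand the full list back to every engine

After which the engines could, in principle, ship pairs straight to their peers:

%%px
# Connect a PUSH socket to every peer, then route each pair by key hash.
senders = [ctx.socket(zmq.PUSH) for _ in peer_urls]
for s, url in zip(senders, peer_urls):
    s.connect(url)
for k1, v1 in map_results:
    senders[hash(k1) % len(peer_urls)].send_pyobj((k1, v1))
# Each engine would then drain its own PULL socket into partition_results,
# with some end-of-stream convention (e.g. one sentinel per sender) to know
# when all peers are done.

But I don’t know whether that fights the design, hence the question.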

Thanks in advance,
-Alessandro