[IPython-dev] Parallel map
Gael Varoquaux
gael.varoquaux at normalesup.org
Sat Mar 8 05:03:29 EST 2008
On Fri, Mar 07, 2008 at 08:42:26PM +0100, Gael Varoquaux wrote:
> On Fri, Mar 07, 2008 at 08:10:36PM +0100, Gael Varoquaux wrote:
> > I am trying to do a parallel map using ipython1. Is there a really simple
> > way to do this, or a tutorial somewhere telling me how? I can probably
> > figure it out, but I have to dig through a fair amount of
> > tutorial/doc/wiki articles/reading source code to move forward.
> > My requirement is that the code be pure, valid, self-contained
> > Python code.
> OK, making some progress on this.
> I found out I need to create a MultiEngineClient
> rc = client.MultiEngineClient(('127.0.0.1', 10105))
> and I can use its map method.
I succeeded (I had a good night's sleep in between) by piggybacking on
the ipcluster script. It is a bit ugly, but I am posting the code here for
future reference.
What made my task hard was both that there is no obvious way of
creating a cluster from Python, and that ipython1.kernel.api was
removed while all the information I can find on the web uses
ipython1.kernel.api.RemoteController.
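For reference, the trick used in start_cluster below (temporarily replacing sys.argv so that a script's main() can be driven from Python) can be wrapped in a context manager, so the original argv is restored even if something raises. This is a minimal sketch in modern Python; `patched_argv` and `fake_main` are hypothetical names, not part of ipython1:

```python
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv, restoring it even if the body raises."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        sys.argv = saved


def fake_main():
    # Hypothetical stand-in for a script's main() that parses sys.argv.
    return sys.argv[1:]


with patched_argv(['ipcluster', '-n', '4']):
    args = fake_main()
```

One caveat when applying this to start_cluster: cluster.main runs in a separate thread, so the with block has to stay open until that thread has parsed its arguments, which is what the sleep() call ensures in practice.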
Now the irony is that I ended up not being able to use ipython1 for the
problem I was interested in, as the objects I wanted to send to my
parallel map were not picklable. I wrote a small hack using threading
and os.system to do the work. I suspect this is a limitation people are
going to bump into quite often. Ideas for making a workaround more or less
a native part of ipython1 would be great. In my case, the objects I had to
scatter were directly imported from a module, so scattering a module
path as a string (e.g. 'ipython1.kernel.client.MultiEngineClient') would
have been an option. I have no hindsight on these problems, so I won't
pretend to suggest a good solution.
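To make the "scatter a module path as a string" idea concrete: only the string needs to be pickled, and each worker re-imports the object by name. This is a plain-Python sketch, not an ipython1 feature; `resolve_dotted` and `call_by_path` are hypothetical helpers, and the module must be importable on the worker side:

```python
import importlib
import pickle


def resolve_dotted(path):
    """Return the object named by a dotted path such as 'os.path.join'.

    Imports the longest importable module prefix, then walks the
    remaining components with getattr.
    """
    parts = path.split('.')
    for i in range(len(parts), 0, -1):
        module_name = '.'.join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            # Not a module (e.g. 'os.path.join'); try a shorter prefix.
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError('cannot resolve %r' % path)


def call_by_path(path, *args):
    # Runs on the worker: only the string (and the args) were pickled.
    return resolve_dotted(path)(*args)


# Usage: ship the name, not the (possibly unpicklable) object.
payload = pickle.dumps('os.path.join')
result = call_by_path(pickle.loads(payload), 'a', 'b')
```

Catching ImportError broadly keeps the sketch short, but it can also hide an ImportError raised inside a module that does exist.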
Anyway, thanks for ipython1, and keep up the good work; it is a difficult
but important task.
Cheers,
Gaël
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
"""
Provides a simple parallel map.
"""
# Piggy-back the ipcluster script to start the engines.
from ipython1.kernel.scripts import ipcluster as cluster
from ipython1.kernel.client import MultiEngineClient
from threading import Thread
from time import sleep
import sys
##############################################################################
def guess_ncpu():
""" Parses /proc/cpuinfo to guess the number of CPU on the box.
This has been tested only under Linux.
"""
ncpu = 0
cpuinfo = file('/proc/cpuinfo')
for line in cpuinfo.readlines():
if line[:10] == 'processor\t':
ncpu += 1
return ncpu
##############################################################################
# Code to start the engine and create the controller
def start_cluster(ncpu=guess_ncpu()):
""" Starts a cluster on the local computer and returns a controller
to the cluster.
"""
# We use ipcluster.main, but it takes its instructions from sys.argv,
# thus we overide it
orig_argv = sys.argv
sys.argv = ['foo', '-n', str(ncpu)]
# Starting the cluster is a blocking operation. We thus need a
# thread to do the work.
Thread(target=cluster.main).start()
# There is a sleep(3) in ipcluster
sleep(4)
sys.argv = orig_argv
return MultiEngineClient(('127.0.0.1',10105))
##############################################################################
# This code is so trivial you should really use directly the controller
# method if you are going to do anything more than running pmap once
# (keep in mind that there is an overhead of creating the cluster).
def pmap(func, seq, ncpu=guess_ncpu()):
""" Creates a cluster of ipython1 engines and runs a parallel map on
it.
"""
mec = start_cluster(ncpu=ncpu)
outseq = mec.map(func, seq)
mec.kill(controler=True)
return outseq
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
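guess_ncpu() only works where /proc/cpuinfo exists. Python 2.6 added multiprocessing.cpu_count(), and Python 3.4 added os.cpu_count(), which returns None when the count cannot be determined. A minimal portable sketch that keeps the /proc/cpuinfo parsing as a fallback; `count_cpus` is a hypothetical name:

```python
import os


def count_cpus(default=1):
    """Best-effort CPU count, never less than `default`."""
    # os.cpu_count() (Python 3.4+) may return None if the count is unknown.
    n = os.cpu_count()
    if n:
        return n
    # Linux-only fallback, same idea as guess_ncpu().
    try:
        with open('/proc/cpuinfo') as cpuinfo:
            n = sum(1 for line in cpuinfo if line.startswith('processor'))
    except (IOError, OSError):
        n = 0
    return n or default
```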