[IPython-dev] MPI on Windows HPC
Dave Hirschfeld
dave.hirschfeld at gmail.com
Wed Sep 25 11:47:16 EDT 2013
I've got MPI working on my local machine, but it seems that if I don't
start all the engines together they don't recognise each other as being
part of the same group - i.e.
I start 4 engines in a terminal with `mpiexec -n 4 ipengine.bat --mpi=mpi4py`
and then, after they've started, I start a further two engines in another
terminal with `mpiexec -n 2 ipengine.bat --mpi=mpi4py`.
I then observe that the Client instance recognises all 6 engines, but MPI
sees them as belonging to two distinct groups of sizes 4 & 2:
In [21]: rc.ids
Out[21]: [0, 1, 2, 3, 4, 5]
In [22]: view = rc[:]
In [23]: @view.remote(block=True)
    ...: def hello():
    ...:     from mpi4py import MPI
    ...:     comm = MPI.COMM_WORLD
    ...:     return "Process {comm.rank} of {comm.size}.".format(comm=comm)
    ...:
In [26]: hello()
Out[26]:
['Process 1 of 4.',
 'Process 3 of 4.',
 'Process 0 of 4.',
 'Process 2 of 4.',
 'Process 1 of 2.',
 'Process 0 of 2.']
My understanding is that I really want these to be recognised as part of the
same group - is this possible?
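From the little I've read, MPI-2 dynamic process management
(Open_port/Accept/Connect followed by a Merge) looks like it might be the
mechanism for joining two independently launched jobs, though I have no
idea whether MS-MPI on Windows HPC actually supports it. A rough sketch of
what I imagine (the function name is made up, and getting the port string
to the other group would have to happen out of band, e.g. via the Client):

from mpi4py import MPI

def join_groups(is_server, port=None):
    # Merge two independently launched MPI jobs into one intracommunicator.
    if is_server:
        port = MPI.Open_port()
        # ...publish `port` somewhere the other group can read it...
        inter = MPI.COMM_WORLD.Accept(port, root=0)  # blocks until Connect
    else:
        inter = MPI.COMM_WORLD.Connect(port, root=0)
    # Collapse the intercommunicator into a single flat intracommunicator
    # spanning both of the original groups.
    return inter.Merge(high=not is_server)

Even if that worked, I'd presumably have to repeat the dance every time
new engines joined.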
My concern is that the Windows HPC scheduler can kill jobs at any time and
restart them later. That seems to be fine for the Client, which will
recognise the newly started engines, but for MPI it seems this will
eventually leave each engine in an MPI group of its own, which won't be
very useful AFAICT.
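In the meantime I can at least detect the fragmentation from the client
side with something like the following diagnostic (reusing the `view`
object from the session above):

from collections import Counter

@view.remote(block=True)
def comm_size():
    from mpi4py import MPI
    return MPI.COMM_WORLD.size

# A single entry means all engines share one MPI_COMM_WORLD; more than
# one entry means the engines have split into separate groups.
print(Counter(comm_size()))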
Unfortunately I have nearly zero MPI experience - am I missing something
here?
Thanks,
Dave