[IPython-dev] MPI on Windows HPC

Dave Hirschfeld dave.hirschfeld at gmail.com
Wed Sep 25 11:47:16 EDT 2013


I've got MPI working on my local machine, but it seems that if I don't 
start all the engines together they don't recognise themselves as part of 
the same group - i.e.

I start 4 engines in a terminal with `mpiexec -n 4 ipengine.bat --mpi=mpi4py` 
and then, after they've started, I start a further two engines in another 
terminal with `mpiexec -n 2 ipengine.bat --mpi=mpi4py`.

I then observe that the Client instance recognises all 6 engines, but MPI 
sees them as belonging to two distinct groups of sizes 4 & 2:

In [21]: rc.ids
Out[21]: [0, 1, 2, 3, 4, 5]

In [22]: view = rc[:]

In [23]: @view.remote(block=True)
    ...: def hello():
    ...:     from mpi4py import MPI
    ...:     comm = MPI.COMM_WORLD
    ...:     return "Process {comm.rank} of {comm.size}.".format(comm=comm)
    ...: 

In [26]: hello()
Out[26]: 
['Process 1 of 4.',
 'Process 3 of 4.',
 'Process 0 of 4.',
 'Process 2 of 4.',
 'Process 1 of 2.',
 'Process 0 of 2.']


My understanding is that I really want all the engines to be recognised as 
part of the same group - is this possible?
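
From what I've read, MPI-2 dynamic process management (Open_port / 
Connect / Accept) can in principle join two separately launched worlds 
into a single merged intracommunicator. Something like the untested 
sketch below, where the port name has to be passed between the two 
groups out of band (e.g. pushed to the engines via the Client) - though 
I've no idea whether MS-MPI on Windows HPC supports this:

from mpi4py import MPI

# Untested sketch - assumes the MPI implementation supports MPI-2
# dynamic process management (Open_port/Connect/Accept).

# On the first group of engines: rank 0 opens a port, shares it with
# the rest of its group, and the whole group collectively accepts the
# connection from the second group.
port = MPI.Open_port() if MPI.COMM_WORLD.rank == 0 else None
port = MPI.COMM_WORLD.bcast(port, root=0)
inter = MPI.COMM_WORLD.Accept(port, root=0)

# On the second group (port name obtained out of band):
# inter = MPI.COMM_WORLD.Connect(port, root=0)

# Both sides then merge the intercommunicator into one intracommunicator
# spanning all 6 processes (high=True on the connecting side so its
# ranks are ordered after the accepting group's).
merged = inter.Merge(high=False)
print("Process {0} of {1}".format(merged.rank, merged.size))

Even if that works, I'd presumably have to redo the merge every time a 
new engine joins, which leads to my next concern.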

My concern is that the Windows HPC scheduler can kill jobs at any time and 
restart them later. That seems to be fine for the Client, which will 
recognise the newly started engines, but it seems that with MPI this will 
eventually leave each engine in an MPI group of its own, which won't be 
very useful AFAICT.

Unfortunately I have nearly zero MPI experience - am I missing something 
here?

Thanks,
Dave
