[IPython-dev] IPython MPI Hangs on Windows

Dave Hirschfeld dave.hirschfeld at gmail.com
Sun Sep 22 11:53:20 EDT 2013


The first problem I encounter is that the ipcluster command doesn't seem to 
work:
```
C:\dev\code>ipcluster start -n 4 --engines=MPIEngineSetLauncher
2013-09-22 16:27:15.398 [IPClusterStart] Using existing profile dir: u'C:\\Users\\dhirschfeld\\.ipython\\profile_default'
2013-09-22 16:27:15.407 [IPClusterStart] Starting ipcluster with [daemon=False]
2013-09-22 16:27:15.411 [IPClusterStart] Creating pid file: C:\Users\dhirschfeld\.ipython\profile_default\pid\ipcluster.pid
2013-09-22 16:27:15.414 [IPClusterStart] Starting Controller with LocalControllerLauncher
2013-09-22 16:27:16.408 [IPClusterStart] Starting 4 Engines with MPIEngineSetLauncher
2013-09-22 16:27:17.213 [IPClusterStart] ERROR |
            Engines shutdown early, they probably failed to connect.

            Check the engine log files for output.

            If your controller and engines are not on the same machine, you probably
            have to instruct the controller to listen on an interface other than localhost.

            You can set this by adding "--ip='*'" to your ControllerLauncher.controller_args.

            Be sure to read our security docs before instructing your controller to listen on
            a public interface.

2013-09-22 16:27:17.214 [IPClusterStart] ERROR | IPython cluster: stopping
2013-09-22 16:27:20.214 [IPClusterStart] Removing pid file: C:\Users\dhirschfeld\.ipython\profile_default\pid\ipcluster.pid
```

The only information in the controller log was:
```
2013-09-22 16:29:09.286 [IPControllerApp] Hub listening on tcp://127.0.0.1:43273 for registration.
2013-09-22 16:29:09.292 [IPControllerApp] Hub using DB backend: 'DictDB'
2013-09-22 16:29:09.545 [IPControllerApp] hub::created hub
2013-09-22 16:29:09.545 [IPControllerApp] writing connection info to C:\Users\dhirschfeld\.ipython\profile_default\security\ipcontroller-client.json
2013-09-22 16:29:09.549 [IPControllerApp] writing connection info to C:\Users\dhirschfeld\.ipython\profile_default\security\ipcontroller-engine.json
2013-09-22 16:29:09.553 [IPControllerApp] task::using Python leastload Task scheduler
2013-09-22 16:29:09.553 [IPControllerApp] Heartmonitor started
2013-09-22 16:29:09.569 [IPControllerApp] Creating pid file: C:\Users\dhirschfeld\.ipython\profile_default\pid\ipcontroller.pid
2013-09-22 16:29:10.144 [IPControllerApp] client::client '\x00\x8a\x95\xf9u\\\xabEb\x99\xc8\x1c\xd9\xf5Z\xe9?' requested u'connection_request'
2013-09-22 16:29:10.144 [IPControllerApp] client::client ['\x00\x8a\x95\xf9u\\\xabEb\x99\xc8\x1c\xd9\xf5Z\xe9?'] connected
```
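
Since the ipcluster error says to check the engine log files, one way to collect them in one go is a small script like the one below. This is only a sketch: it assumes the logs sit in the `log` subdirectory of the profile directory shown above and end in `.log`, and `tail_logs` is a hypothetical helper name, not part of IPython.

```python
import glob
import os


def tail_logs(log_dir, pattern="*.log", n=20):
    """Return {filename: last n lines} for every matching file in log_dir."""
    tails = {}
    for path in sorted(glob.glob(os.path.join(log_dir, pattern))):
        with open(path) as f:
            lines = f.read().splitlines()
        tails[os.path.basename(path)] = lines[-n:]
    return tails


if __name__ == "__main__":
    # Assumed location: <profile dir>/log, e.g. ~/.ipython/profile_default/log
    log_dir = os.path.join(os.path.expanduser("~"),
                           ".ipython", "profile_default", "log")
    for name, lines in tail_logs(log_dir).items():
        print("==> %s <==" % name)
        print("\n".join(lines))
```

If the directory does not exist or holds no matching files, the script simply prints nothing.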

So I then tried starting ipcontroller and ipengine separately.

In the ipcontroller terminal I get:
```
C:\dev\code>ipcontroller
2013-09-22 16:32:07.198 [IPControllerApp] Using existing profile dir: u'C:\\Users\\dhirschfeld\\.ipython\\profile_default'
2013-09-22 16:32:07.223 [IPControllerApp] Hub listening on tcp://127.0.0.1:56375 for registration.
2013-09-22 16:32:07.229 [IPControllerApp] Hub using DB backend: 'DictDB'
2013-09-22 16:32:07.482 [IPControllerApp] hub::created hub
2013-09-22 16:32:07.483 [IPControllerApp] writing connection info to C:\Users\dhirschfeld\.ipython\profile_default\security\ipcontroller-client.json
2013-09-22 16:32:07.487 [IPControllerApp] writing connection info to C:\Users\dhirschfeld\.ipython\profile_default\security\ipcontroller-engine.json
2013-09-22 16:32:07.493 [IPControllerApp] task::using Python leastload Task scheduler
2013-09-22 16:32:07.493 [IPControllerApp] Heartmonitor started
2013-09-22 16:32:07.512 [IPControllerApp] Creating pid file: C:\Users\dhirschfeld\.ipython\profile_default\pid\ipcontroller.pid
2013-09-22 16:33:14.381 [IPControllerApp] client::client '62f3f434-4659-4152-beca-b5d62d48b73e' requested u'registration_request'
2013-09-22 16:33:14.382 [IPControllerApp] client::client '76ab6008-31a9-4482-ab14-003e479882db' requested u'registration_request'
2013-09-22 16:33:19.493 [IPControllerApp] registration::finished registering engine 0:62f3f434-4659-4152-beca-b5d62d48b73e
2013-09-22 16:33:19.496 [IPControllerApp] engine::Engine Connected: 0
2013-09-22 16:33:19.500 [IPControllerApp] registration::finished registering engine 1:76ab6008-31a9-4482-ab14-003e479882db
2013-09-22 16:33:19.502 [IPControllerApp] engine::Engine Connected: 1
```

...whilst in the ipengine terminal I get:
```
C:\dev\code>mpiexec -n 2 ipengine --mpi=mpi4py
2013-09-22 16:33:14.270 [IPEngineApp] Using existing profile dir: u'C:\\Users\\dhirschfeld\\.ipython\\profile_default'
2013-09-22 16:33:14.275 [IPEngineApp] Initializing MPI:
2013-09-22 16:33:14.275 [IPEngineApp] from mpi4py import MPI as mpi
mpi.size = mpi.COMM_WORLD.Get_size()
mpi.rank = mpi.COMM_WORLD.Get_rank()

2013-09-22 16:33:14.335 [IPEngineApp] Using existing profile dir: u'C:\\Users\\dhirschfeld\\.ipython\\profile_default'
2013-09-22 16:33:14.341 [IPEngineApp] Initializing MPI:
2013-09-22 16:33:14.341 [IPEngineApp] from mpi4py import MPI as mpi
mpi.size = mpi.COMM_WORLD.Get_size()
mpi.rank = mpi.COMM_WORLD.Get_rank()

2013-09-22 16:33:14.358 [IPEngineApp] Loading url_file u'C:\\Users\\dhirschfeld\\.ipython\\profile_default\\security\\ipcontroller-engine.json'
2013-09-22 16:33:14.358 [IPEngineApp] Loading url_file u'C:\\Users\\dhirschfeld\\.ipython\\profile_default\\security\\ipcontroller-engine.json'
2013-09-22 16:33:14.374 [IPEngineApp] Registering with controller at tcp://127.0.0.1:56375
2013-09-22 16:33:14.374 [IPEngineApp] Registering with controller at tcp://127.0.0.1:56375
2013-09-22 16:33:14.505 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2013-09-22 16:33:14.510 [IPEngineApp] Using existing profile dir: u'C:\\Users\\dhirschfeld\\.ipython\\profile_default'
2013-09-22 16:33:14.512 [IPEngineApp] Completed registration with id 1
2013-09-22 16:33:14.517 [IPEngineApp] Starting to monitor the heartbeat signal from the hub every 3010 ms.
2013-09-22 16:33:14.525 [IPEngineApp] Using existing profile dir: u'C:\\Users\\dhirschfeld\\.ipython\\profile_default'
2013-09-22 16:33:14.526 [IPEngineApp] Completed registration with id 0
```


So everything seems to have worked. However, the Client connects and sees 
both engines, but whenever I try to fetch a result it hangs:
```
In [1]: from IPython.parallel import Client
   ...: rc = Client()
   ...: 

In [2]: rc.ids
Out[2]: [0, 1]

In [3]: rc[:].push({'a': 1})
Out[3]: <AsyncResult: _push>

In [4]: _3.result()  <-------------------- Hang!!!
``` 
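
To tell a genuine hang from a slow reply, the blocking call can be bounded with a timeout. The sketch below relies only on the `wait(timeout)`, `ready()` and `get()` methods that IPython.parallel's AsyncResult shares with `multiprocessing.pool.AsyncResult`. Because it should run without a cluster, the demo uses a standard-library ThreadPool as a stand-in, and `wait_or_report` is a hypothetical helper name.

```python
import time
from multiprocessing.pool import ThreadPool


def wait_or_report(ar, timeout=5.0):
    """Wait up to `timeout` seconds; return (True, value) or (False, None)."""
    ar.wait(timeout)
    if ar.ready():
        return True, ar.get()
    return False, None


if __name__ == "__main__":
    # Stand-in for a cluster: a thread pool exposing the same result interface.
    pool = ThreadPool(2)
    fast = pool.apply_async(lambda: 42)
    slow = pool.apply_async(time.sleep, (2,))
    print(wait_or_report(fast, timeout=1.0))  # finishes within the timeout
    print(wait_or_report(slow, timeout=0.1))  # still running when we give up
    pool.close()
    pool.join()
```

Against the cluster above, passing the object returned by `rc[:].push({'a': 1})` to `wait_or_report` should hand control back after the timeout instead of blocking indefinitely, assuming the hang is in waiting for the reply.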

For reference, the output of sys_info() on this machine:

In [3]: print sys_info()
{'codename': 'Work in Progress',
 'commit_hash': '62e35db',
 'commit_source': 'repository',
 'default_encoding': 'cp1252',
 'ipython_path': 'c:\\dev\\code\\ipython\\IPython',
 'ipython_version': '2.0.0-dev',
 'os_name': 'nt',
 'platform': 'Windows-7-6.1.7601-SP1',
 'sys_executable': 'C:\\dev\\bin\\Anaconda\\python.exe',
 'sys_platform': 'win32',
 'sys_version': '2.7.5 |Anaconda 1.7.0 (64-bit)| (default, Jul  1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)]'}

I get this both with my self-compiled mpi4py linked against MS-MPI and with 
Christoph Gohlke's build linked against OpenMPI 1.6.5.

mpi4py itself seems to be working fine: it passes all of its tests except 2 
with Christoph's build and 4 with mine.

https://groups.google.com/forum/#!topic/mpi4py/arQc9fVhZAo

Let me know if there's anything I can do to help get to the bottom of the 
problem...

Thanks,
Dave






