[IPython-dev] MultiKernelManager vs. ipcluster
Jason Grout
jason-sage at creativetrax.com
Fri Jun 1 17:07:37 EDT 2012
Hi everyone,
As mentioned yesterday, we've been exploring ways to make better use of
IPython's capabilities in implementing the Sage cell server. This message
is a call for comments, along with a few questions about the direction of
things like ipcluster.
Our goals are to:
* start multiple kernels/engines very quickly (e.g., through forking a
master kernel)
* connect single kernels to single web sessions
* be able to transfer files back and forth between the executing
environment and the web browser
* do things as securely as is reasonable (e.g., workers running on a
firewalled virtual instance)
* be able to use an enterprise-level web server, such as Google App Engine.
We've explored two approaches, and have a barely working partial
proof of concept of each:
IPCLUSTER
---------
We implemented a forking engine factory which is run on the worker
computer. A cluster can start up new engines by sshing into the worker
computer and sending a message to the forking engine factory. A new
engine starts and then registers with the controller. We're thinking it
might be best to implement a new scheduler that would take messages,
check to see if the session id matches up with one of the
currently-running engines, and send the message to the appropriate
engine if so. If the session id does not match a currently running
engine, then the scheduler would start up a new engine (can the
scheduler do this?).
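
To make the scheduler idea above concrete, here is a minimal sketch of
the routing logic only. It is not the IPython scheduler API:
SessionRouter, FakeEngine and the start_engine callable are hypothetical
names, and the sketch assumes each message carries its session id in
msg["header"]["session"].

```python
class SessionRouter:
    """Route messages to engines by session id, starting engines on demand."""

    def __init__(self, start_engine):
        # start_engine is a hypothetical callable: session_id -> engine.
        # In the setup described above it would ask the forking engine
        # factory on the worker machine for a new engine.
        self._start_engine = start_engine
        self._engines = {}

    def route(self, msg):
        # Assumption: the session id lives in the message header.
        session_id = msg["header"]["session"]
        engine = self._engines.get(session_id)
        if engine is None:
            # No engine is running for this session yet, so start one.
            engine = self._start_engine(session_id)
            self._engines[session_id] = engine
        engine.send(msg)
        return engine


class FakeEngine:
    """Stand-in engine that just records the messages it receives."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.inbox = []

    def send(self, msg):
        self.inbox.append(msg)
```

Calling SessionRouter(FakeEngine).route(msg) twice with the same session
id reuses one engine; a new session id triggers a new engine.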
The client would basically be the zmq/socket.io bridge, translating
messages to/from the controller and the browser. I guess we could
implement one client per session, or we could implement one overall
client that would route cluster output to the appropriate browser.
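
The "one overall client" option could look roughly like the following
sketch. OutputDispatcher and its methods are hypothetical names, and the
sketch assumes that each output message identifies its originating
session in msg["parent_header"]["session"]; the callbacks stand in for
whatever writes to a given browser's socket.

```python
from collections import defaultdict


class OutputDispatcher:
    """Single client that fans cluster output out to per-session browsers."""

    def __init__(self):
        # session id -> list of callbacks (one per connected browser)
        self._subscribers = defaultdict(list)

    def subscribe(self, session_id, callback):
        self._subscribers[session_id].append(callback)

    def dispatch(self, msg):
        # Assumption: output carries the session of the request that
        # produced it in its parent header.
        session_id = msg.get("parent_header", {}).get("session")
        callbacks = self._subscribers.get(session_id, [])
        for callback in callbacks:
            callback(msg)
        # Report how many browsers received the message.
        return len(callbacks)
```

With one client per session this lookup disappears, at the cost of one
connection to the controller per browser session.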
MULTI FORKING KERNEL MANAGER
----------------------------
This approach stays much closer to the current IPython notebook
architecture. A MultiKernelManager starts up kernels using a
ForkingKernelManager (based on Min's gist the other day). Each kernel
sets up a connection to a specific set of websocket channels through a
tornado-based bridge. We still have to implement pieces such as the
tornado handler and the separation of the forked kernels (on a separate
ssh account somewhere).
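
The reason forking is attractive for fast startup can be shown with a
small POSIX-only sketch. This is not the ForkingKernelManager code;
fork_worker is a hypothetical helper that forks the current process so
the child inherits already-loaded state, runs a task, and sends the
result back through a pipe.

```python
import os


def fork_worker(preloaded, task):
    """Fork; the child runs task(preloaded) and writes the result to a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it already has `preloaded` in memory, nothing is re-imported.
        os.close(read_fd)
        try:
            os.write(write_fd, str(task(preloaded)).encode())
        finally:
            # Exit without running the parent's cleanup handlers.
            os._exit(0)
    # Parent: read the child's result until the pipe closes, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    os.waitpid(pid, 0)
    return pid, data.decode()
```

A real forked kernel would keep running and open its own zmq sockets
instead of exiting after one task; the sketch only shows that the child
starts with the master's state already in place.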
THOUGHTS
--------
It seems that the multikernel manager is much lighter-weight, but we'd
have to implement a lot of the enterprise-level functionality that the
cluster already has. On the other hand, the ipcluster approach really
does more than we need, so, in a sense, we need to trim it back. We're
not asking you to make a decision for us, obviously, but it would be
valuable to hear any comments or suggestions.
SPECIFIC QUESTIONS
------------------
1. Brian, why did you decide to make a new multikernel manager instead
of trying to leverage the ipcluster functionality to execute multiple
engines?
2. Some time ago, on this list or on a pull request, there was some
discussion about the difference between kernels and engines (or the lack
of difference). Are there some plans to simplify the ipcluster
architecture a bit to merge the concepts of engines and kernels?
3. Any idea about how much overhead the cluster introduces for lots of
short computations involving lots of output? We'll test this too, but
I'm curious if there was thought towards this use-case in the design.
Thanks,
Jason