[IPython-dev] MultiKernelManager vs. ipcluster

Tue Jun 5 00:17:10 EDT 2012

On 6/4/12 10:18 PM, Brian Granger wrote:

>> Our goals are to:
>>
>> * start multiple kernels/engines very quickly (e.g., through forking a
>> master kernel)
>> * connect single kernels to single web sessions
>> * be able to transfer files back and forth between the executing
>> environment and the web browser
>> * do things as securely as is reasonable (e.g., workers are running on a
>> firewalled virtual instance, etc.)
>> * be able to use an enterprise-level web server, such as Google App Engine.
>
> Honestly I am not sure it makes sense to use IPython at all.  The
> design requirements for IPython are very different from this.  I worry
> that you are always going to be fighting the design.  ZeroMQ/PyZMQ
> makes so much of this type of thing really easy.

Today we fleshed out what I think is a decent approach to accomplishing
all of our objectives.  It uses parts of the kernel class, and our
initial implementation will probably reuse the JavaScript kernel object
and the zmq/websocket bridge.  We're rolling our own kernel managers to
handle security and forking.  We'll draw up a short diagram of the
design soon.
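
To illustrate the "fork a master kernel" idea, here is a minimal sketch
using only the Python standard library.  It is an assumption-laden
illustration, not the design described above: `PRELOADED` and
`fork_worker` are hypothetical names, the "kernel" is just a callable,
and a real worker would run a message loop and drop privileges rather
than return one value over a pipe.  Unix only.

```python
import math
import os

# Stand-in for expensive start-up work done once in the master process.
PRELOADED = {"math": math}


def fork_worker(work):
    """Fork a child, run work(PRELOADED) there, return (pid, output text)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: inherits PRELOADED from the master; nothing is re-imported.
        try:
            os.close(read_fd)
            try:
                text = repr(work(PRELOADED))
            except Exception as exc:
                text = "error: %r" % (exc,)
            os.write(write_fd, text.encode("utf-8"))
        finally:
            # Always exit here so the child never runs the parent's code path.
            os._exit(0)

    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return pid, b"".join(chunks).decode("utf-8")
```

The point of forking is that the child starts with everything the
master already imported, so start-up cost is paid once rather than per
session.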

> ipcluster is a very specialized tool for starting engines in various
> environments.  If I had your design constraints I would not try to use
> ipcluster.

That's what we decided today too.  We sketched two designs and realized
they were basically equivalent, except that the ipcluster-based one
carried a lot of extra baggage, so we decided against it.  Thanks for
confirming that we shouldn't use ipcluster.

>
> You will have to be very careful about the security aspects of this.
> You should assume that the kernel running process is entirely hostile
> and allow no traffic that originates from it to reach the tornado web
> server.

Yes, that is the assumption.  The messages the web server pulls from a
kernel will be relayed directly to the browser client; the tornado
server should treat all such messages as opaque strings.
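
A minimal sketch of what "treat messages as strings" could look like on
the server side.  The names `relay` and `MAX_MESSAGE_BYTES` are
hypothetical, and the size cap is an added assumption rather than
something from the discussion above; the essential point is only that
the server never decodes, unpickles, or otherwise interprets kernel
output.

```python
# Hypothetical cap so a hostile kernel cannot flood the browser connection.
MAX_MESSAGE_BYTES = 1 << 20


def relay(raw_messages, send_to_browser):
    """Forward kernel output unchanged; return how many messages were sent.

    raw_messages: iterable of byte strings pulled from the kernel.
    send_to_browser: callable taking one byte string (e.g. a websocket write).
    """
    forwarded = 0
    for raw in raw_messages:
        # No json.loads, no pickle.loads: the payload is never interpreted.
        if not isinstance(raw, bytes) or len(raw) > MAX_MESSAGE_BYTES:
            continue  # drop anything malformed or oversized
        send_to_browser(raw)
        forwarded += 1
    return forwarded
```

Any parsing then happens in the browser, which is already assumed to be
talking to untrusted code.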

> If security is important I think you will have to roll your own.
> If security is important you will have to come up with a system level
> solution - not something that is simply a python library.

We are using virtual machines, Unix accounts, Unix file permissions, and
Unix resource limits to isolate and contain the process executing
untrusted code.  We're definitely not executing untrusted code in the
same process as the web server, or even as the same user.
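
As one illustration of the resource-limit layer only, here is a sketch
that runs code in a separate process with CPU-time and file-size limits
applied via the standard `resource` module.  `run_limited` and its
default values are hypothetical, and this is not a complete sandbox:
switching to a separate Unix account (which needs root) and the virtual
machine boundary are not shown.  Unix only.

```python
import resource
import subprocess
import sys


def _lower_limit(which, value):
    """Set soft and hard limits to value, never above the current hard limit."""
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(which, (value, value))


def run_limited(code, cpu_seconds=2, max_file_bytes=1 << 20):
    """Run code in a child Python process under resource limits.

    Returns (returncode, stdout).  A negative returncode means the child
    was killed by a signal, e.g. after exceeding its CPU limit.
    """
    def apply_limits():
        # Runs in the child after fork() and before exec().
        _lower_limit(resource.RLIMIT_CPU, cpu_seconds)
        _lower_limit(resource.RLIMIT_FSIZE, max_file_bytes)

    proc = subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        timeout=30,  # wall-clock backstop in case the child just sleeps
    )
    return proc.returncode, proc.stdout
```

For example, `run_limited("print(6 * 7)")` runs normally, while a busy
loop such as `run_limited("while True: pass", cpu_seconds=1)` should be
terminated by the operating system once its CPU time runs out.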

>
>> SPECIFIC QUESTIONS
>> ------------------
>>
>> 1. Brian, why did you decide to make a new multikernel manager instead
>> of trying to leverage the ipcluster functionality to execute multiple
>> engines?
>
> 1) ipcluster just does too much that is irrelevant to starting kernels
> for the notebook.
> 2) ipcluster doesn't do any of the more advanced things that you need
> for starting kernels across many systems.

Thanks; that was basically the conclusion we came to today.  I was
hoping that some of the more advanced capabilities of ipcluster would
make our job easier, but it carried too much baggage and didn't really
do what we wanted (i.e., we'd still have to write a lot of code
ourselves).

>
>> 2. Some time ago, on this list or on a pull request, there was some
>> discussion about the difference between kernels and engines (or the lack
>> of difference).  Are there some plans to simplify the ipcluster
>> architecture a bit to merge the concepts of engines and kernels?
>
> I think Min has answered this, but the answer is now that engine = kernel.

Thanks.  I like the distinction between binding and connecting kernels.

>
>> 3. Any idea about how much overhead the cluster introduces for lots of
>> short computations involving lots of output?  We'll test this too, but
>> I'm curious if there was thought towards this use-case in the design.
>
> This one is tough to answer.  In general the overhead of
> PyZMQ/ZeroMQ/IPython is extremely low.  The overheads you will run
> into will be completely determined by things like network throughput,
> etc.

Thanks for your input!  We hope to have a demo up soon (in the next 
couple of weeks).  If the code looks interesting to you guys, we hope to 
submit a pull request for relevant parts.  We're trying to do this in a 
way that would be useful to the larger community as well.

Jason