[IPython-dev] Kernel-client communication

Fernando Perez fperez.net at gmail.com
Tue Aug 31 01:28:11 EDT 2010


On Mon, Aug 30, 2010 at 1:51 AM, Almar Klein <almar.klein at gmail.com> wrote:
> Ah right. Although I'm not sure how often one would use this in
> practice, it's certainly a nice feature, and it seems to open up a range
> of possibilities. I can imagine this requirement makes things considerably
> harder to implement, but since you're designing a whole new protocol from
> scratch, it's probably a good choice to include it now.

And the whole thing fits naturally in our design for tools that enable
both interactive/collaborative computing and distributed/parallel work
within one single framework.  After all, it's just manipulating
namespaces :)

>> In our case obviously the kernel itself remains unresponsive, but the
>> important part is that the networking doesn't suffer.  So we have
>> enough information to take action even in the face of an unresponsive
>> kernel.
>
> I'm quite new to networking, so sorry if this sounds stupid: other
> than the heartbeat stuff not working, would it also have other effects? I
> mean, if data cannot be sent or received, might network buffers
> overflow or something?

Depending on how you implemented your networking layer, you're likely
to lose data.  And you'll need to ensure that your API recovers
gracefully from half-sent messages, unreplied messages, etc.
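For example, a client-side request helper that treats "no reply" as a
normal outcome, instead of blocking forever, might look roughly like
this (a pyzmq sketch, not our actual API; the names are illustrative):

import zmq

def request_with_timeout(sock, msg, timeout_ms=3000):
    """Send a request and wait for the reply, but never forever.

    Returns the reply, or None if the peer stayed silent.  Sketch only:
    a real client must also deal with the REQ socket being left in a
    half-finished state (typically by closing and reopening it).
    """
    sock.send_json(msg)
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    if dict(poller.poll(timeout_ms)).get(sock):
        return sock.recv_json()
    return None  # unreplied message: the caller recovers instead of hanging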

Getting a robust and efficient message transport layer written is not
easy work.  It takes expertise and detailed knowledge, coupled with
extensive real-world experience, to do it right.  We simply decided to
piggyback on some of the best code out there rather than trying to
write our own.  The features we gain from zmq (it's not just the
low-level performance, it's also the simple but powerful semantics of
its various socket types, which we've baked into the very core of
our design) are well worth the price of a C dependency in this case.

> Further, am I right that the heartbeat is not necessary when communicating
> between processes on the same box using 'localhost' (since some network
> layers are bypassed)? That would give a short-term solution for IEP.

Yes, on localhost you can detect whether the process is alive via
other mechanisms.  The question is whether the system recovers
gracefully from dropped messages or incomplete connections.  You need
to engineer that into the code itself, so that you don't lock up your
client when the kernel becomes unresponsive, for example.
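For instance, if the client spawned the kernel itself on the same box,
a signal-0 probe can stand in for a network heartbeat (a POSIX-only
sketch; note it only tells you the process exists, not that it's
responsive):

import errno
import os

def kernel_alive(pid):
    """Probe a local kernel process without actually signaling it.

    Signal 0 performs just the existence/permission check.  A kernel
    wedged inside a C extension still passes this test, so it's a
    liveness check, not a responsiveness check.
    """
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:  # no such process
            return False
        raise  # anything else is unexpected for a child we spawned
    return True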

I'm sure we still have corner cases in our code where we can lock up;
it's not easy to prevent all such occurrences.

> No, that's the great thing! All channels are multiplexed over the same
> socket pair. When a message is written to a channel, it is put in a queue,
> with a small header added to indicate the channel id. A single thread
> sends and receives messages over the socket. It pops the messages
> from the queue and sends them to the other side. On the receiving side,
> the messages are distributed to the queue corresponding to the right
> channel. So there's one 'global' queue on the sending side and one queue
> per channel on the receiver side.

Ah, excellent!  It seems your channels are similar to our message
types: we simply dispatch on the message type (a string) with the
appropriate handler.  The twist in IPython is that we've made the
various types of zmq sockets an integral part of the design: req/rep
for stdin control, xrep/xreq for execution requests multiplexed across
clients, and pub/sub for side effects (things that don't fit in a
functional paradigm).
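A heavily simplified sketch of that dispatch, assuming an xreq client
that sends one JSON frame per message (the handler and wire format
here are made up for illustration; our real messages carry more
structure):

import json
import zmq

def handle_execute(msg):
    """Hypothetical handler: run the code, report the outcome."""
    return {'msg_type': 'execute_reply', 'status': 'ok'}

HANDLERS = {'execute_request': handle_execute}

def serve():
    ctx = zmq.Context()
    # One socket per interaction pattern: xrep multiplexes requests from
    # many clients (it tags each message with the sender's identity),
    # while pub broadcasts side effects to every connected subscriber.
    shell = ctx.socket(zmq.XREP)
    side = ctx.socket(zmq.PUB)
    shell.bind('tcp://127.0.0.1:5555')
    side.bind('tcp://127.0.0.1:5556')
    while True:
        # With an xreq client, xrep delivers [client_identity, payload].
        ident, raw = shell.recv_multipart()
        msg = json.loads(raw)
        reply = HANDLERS[msg['msg_type']](msg)  # dispatch on the type string
        shell.send_multipart([ident, json.dumps(reply).encode()])
        # Side effects (e.g. captured stdout) don't go back to one caller;
        # they're published to whoever is listening.
        side.send_json({'msg_type': 'stream', 'data': '...'})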

We thus have a very strong marriage between the abstractions that zmq
exposes and our design.  Honestly, I sometimes feel as if zmq had been
designed for us, because it makes certain things we'd wanted for a
very long time almost embarrassingly easy.
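Incidentally, your single-thread multiplexing scheme is pleasantly
easy to prototype.  A toy sketch (send_frame/recv_frame stand in for
the real transport, and recv_frame is assumed non-blocking, returning
None when no data is waiting):

import json
import queue
import threading

class ChannelMux:
    """Many logical channels over one transport: one global send queue,
    one receive queue per channel, a single I/O thread in between."""

    def __init__(self, send_frame, recv_frame, channel_ids):
        self.outbox = queue.Queue()
        self.inboxes = {c: queue.Queue() for c in channel_ids}
        self._send_frame = send_frame
        self._recv_frame = recv_frame
        threading.Thread(target=self._io_loop, daemon=True).start()

    def send(self, channel, payload):
        self.outbox.put((channel, payload))

    def recv(self, channel):
        return self.inboxes[channel].get()

    def _io_loop(self):
        while True:
            # Pop outgoing messages and tag them with their channel id.
            try:
                channel, payload = self.outbox.get(timeout=0.01)
                self._send_frame(json.dumps({'channel': channel,
                                             'data': payload}))
            except queue.Empty:
                pass
            # Route incoming messages to the right per-channel inbox.
            frame = self._recv_frame()
            if frame is not None:
                msg = json.loads(frame)
                self.inboxes[msg['channel']].put(msg['data'])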

Thanks a lot for sharing your ideas, it's always super useful to look
at these questions from multiple perspectives.

Regards,

f


