[IPython-dev] From the sage list: thinking about our messaging protocol

Wed Dec 14 17:17:40 EST 2011

Hello,

On Wed, Dec 14, 2011 at 12:57, Fernando Perez <fperez.net at gmail.com> wrote:
> Hi folks,
>
> this conversation started on the sage-dev list, but I'm moving part of
> it over here b/c it's really about our messaging protocol...
>
> It would be great if ultimately we have a set of protocol tests that
> validate our spec, as others are beginning to rely on the spec and
> it's pretty easy to drift if we have no tests telling us what is meant
> to stay.  That doesn't mean we can never tweak the spec as we better
> understand certain cases, but we'll do it explicitly and not by
> accidental code drift.
>
> Cheers,
>
> f
> ---------- Forwarded message ----------
> From: Jason Grout <jason-sage at creativetrax.com>
> Date: Wed, Dec 14, 2011 at 12:43 PM
> Subject: [sage-devel] Re: Questions about the single-cell server
> To: sage-devel at googlegroups.com
>
>
> On 12/14/11 1:57 PM, Fernando Perez wrote:
>>
>> At this point we have a spec in the document you
>> pointed to, but precious little in the way of independent compliance
>> testing, so there's a real danger of the actual implementation
>> diverging from the specification simply by accident.
>
>
>
> Already I see it's changed from what we implemented.  For example,
> your header now contains the msg_type, and the top-level contains the
> msg_id, duplicating both pieces of information.  To me, it makes sense
> to keep things the way they were (msg_id in header, msg_type at the
> top-level), and don't duplicate the information.  This minimizes the
> information that has to go back and forth (no need to send back the
> msg_type in the parent_header, since the client could just store it,
> though I suppose the argument could be made that msg_type really does
> belong in the header).

These changes were made because of IPython.parallel, specifically the
need to view headers *without* deserializing the potentially large
body of the message.  This means that the message must not be sent as
a single blob, but as a multipart message, and the
[header, parent_header, content] parts should be a complete
representation of the message.  Nothing should be in the top-level
except for convenience copies, because the top-level is not required
to be sent over the network.  Since they are strictly copies of the
contents of the header (as the spec defines them), top-level keys
should be considered read-only, as changing them directly would
violate the protocol.
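
To make the multipart point concrete, here is a minimal sketch of
sending each part as its own frame, so that a router can read the
header alone.  This is not the actual Session code: pack_message and
peek_header are hypothetical names, and JSON is just an assumed
serializer for illustration.

```python
import json

def pack_message(header, parent_header, content):
    """Serialize each part separately: one frame per part (hypothetical helper)."""
    return [json.dumps(part).encode("utf-8")
            for part in (header, parent_header, content)]

def peek_header(frames):
    """Decode only the first frame; the content frame stays as raw bytes."""
    return json.loads(frames[0].decode("utf-8"))

frames = pack_message(
    {"msg_id": "abc123", "msg_type": "execute_request"},
    {},
    {"code": "x = 1\n" * 1000},  # stand-in for a potentially large body
)
header = peek_header(frames)  # the large content frame is never decoded
```

With a single serialized blob, the same routing decision would require
decoding the whole message, body included.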

The header is really 'metadata about the message', and any/all
information about the message should go in there.  It seems
inconsistent for the header to be an incomplete description of the
message, which would be true if msg_id and/or msg_type were absent.

>
> What if the msg_id or msg_type between the header and the top-level is
> inconsistent?  Which takes precedence? I presume that the header
> information takes precedence, but since the top-level fields are for
> convenience, my guess is that in practice, the top-level fields take
> precedence, since if you have to compare the top-level fields with the
> header fields, you might as well just use the header fields.

There is no priority, because there is no non-pathological way for
these two values to differ.  The top-level keys are *defined* as
convenient aliases to the header keys.  If they differ, then the
protocol has not been implemented correctly, and it would be
appropriate to either raise an error, or set the top-level key from
the header on receipt (IPython does the latter, because top-level
aliases are not sent over the network).
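
As a rough sketch of the second option (setting the aliases from the
header on the receiving side), something like the following would do.
The add_aliases name is hypothetical and this is not the code in
Session, only an illustration of the rule that the header always wins:

```python
def add_aliases(msg):
    """Set the top-level aliases from the header, overwriting stale values."""
    msg = dict(msg)  # shallow copy, so the caller's dict is left alone
    msg["msg_id"] = msg["header"]["msg_id"]
    msg["msg_type"] = msg["header"]["msg_type"]
    return msg

received = {
    "header": {"msg_id": "abc123", "msg_type": "execute_reply"},
    "parent_header": {},
    "content": {"status": "ok"},
    "msg_type": "bogus",  # pathological: disagrees with the header
}
fixed = add_aliases(received)
```

The other option would be to compare the two values and raise an error
on mismatch instead of overwriting.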

In IPython, the Session object, which is responsible for building
messages and sending them over the wire, copies the msg_type and
msg_id to the top-level from the header purely for convenience of
other code. The top-level copies are never sent over the network, so
any changes to them, or indeed any extra keys injected into the
message, would not be reflected on the receiving side.

We did briefly have a 'cleaned up' version after we cut 0.11, where
the top-level duplicates were removed, but we found that the result
was actually much messier, given how frequently these values are
accessed.

>
> A very simple wrapper function on the receiving end can add the
> convenience fields if they really need to be at the top-level, and
> that would guarantee consistency.

That's exactly what we do.  In IPython, *all* communication is managed
through Session objects, which make certain guarantees:

* A message has a header, parent_header, and content.
* The 'msg_id' and 'msg_type' keys of the header are copied to the
  top-level for convenience.

This Session object also handles serialization, authentication, and
the use of buffers and pyzmq MessageTracker objects used in
IPython.parallel for non-copying operations, all of which are
*extensions* of the messaging spec.  This is what enables us to switch
between json, pickle, msgpack, and protobuf for serialization with a
line or two of config.
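
To illustrate why the serializer is easy to swap, here is a toy sketch
in which it is just a pack/unpack pair chosen by name.  MiniSession and
SERIALIZERS are hypothetical names, not the real Session API or its
config options, and only the stdlib json and pickle are shown:

```python
import json
import pickle

# Each entry is a (pack, unpack) pair mapping an object to bytes and back.
SERIALIZERS = {
    "json": (lambda obj: json.dumps(obj).encode("utf-8"),
             lambda data: json.loads(data.decode("utf-8"))),
    "pickle": (pickle.dumps, pickle.loads),
}

class MiniSession:
    """Toy stand-in for a session object (assumption, not IPython's class)."""

    def __init__(self, packer="json"):
        self.pack, self.unpack = SERIALIZERS[packer]

    def serialize(self, msg):
        # Only the three spec parts go on the wire; other keys are dropped.
        return [self.pack(msg[key])
                for key in ("header", "parent_header", "content")]

    def unserialize(self, frames):
        header, parent_header, content = (self.unpack(f) for f in frames)
        return {"header": header, "parent_header": parent_header,
                "content": content,
                # convenience aliases, rebuilt from the header on receipt
                "msg_id": header["msg_id"], "msg_type": header["msg_type"]}

session = MiniSession(packer="pickle")
msg = {"header": {"msg_id": "abc123", "msg_type": "status"},
       "parent_header": {}, "content": {"execution_state": "idle"},
       "extra": "never sent"}
roundtrip = session.unserialize(session.serialize(msg))
```

Because the rest of the code only ever sees the dict returned by
unserialize, changing the packer name is the only change needed.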

-MinRK

>
> Feel free to CC this message over to the ipython list if you want to
> take up the discussion there.
>
> Thanks,
>
> Jason