Note that in the parallel code, I do exactly what you mention. <div><meta charset="utf-8">I added the buffers argument to session.send() because the parallel code must be able to send things like numpy arrays without ever serializing or copying the raw data. Currently I can do that: there are zero in-memory copies of array data (even going from Python to C zmq); the only copy is the one over the network. It also lets me pickle arbitrary objects and send them without ever copying the pickled string. Metadata is sent as JSON, followed by a series of buffers containing any binary data. I imagine that my Session object will be merged with the existing Session object once the parallel code gets pulled into trunk, but that's a little while off.</div>
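<div><br></div><div>Roughly, the message layout looks like the sketch below. This is only a stdlib illustration of the idea, not the actual Session code: pack_message, unpack_message, and the msg_type value are made-up names, and a bytearray stands in for the array data.</div>
<pre>
```python
import json


def pack_message(metadata, buffers):
    """Build a multipart message: one JSON header frame, then raw buffer frames."""
    header = json.dumps(metadata).encode("utf-8")
    # memoryview wraps the existing memory, so no data bytes are copied here.
    return [header] + [memoryview(b) for b in buffers]


def unpack_message(frames):
    """Parse only the first frame as JSON; hand the remaining frames back untouched."""
    metadata = json.loads(bytes(frames[0]).decode("utf-8"))
    return metadata, frames[1:]


# Stand-in for binary data (hypothetical content, not a real image).
raw = bytearray(b"\x89PNG\r\n\x1a\n" + bytes(range(256)))

frames = pack_message({"msg_type": "example", "n_buffers": 1}, [raw])
metadata, buffers = unpack_message(frames)

# The frame still views the original memory, so a write to raw shows through.
raw[0] = 0
shares_memory = buffers[0][0] == 0
```
</pre>
<div>With pyzmq, I would expect a frame list like this to go out via something like socket.send_multipart(frames, copy=False), which is the step that avoids the Python-side copy.</div>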
<div><br></div><div>Perhaps it would make sense for the kernel's payload system to use this new model. Of course, it isn't perfectly universal: web frontends need mime-type header info in order to interpret binary data, so you would probably fracture the portability of pure JSON, though I'm not sure. Maybe the mime-type header info can go in the JSON metadata in such a way that the javascript side would be able to interpret the data properly.</div>
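<div><br></div><div>One way this might look, purely as a sketch: the JSON header lists a mime type and size for each buffer that follows, so the receiver can parse the small header and pick the buffer it wants without touching the others. The field names ("buffers", "mimetype", "size") and both helper functions are assumptions for illustration, not an existing IPython format.</div>
<pre>
```python
import json


def describe_buffers(parts):
    """parts is a list of (mimetype, bytes) pairs; returns (header_bytes, buffers)."""
    header = {"buffers": [{"mimetype": m, "size": len(b)} for m, b in parts]}
    return json.dumps(header).encode("utf-8"), [b for _, b in parts]


def pick_buffer(header_bytes, buffers, wanted):
    """Parse only the JSON header and return the first buffer of the wanted type."""
    header = json.loads(header_bytes.decode("utf-8"))
    for info, buf in zip(header["buffers"], buffers):
        if info["mimetype"] == wanted:
            return buf
    return None


# Placeholder payloads (hypothetical bytes, not real images).
png = b"\x89PNG\r\n\x1a\n\x00\xff"
svg = b"placeholder for SVG text"

header_bytes, buffers = describe_buffers(
    [("image/png", png), ("image/svg+xml", svg)]
)
chosen = pick_buffer(header_bytes, buffers, "image/svg+xml")
missing = pick_buffer(header_bytes, buffers, "text/html")
```
</pre>
<div>The binary data never passes through the JSON encoder here; only the description of it does.</div>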
<div><div><br></div><div>-MinRK</div><div><br></div><div><div class="gmail_quote">On Tue, Oct 19, 2010 at 16:17, Robert Kern <span dir="ltr"><<a href="mailto:robert.kern@gmail.com">robert.kern@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div class="im">On 2010-10-19 17:34 , MinRK wrote:<br>
><br>
><br>
> On Tue, Oct 19, 2010 at 15:05, Fernando Perez <<a href="http://fperez.net" target="_blank">fperez.net</a><br>
</div><div class="im">> <<a href="http://fperez.net" target="_blank">http://fperez.net</a>>@<a href="http://gmail.com" target="_blank">gmail.com</a> <<a href="http://gmail.com" target="_blank">http://gmail.com</a>>> wrote:<br>
><br>
> On Sun, Oct 17, 2010 at 2:28 PM, Mark Voorhies <<a href="mailto:mark.voorhies@ucsf.edu">mark.voorhies@ucsf.edu</a><br>
</div><div><div></div><div class="h5">> <mailto:<a href="mailto:mark.voorhies@ucsf.edu">mark.voorhies@ucsf.edu</a>>> wrote:<br>
> > I tried a first pass at this (branch "pastefig" in my github repository.<br>
> > Latest commit:<br>
> ><br>
> <a href="http://github.com/markvoorhies/ipython/commit/3f3d3d2f6e1f457856ce7e5481aa681fddb72a82" target="_blank">http://github.com/markvoorhies/ipython/commit/3f3d3d2f6e1f457856ce7e5481aa681fddb72a82</a><br>
> > )<br>
><br>
> Thanks!!!<br>
><br>
> > The multi-image bundle is sent as type "multi", with data set to<br>
> > a dict of "format"->data (so, currently,<br>
> > {"png" : PNG data from matplotlib,<br>
> > "svg" : SVG data from maptplotlib}<br>
> > )<br>
> > ["multi" is probably not the best name choice -- any suggestions for<br>
> > something more descriptive/specific?]<br>
><br>
> It may be time to stop for a minute to think about our payloads. The<br>
> payload system works well but we've known all along that once we have<br>
> a clearer understanding of what we need, we'd want to refine its<br>
> design. All along something has been telling me that we should move<br>
> to a full specification of payloads with mimetype information (plus<br>
> possibly ipython-specific extra data). Python has a mimetype library,<br>
> and if our payloads are properly mimetype-encoded, web frontends would<br>
> have little to no extra work to do, as browsers are already tooled up<br>
> to handle gracefully mimetype-tagged data that comes in.<br>
><br>
> What do people think of this approach?<br>
><br>
> > Naively sending PNG data causes reply_socket.send_json(reply_msg)<br>
> > in ipkernel.py to hang (clearing the eighth bit of the data fixes this,<br>
> > does ZMQ require 7bit data?) -- I'm currently working around this by<br>
> > base64 encoding the PNG, but this may not be the best choice wrt<br>
> > bandwidth.<br>
><br>
> That's very odd. Brian, Min, do you know of any such restrictions in<br>
> zmq/pyzmq? I thought that zmq would happily handle pretty much any<br>
> binary data...<br>
><br>
><br>
> Sorry, I sent this a few days ago, but failed to reply-all:<br>
><br>
> It's not zmq, but json that prevents sending raw data. ZMQ can send any bytes<br>
> just fine (I tested with the code being used to deliver the payloads, and it can<br>
> send StringIO from a PNG canvas no problem), but json requires encoded strings.<br>
> Arbitrary C-strings are not necessarily valid JSON strings. This gets<br>
> confusing, but essentially JSON has the same notion of strings as Python-3<br>
> (str=unicode, bytes=C-str). A string for them is a series of /characters/, not<br>
> any series of 8-bit numbers, which is the C/Python<3 notion. Since not all<br>
> series of arbitrary 8-bit numbers can be interpreted as valid characters, JSON<br>
> can't encode them for marshaling. Zeroing out the 8th bit works because all<br>
> 7-bit numbers /are/ valid ASCII characters (and thus also valid in almost all<br>
> encodings).<br>
><br>
> JSON has no binary data format. The only valid data for JSON are: numbers,<br>
> encoded strings, lists, dicts, and lists/dicts of those 4 types, so if you want<br>
> to send binary data, you have to first turn it into an *encoded* string, not a<br>
> C-string. Base64 is an example of such a thing, and I don't know of a better<br>
> way than that, if JSON is enforced. Obviously, if you used pickle instead, there<br>
> would be no problem.<br>
><br>
> This is why BSON (the data format used by MongoDB among others) exists. It adds<br>
> binary data support to JSON.<br>
<br>
</div></div>The approach I advocated at SciPy was to use multipart messages. Send the header<br>
encoded in JSON (or whatever) and then follow that with a message part (or<br>
parts) containing the binary data. Don't try to encode the data inside any kind<br>
of markup requiring parsing, whether the format is binary-friendly or not. This<br>
lets the receiver parse just the smallish header and decide what to do with the<br>
largish data without touching the data. You don't want to parse all of a BSON<br>
message just to find out that it's a PNG when you want the SVG.<br>
<div class="im"><br>
--<br>
Robert Kern<br>
<br>
"I have come to believe that the whole world is an enigma, a harmless enigma<br>
that is made terrible by our own mad attempt to interpret it as though it had<br>
an underlying truth."<br>
-- Umberto Eco<br>
<br>
</div><div><div></div><div class="h5">
</div></div></blockquote></div><br></div></div>