[IPython-dev] Buffers

Mon Jul 26 21:43:45 EDT 2010

After chatting with Brian a little bit, I think the actual buffer should be sent, since zmq itself should not be aware of encoding. The main reason I started looking at unicode handling is that json returns unicode objects instead of str, so I was hitting errors without ever having created a unicode string myself. Things like send(u'a message') would fail, as would sock.connect(u'tcp://127.0.0.1:123'), and I think that should definitely not happen.

I solved these problems easily enough by changing all the isinstance(s,str) calls to isinstance(s,(str,unicode)). This works because the PyString_As... functions that these checks were guarding actually accept unicode as well as str, as long as the unicode object is ascii (or default encoding?).

In implementing buffer support, I can also send (without copying) any object that provides the buffer interface, including arbitrary unicode strings. I think it was a mistake to conflate these two things and try to reconstruct unicode objects on both sides within zmq.
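
For readers unfamiliar with the buffer interface, here is a minimal sketch of what "without copying" means. It uses only the standard library and is written for Python 3, where memoryview is the usual way to get at an object's buffer. The variable names are illustrative and are not taken from the zmq bindings.

```python
# Sketch: the buffer interface lets a consumer read an object's memory
# without copying it. Nothing here touches zmq itself.
payload = bytearray(b"some large message")

view = memoryview(payload)   # wraps payload's memory; no bytes are copied
assert view.obj is payload   # the view still refers to the original object

payload[0] = ord("S")        # mutate the original in place...
first = bytes(view[:4])      # ...and the view sees the change (this slice is copied out)

# Python 3 note: str does not expose the buffer interface, so text has to be
# encoded to bytes before it can be viewed this way.
encoded = u"ç".encode("utf-8")
encoded_view = memoryview(encoded)
```

Because the view shares memory with the original object, a sender working this way has to keep the object alive and unmodified until the message has actually gone out.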

Here is where my code currently stands:
A unicode object either a) contains only ascii, and is sent as a string, or b) is not a basic string, in which case its buffer is sent and reconstruction is left up to the user, just like all other buffered objects.
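
The rule above can be sketched roughly as follows. This is an illustration and not the actual binding code. The helper name to_wire is hypothetical, and the sketch targets Python 3, where str has no buffer interface. For case b) it therefore asks the caller to choose an encoding instead of sending the raw buffer.

```python
def to_wire(obj):
    """Hypothetical helper: turn obj into a bytes-like payload for sending."""
    if isinstance(obj, str):
        try:
            # Case a): ASCII-only text is sent as a plain byte string.
            return obj.encode("ascii")
        except UnicodeEncodeError:
            # Case b), Python 3 variant (an assumption of this sketch):
            # there is no raw buffer to send, so the caller must encode.
            raise TypeError("non-ASCII text must be encoded by the caller") from None
    # Anything else must provide the buffer interface; memoryview does not
    # copy, and raises TypeError for objects without a buffer.
    return memoryview(obj)


# Reconstruction is left to the user: encode before sending, decode after.
msg = u"ç"
sent = to_wire(msg.encode("utf-8"))        # the sender picks the encoding
received = bytes(sent).decode("utf-8")     # the receiver must know it too
```

Keeping the encode and decode steps in user code means both ends have to agree on the encoding out of band, and that agreement stays outside zmq.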

-MinRK

On Jul 26, 2010, at 18:12, Fernando Perez <fperez.net at gmail.com> wrote:

> [ I'm cc'ing the list on this, which may be of general interest ]
> 
> On Mon, Jul 26, 2010 at 2:14 PM, MinRK <benjaminrk at gmail.com> wrote:
>> Basically, the question revolves around what should we do with non-ascii
>> unicode messages in this situation:
>> msg=u'ç'
>> a.send(msg)
>> s = b.recv()
> 
> Shouldn't send/receive *always* work with bytes and never with
> unicode?  Unicode requires knowing the encoding, and that is a
> dangerous proposition on two sides of the wire.
> 
> If a message is unicode, it should be encoded first (to utf-8) and
> decoded on the other side back to unicode.
> 
> There is then the question of the receiving side: should it always
> decode? If not, should a flag about bytes/unicode be sent along?
> 
> Not sure...
> 
> Cheers,
> 
> f