[IPython-dev] Buffers

Fernando Perez fperez.net at gmail.com
Tue Jul 27 14:34:17 EDT 2010


On Tue, Jul 27, 2010 at 11:14 AM, Brian Granger <ellisonbg at gmail.com> wrote:
>
> Yes, I hadn't thought about the fact that unicode objects are buffers as
> well.  But, we could raise a TypeError when a user tries to send a unicode
> object (str in python 3).  IOW, don't treat unicode as buffers and force
> them to encode/decode.  Does this make sense or should we allow unicode to
> be sent as buffers?

Well, the problem I explained about a possible mismatch in internal
unicode storage format rears its ugly head if we allow
unicode-as-buffer.  I was precisely worried about sending 3.x strings
as buffers, since the two ends may not agree on what the buffer means.
I may be worrying about a non-problem, but at some point it might be
worth verifying this.  The test is a bit cumbersome to set up, because
you have to build two versions of Python, one with ucs-2 and one with
ucs-4, and see what happens if they try to send each other stuff.  But
I think it's a test worth making, so we know for sure whether this is
a problem or not, as it will dictate design decisions for 3.x on all
string handling.
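
To make the worry concrete, here is a rough sketch that runs on a single
interpreter.  It is not the two-build test described above: it uses the
utf-16-le and utf-32-le codecs as stand-ins for the ucs-2 and ucs-4
internal layouts (an assumption for illustration only), and the helper
name unicode_build_width is made up.

```python
import sys


def unicode_build_width():
    """Return 2 on a narrow (ucs-2) build and 4 on a wide (ucs-4) build.

    sys.maxunicode is 0xFFFF on narrow builds and 0x10FFFF on wide ones.
    """
    return 2 if sys.maxunicode == 0xFFFF else 4


text = "abc"

# Stand-ins for the raw internal storage of the same string on each build.
narrow_bytes = text.encode("utf-16-le")  # 2 bytes per code unit
wide_bytes = text.encode("utf-32-le")    # 4 bytes per code point

# A "narrow" receiver that reads a "wide" sender's raw buffer as if it
# were its own layout ends up with NUL characters interleaved in the text.
misread = wide_bytes.decode("utf-16-le")

print(unicode_build_width(), len(narrow_bytes), len(wide_bytes), repr(misread))
```

If the real two-build test behaves like this, the receiver gets no error,
just wrong text, which is why I'd like to know for sure.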

If it is a problem, then there are some options:

- disallow communication between ucs-2 and ucs-4 pythons.
- detect a mismatch and encode/decode all unicode strings to utf-8 on
send/receive, but allow raw buffer sending if there's no mismatch.
- *always* encode/decode.

The middle option seems appealing because it avoids the overhead of
encoding/decoding on all sends, but I'm worried it may be too brittle.
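
For comparison, the *always* encode/decode option combined with Brian's
TypeError idea could look roughly like the sketch below.  The names
to_wire, encode_text and decode_text are hypothetical, not existing
IPython or pyzmq API, and utf-8 is assumed as the wire encoding.

```python
def to_wire(obj):
    """Return a zero-copy view of a bytes-like object for sending.

    Unicode text is rejected so that the caller has to pick an encoding
    explicitly instead of exposing the internal storage format.
    """
    if isinstance(obj, str):
        raise TypeError(
            "unicode text cannot be sent as a buffer; encode it first, "
            "e.g. with encode_text()"
        )
    return memoryview(obj)


def encode_text(text):
    """Encode text to utf-8 bytes on the sending side."""
    return text.encode("utf-8")


def decode_text(payload):
    """Decode utf-8 bytes (or a buffer of them) on the receiving side."""
    return bytes(payload).decode("utf-8")


# Round trip: the bytes on the wire no longer depend on the build.
message = to_wire(encode_text("caf\u00e9"))
received = decode_text(message)
```

This pays the encoding cost on every send, but neither end has to know
anything about how the other stores its strings.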

Cheers,


f


