[Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))

Guido van Rossum guido at python.org
Mon May 7 19:42:41 CEST 2007


[+python-3000; replies please remove python-dev]

On 5/5/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Fred L. Drake, Jr." <fdrake at acm.org> wrote:
> >
> > On Saturday 05 May 2007, Aahz wrote:
> >  > I'm with MAL and Fred on making literals immutable -- that's safe and
> >  > lots of newbies will need to use byte literals early in their Python
> >  > experience if they pick up Python to operate on network data.
> >
> > Yes; there are lots of places where bytes literals will be used the way str
> > literals are today.  buffer(b'...') might be good enough, but it seems more
> > than a little idiomatic, and doesn't seem particularly readable.
> >
> > I'm not suggesting that /all/ literals result in constants, but bytes literals
> > seem like a case where what's wanted is the value.  If b'...' results in a
> > new object on every reference, that's a lot of overhead for a network
> > protocol implementation, where the data is just going to be written to a
> > socket or concatenated with other data.  An immutable bytes type would be
> > very useful as a dictionary key as well, and more space-efficient than
> > tuple(b'...').
>
> I was saying the exact same thing last summer.  See my discussion with
> Martin about parsing/unmarshaling.  What I expect will happen with bytes
> as dictionary keys is that people will end up subclassing dictionaries
> (with varying amounts of success and correctness) to do something like
> the following...
>
>     class bytesKeys(dict):
>         ...
>         def __setitem__(self, key, value):
>             if isinstance(key, bytes):
>                 key = key.decode('latin-1')
>             else:
>                 raise KeyError("only bytes can be used as keys")
>             dict.__setitem__(self, key, value)
>         ...
>
> Is it optimal?  No.  Would it be nice to have immutable bytes?  Yes.  Do
> I think it will really be a problem in parsing/unmarshaling?  I don't
> know, but the fact that there now exists a reasonable literal syntax b'...'
> rather than the previous bytes([1, 2, 3, ...]) means that we are coming
> much closer to having what really is about the best way to handle this:
> Python 2.x str.
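
For illustration, the quoted subclass idea can be fleshed out into something runnable. This is only a sketch under stated assumptions: it uses today's bytearray as a stand-in for the mutable bytes type under discussion, and the name BytesKeys, the _to_key helper, and the lookup methods are hypothetical additions that are not in the quoted message.

```python
class BytesKeys(dict):
    """Dict that accepts byte-sequence keys by storing them as latin-1 text.

    Assumption: bytearray stands in for the mutable bytes type discussed
    in the thread. Mutable sequences are unhashable, hence the conversion.
    """

    @staticmethod
    def _to_key(key):
        # Hypothetical helper: latin-1 maps each byte value 0-255 to exactly
        # one code point, so the conversion loses no information.
        if not isinstance(key, (bytes, bytearray)):
            raise KeyError("only bytes can be used as keys")
        return bytes(key).decode("latin-1")

    def __setitem__(self, key, value):
        dict.__setitem__(self, self._to_key(key), value)

    def __getitem__(self, key):
        return dict.__getitem__(self, self._to_key(key))

    def __contains__(self, key):
        return dict.__contains__(self, self._to_key(key))


headers = BytesKeys()
headers[bytearray(b"Content-Length")] = 42
length = headers[bytearray(b"Content-Length")]
```

Every access pays for a copy and a decode, which is the overhead an immutable, hashable bytes type would avoid.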

I don't know how this will work out yet. I'm not convinced that having
both mutable and immutable bytes is the right thing to do; but I'm
also not convinced of the opposite. I am slowly working on the
string/unicode unification, and so far, unfortunately, it is quite
daunting to get rid of 8-bit strings even at the Python level, let
alone at the C level.

I suggest that the following exercise, to be carried out in the
py3k-struni branch, might be helpful: (1) change the socket module to
return bytes instead of strings (it already takes bytes, by virtue of
the buffer protocol); (2) change its makefile() method so that it uses
the new io.py library, in particular the SocketIO wrapper there; (3)
fix up the httplib module and perhaps other similar ones. Take copious
notes while doing this. Anyone up for this? I will listen! (I'd do it
myself but I don't know where I'd find the time).
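
As a rough picture of where steps (1) and (2) lead, here is a minimal sketch of a socket whose reads yield bytes and whose makefile() returns an io-based file object. It assumes a released Python 3 interpreter rather than the py3k-struni branch as it stood at the time, and it uses socket.socketpair() purely so the example needs no network; the status line is made-up sample data.

```python
import io
import socket

# A connected pair of local sockets, so no server is needed.
left, right = socket.socketpair()
try:
    left.sendall(b"HTTP/1.0 200 OK\r\n")
    left.shutdown(socket.SHUT_WR)   # signal end of data to the reader

    # Step (1): reading from the socket yields bytes, not text.
    first = right.recv(4)

    # Step (2): makefile() hands back a file object from the io library.
    reader = right.makefile("rb")
    is_io_object = isinstance(reader, io.BufferedIOBase)
    rest = reader.readline()
    reader.close()
finally:
    left.close()
    right.close()

status_line = first + rest
```

Code in the style of httplib, step (3), would then parse status_line as bytes and decode only the parts it needs as text.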

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

