[Python-3000] [Python-Dev] Byte literals (was Re: [Python-checkins] Changing string constants to byte arrays ( r55119 - in python/branches/py3k-struni/Lib: codecs.py test/test_codecs.py ))
Guido van Rossum
guido at python.org
Mon May 7 19:42:41 CEST 2007
[+python-3000; replies please remove python-dev]
On 5/5/07, Josiah Carlson <jcarlson at uci.edu> wrote:
>
> "Fred L. Drake, Jr." <fdrake at acm.org> wrote:
> >
> > On Saturday 05 May 2007, Aahz wrote:
> > > I'm with MAL and Fred on making literals immutable -- that's safe and
> > > lots of newbies will need to use byte literals early in their Python
> > > experience if they pick up Python to operate on network data.
> >
> > Yes; there are lots of places where bytes literals will be used the way str
> > literals are today. buffer(b'...') might be good enough, but it seems more
> > than a little idiomatic, and doesn't seem particularly readable.
> >
> > I'm not suggesting that /all/ literals result in constants, but bytes literals
> > seem like a case where what's wanted is the value. If b'...' results in a
> > new object on every reference, that's a lot of overhead for a network
> > protocol implementation, where the data is just going to be written to a
> > socket or concatenated with other data. An immutable bytes type would be
> > very useful as a dictionary key as well, and more space-efficient than
> > tuple(b'...').
>
> I was saying the exact same thing last summer. See my discussion with
> Martin about parsing/unmarshaling. What I expect will happen with bytes
> as dictionary keys is that people will end up subclassing dictionaries
> (with varying amounts of success and correctness) to do something like
> the following...
>
> class bytesKeys(dict):
>     ...
>     def __setitem__(self, key, value):
>         if isinstance(key, bytes):
>             key = key.decode('latin-1')
>         else:
>             raise KeyError("only bytes can be used as keys")
>         dict.__setitem__(self, key, value)
>     ...
>
> Is it optimal? No. Would it be nice to have immutable bytes? Yes. Do
> I think it will really be a problem in parsing/unmarshaling? I don't
> know, but the fact that there now exists a reasonable literal syntax b'...'
> rather than the previous bytes([1, 2, 3, ...]) means that we are coming
> much closer to having what really is about the best way to handle this;
> Python 2.x str.
I don't know how this will work out yet. I'm not convinced that having
both mutable and immutable bytes is the right thing to do; but I'm
also not convinced of the opposite. I am slowly working on the
string/unicode unification, and so far, unfortunately, it is quite
daunting to get rid of 8-bit strings even at the Python level, let
alone at the C level.
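To make the trade-off concrete, here is a minimal sketch of how the two
kinds of byte sequence behave as dictionary keys. It assumes an
interpreter where bytes is immutable (and hashable) and bytearray is a
mutable counterpart; that split is an assumption for illustration, not
something settled in this thread. The name key_behaviour is
hypothetical.

```python
def key_behaviour():
    """Show immutable vs. mutable byte sequences used as dict keys."""
    # Assumption: bytes is immutable and hashable, so it works as a key.
    table = {b"GET": "read", b"POST": "write"}
    found = table[b"GET"]

    # Assumption: bytearray is mutable, hence unhashable, so using it as
    # a key raises TypeError instead of silently misbehaving.
    try:
        {bytearray(b"GET"): "read"}
        mutable_ok = True
    except TypeError:
        mutable_ok = False

    return found, mutable_ok


if __name__ == "__main__":
    print(key_behaviour())
```

With an immutable type available, no dict subclass that decodes keys to
latin-1 is needed; with only a mutable type, some such workaround is.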
I suggest that the following exercise, to be carried out in the
py3k-struni branch, might be helpful: (1) change the socket module to
return bytes instead of strings (it already takes bytes, by virtue of
the buffer protocol); (2) change its makefile() method so that it uses
the new io.py library, in particular the SocketIO wrapper there; (3)
fix up the httplib module and perhaps other similar ones. Take copious
notes while doing this. Anyone up for this? I will listen! (I'd do it
myself, but I don't know where I'd find the time.)
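For anyone picking this up, the behaviour that steps (1) and (2) aim
for can be sketched roughly as follows. This is only an illustration,
assuming a socket module whose recv() returns bytes and whose
makefile() hands back a binary file object built on the io library; the
helper name demo_socket_bytes and the request line are made up for the
example.

```python
import socket


def demo_socket_bytes():
    """Send bytes over a local socket pair and read them back as bytes."""
    # socketpair() gives two connected sockets without touching the network.
    a, b = socket.socketpair()
    try:
        # The socket accepts bytes on the way in...
        a.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
        a.shutdown(socket.SHUT_WR)

        # ...and (assumption: the target behaviour) returns bytes on the
        # way out, both from recv() and from the makefile() wrapper.
        raw = b.recv(4)
        reader = b.makefile("rb")
        line = reader.readline()
        reader.close()
        return raw, line
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo_socket_bytes())
```

Code such as httplib that today splits and compares these values as
strings would then need bytes literals (or an explicit decode) at each
such point, which is where the notes should be most useful.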
--
--Guido van Rossum (home page: http://www.python.org/~guido/)