[Python-3000] Immutable bytes -- looking for volunteer

Jim Jewett jimjjewett at gmail.com
Wed Sep 26 00:14:19 CEST 2007


> How about we take the existing PyString implementation (Python 2's
> str, currently still present as str8 in py3k), remove the locale and
> unicode mixing support, and call it bytes.

Is the "unicode mixing support" just encode/decode?
Isn't bytes one sensible way to store an encoded str, so that
decode (only) would still make sense?

I would have expected to drop the text- or character-oriented methods,
because they should really be done on the (decoded) unicode version.
Given the use of bytes in wire protocols, I could also understand saying
that these methods only work on ASCII, and either raise an exception or
return false for other byte values.

Text- or character-oriented methods:

'capitalize', 'center', 'endswith', 'expandtabs', 'isalnum',
'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper',
'ljust', 'lower', 'lstrip', 'rjust', 'rstrip', 'splitlines', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill'
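
To make the "ASCII only" option concrete, here is a small sketch. It uses
the behavior that released Python 3 bytes objects have, which is an
assumption relative to this thread (nothing was decided at the time of
writing). The sample data and variable names are made up for illustration.

```python
# Latin-1 encoded "café"; the byte 0xE9 is outside the ASCII range.
data = b"caf\xe9"

# On bytes, only the ASCII letters are case-mapped; 0xE9 passes through.
ascii_upper = data.upper()

# Predicates answer False for non-ASCII byte values instead of raising.
ascii_is_alpha = b"cafe".isalpha()
high_is_alpha = b"\xe9".isalpha()

# The character-aware result needs the decoded (unicode) version.
text_upper = data.decode("latin-1").upper()
```

The last line is the point of the paragraph above: a real character
operation belongs on the decoded text, while the bytes methods can at best
give an ASCII-only answer.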

> It would mean more fixes beyond what Jeffrey and Adam did, since
> iterating over a bytes instance would return a bytes instance of
> length 1 instead of a small int,

Makes sense.

> and the bytes constructor would
> change accordingly (no more initializing a bytes object from a list of
> ints).

Why not?

I expect the literal b"ASCII string" to be the most common
constructor, but I don't see the problem with a sequence of ints (or
hex) as an alternative constructor.
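
For illustration, the three constructors side by side. This is a sketch
using the spellings that released Python 3 ended up with (bytes(iterable)
and bytes.fromhex); treating those as the answer to this thread is an
assumption.

```python
# The literal form, expected to be the most common.
from_literal = b"ASCII"

# Alternative constructor: a sequence of small ints (each in range(256)).
from_ints = bytes([65, 83, 67, 73, 73])

# Alternative constructor: a string of hex digits, two per byte.
from_hex = bytes.fromhex("4153434949")
```

All three spell the same five bytes, so the alternatives add convenience
without adding a second meaning.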

> The (new) buffer object would also have to change to be more
> compatible with the (new) bytes object -- bytes<-->buffer conversions
> should be 1-1, and iterating over a buffer instance would also have to
> return a length-1 buffer (or bytes???) instance.

I would return a bytes instance.  If you return a 1-char buffer, and
someone does modify that, it isn't clear whether the change should be
reflected in the original source buffer.  If someone does want an
in-place filter, they can always use enumerate and slicing.
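
A sketch of both halves of that argument. It assumes bytearray as a
stand-in for the (new) mutable buffer type, and the helper name
iter_length1 is hypothetical. Note that in released Python 3, plain
iteration yields ints, so the length-1 items are produced by slicing.

```python
def iter_length1(buf):
    """Yield each position of buf as an immutable length-1 bytes object."""
    for i in range(len(buf)):
        # bytes(...) copies, so the item cannot alias the source buffer.
        yield bytes(buf[i:i + 1])


buf = bytearray(b"hello world")

# Items are independent snapshots; there is nothing to write back through.
items = list(iter_length1(buf))

# In-place filter using enumerate and slicing: replace every "o" with "0".
for i, value in enumerate(buf):
    if value == ord("o"):
        buf[i:i + 1] = b"0"   # same-length slice assignment, safe while iterating
```

Because the yielded items are immutable copies, the question of whether a
change should be reflected in the source never comes up; mutation is
always spelled explicitly against the buffer itself.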


Can we assume that instances of the two types never compare equal, but
that you can search a buffer for a (constant) bytes value?

    >>> mybytes = b"some data"
    >>> mybuffer = buffer(mybytes)

    >>> mybuffer == mybytes
    False

    >>> mybuffer.startswith(mybytes)  and \
    ...    mybuffer.endswith(mybytes)  and \
    ...    len(mybuffer) == len(mybytes)
    True
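
For comparison, a runnable sketch of the same checks, assuming bytearray
as a stand-in for the (new) buffer type. The session above shows the
behavior being asked about, not a tested result. Released Python 3 differs
on the first point: a bytearray and a bytes object with the same contents
compare equal.

```python
payload = b"some data"
buf = bytearray(payload)

# Unlike the session above, this is True in released Python 3.
same_value = (buf == payload)

# Searching a mutable buffer for a constant bytes value works as hoped.
contains = payload in buf
framed = (buf.startswith(payload)
          and buf.endswith(payload)
          and len(buf) == len(payload))

# find() reports the offset of the first match.
offset = bytearray(b"header: some data").find(payload)
```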

-jJ

