bytearray inconsistencies?

Ned Batchelder ned at
Sat Dec 21 02:58:33 CET 2013

On 12/20/13 8:06 PM, Mark Lawrence wrote:
> Quoting from
> "The bytearray type is a mutable sequence of integers in the range 0 <=
> x < 256."
> Quoting from
> "Whenever a bytes or bytearray method needs to interpret the bytes as
> characters (e.g. the is...() methods, split(), strip()), the ASCII
> character set is assumed (text strings use Unicode semantics).
> Note - Using these ASCII based methods to manipulate binary data that is
> not stored in an ASCII based format may lead to data corruption.
> The search operations (in, count(), find(), index(), rfind() and
> rindex()) all accept both integers in the range 0 to 255 (inclusive) as
> well as bytes and byte array sequences.
> Changed in version 3.3: All of the search methods also accept an integer
> in the range 0 to 255 (inclusive) as their first argument."
> I don't understand why the docs talk about "a mutable sequence of
> integers" but then discuss "needs to interpret the bytes as characters".

With no arguments, the split and strip methods work on whitespace.  A
byte is just an integer, and an integer isn't whitespace.  A character
can be, so the bytes need to be interpreted as characters.  Likewise,
the is* methods (isalnum, isalpha, isdigit, islower, isspace, istitle,
isupper) are all defined in terms of characters, so the bytes must be
interpreted as characters there too.
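
As a rough illustration of that ASCII interpretation, here is a small
sketch.  The sample data and variable names are made up for the example;
the comments show what I'd expect on a Python 3 interpreter.

```python
# Hypothetical sample data, chosen only to show the ASCII interpretation.
data = bytearray(b"  key 42\tvalue\n")

# Indexing gives back an integer, not a character.
first = data[0]                      # 32, the ASCII code for a space

# With no argument, split() and strip() treat ASCII whitespace specially.
fields = data.split()                # three bytearrays: key, 42, value
trimmed = data.strip()               # bytearray(b'key 42\tvalue')

# The is* methods also read the bytes as ASCII characters.
all_digits = bytearray(b"42").isdigit()      # True
all_space = bytearray(b" \t\n").isspace()    # True

# A byte outside the ASCII range is never treated as a letter,
# even though 0xE9 would be an accented "e" in Latin-1.
non_ascii_alpha = bytes([0xE9]).isalpha()    # False
```

The last line is the kind of case the docs' note about non-ASCII data is
warning about.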

>   Further I don't understand why the changes done in 3.3 referred to
> above haven't also been applied to (say) the split method.  If I can
> call find to look for a zero, why can't I split on it?

I don't know the reason, but I would guess either no one considered it, 
or it was deemed unlikely to be useful.

If you need to split on a zero byte, you can do it with
bytestring.split(bytes([0])), or equivalently bytestring.split(b"\x00").
That still doesn't explain why find can take a plain integer zero while
split has to be given a bytestring with a zero in it.
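
To make the asymmetry concrete, here is a minimal sketch.  The sample
data is invented, and the TypeError is what I'd expect from Python 3.3
and later rather than something the docs spell out for split.

```python
# Hypothetical sample data: three fields separated by zero bytes.
data = bytearray(b"abc\x00def\x00ghi")

# The search operations accept a plain integer in range(256).
pos = data.find(0)                   # 3
has_zero = 0 in data                 # True
zero_count = data.count(0)           # 2

# split() needs a bytes-like separator, so the zero has to be wrapped.
parts = data.split(bytes([0]))       # three bytearrays: abc, def, ghi
same_parts = data.split(b"\x00")     # same result with a bytes literal

# Passing the bare integer to split() is rejected.
try:
    data.split(0)
    split_accepts_int = True
except TypeError:
    split_accepts_int = False
```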

Ned Batchelder,
