[Python-ideas] bitwise operations on bytes

Sat Aug 8 23:31:32 CEST 2009

On Fri, Aug 7, 2009 at 10:54 PM, Steven D'Aprano<steve at pearwood.info> wrote:
> On Fri, 7 Aug 2009 11:19:20 pm Mark Dickinson wrote:
>
>> > Also conceivable is using the shift operators >> and << on bytes,
>> > but I personally would use that less often, and the result of such
>> > an operation is ambiguous due to endianness.
>>
>> Agreed.  To make sense of the shift operators you effectively have
>> to give 'position' interpretations for the individual bits, and
>> there's no single obvious way of doing this;  for the plain bitwise
>> operations this isn't necessary.
>
> To me, the single obvious meaning of left- and right-shift is to shift
> to the left and the right :)
>
> E.g.
>
> b"abcd" >> 8
> => "abc"
>
> b"abcd" << 8
> => "abcd\0"
>
> which would have the benefit of matching what ints already do:
>
>>>> [hex(ord(c)) for c in "ABCD"]
> ['0x41', '0x42', '0x43', '0x44']
>>>> n = 0x41424344
>>>> hex(n >> 8)
> '0x414243'
>>>> hex(n << 8)
> '0x4142434400'
>
>
> I'm not sure what other "obvious" meanings you could give them. Have I
> missed something?

Yes, byte order. It is not at all "obvious" whether the lowest-order
byte is on the left or on the right. Your interpretation is
big-endian. But mathematically speaking, a little-endian
interpretation is somewhat easier, because the value in byte number i
corresponds to that value multiplied by 256**i. Another way to look at
it: is b'abc' supposed to be equal to b'\0abc' (big-endian) or
b'abc\0' (little-endian)? I find ignoring trailing nulls more logical
than ignoring leading nulls, since the indexes of the significant
digits are the same in the little-endian case.
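
To make the two readings concrete, here is a minimal sketch. It assumes
Python 3.2 or later, where int.from_bytes() exists; that method postdates
this message and is used here only as a cross-check on the 256**i
arithmetic.

```python
data = b'abc'

# Little-endian: byte number i contributes data[i] * 256**i.
little = sum(byte * 256**i for i, byte in enumerate(data))
assert little == int.from_bytes(data, 'little')

# Big-endian: the last byte is the lowest-order one, so the exponents
# count from the right-hand end instead.
big = sum(byte * 256**i for i, byte in enumerate(reversed(data)))
assert big == int.from_bytes(data, 'big')

# A null byte is harmless only on the high-order side of each reading.
assert int.from_bytes(b'abc\0', 'little') == little
assert int.from_bytes(b'\0abc', 'big') == big
assert int.from_bytes(b'\0abc', 'little') != little
```

In the little-endian reading the index of each significant byte is
unchanged by the padding, which is the point made above.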

In the grander scheme of things, I worry that interpreting byte
strings as integers and implementing bitwise operators on them is
going to cause more confusion and isn't generally useful enough to
warrant the extra code. I'd be okay with a standard API to transform a
byte array into an integer and vice versa -- there you can be explicit
about byte order and what to do about negative numbers. I can't
remember right now if we already have such an API for arbitrary sizes
-- the struct module only handles fixed sizes (1, 2, 4 and 8 bytes). I can hack it by
going via a hex representation:

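  import binascii

  i = 10**100
  h = hex(i)[2:]
  if len(h) % 2:           # bytes.fromhex() needs an even number of digits
      h = '0' + h
  b = bytes.fromhex(h)     # big-endian byte string
  j = int(binascii.hexlify(b), 16)
  assert j == i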

but this is a pretty gross hack. Still, most likely faster than
writing out a loop in Python.
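
For comparison, a sketch of what an explicit conversion API could look
like. It assumes Python 3.2 or later, where int.to_bytes() and
int.from_bytes() are available (both postdate this message). The helper
name int_to_bytes is hypothetical and chosen only for illustration; byte
order and signedness are spelled out by the caller.

```python
def int_to_bytes(n, byteorder='big', signed=False):
    """Hypothetical helper: encode n in enough bytes to hold it."""
    # Reserve one extra bit for the sign when a signed encoding is requested.
    # For a few negative values (e.g. -128) this uses one byte more than
    # strictly necessary; the result still round-trips correctly.
    bits = n.bit_length() + (1 if signed else 0)
    length = max(1, (bits + 7) // 8)
    return n.to_bytes(length, byteorder, signed=signed)


i = 10**100
b = int_to_bytes(i, 'big')
assert int.from_bytes(b, 'big') == i

# Negative numbers need an explicit signed (two's complement) encoding.
neg = -10**100
b_neg = int_to_bytes(neg, 'little', signed=True)
assert int.from_bytes(b_neg, 'little', signed=True) == neg
```

Passing a negative number with signed=False raises OverflowError, so the
"what to do about negative numbers" question has to be answered at the
call site rather than guessed.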

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)