
On Fri, Aug 7, 2009 at 10:54 PM, Steven D'Aprano<steve@pearwood.info> wrote:
On Fri, 7 Aug 2009 11:19:20 pm Mark Dickinson wrote:
Also conceivable is using the shift operators >> and << on bytes, but I personally would use that less often, and the result of such an operation is ambiguous due to endianness.
Agreed. To make sense of the shift operators you effectively have to give 'position' interpretations for the individual bits, and there's no single obvious way of doing this; for the plain bitwise operations this isn't necessary.
To me, the single obvious meaning of left- and right-shift is to shift to the left and the right :)
E.g.
b"abcd" >> 8 => "abc"
b"abcd" << 8 => "abcd\0"
which would have the benefit of matching what ints already do:
>>> [hex(ord(c)) for c in "ABCD"]
['0x41', '0x42', '0x43', '0x44']
>>> n = 0x41424344
>>> hex(n >> 8)
'0x414243'
>>> hex(n << 8)
'0x4142434400'
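The proposed byte-string shifts can be sketched by round-tripping through an integer. This is only an illustration, assuming Python 3.2 or later (for int.from_bytes and int.to_bytes) and assuming the left-most byte is the most significant one; the helper names rshift_big_endian and lshift_big_endian are hypothetical, not an existing API.

```python
def rshift_big_endian(data: bytes, bits: int) -> bytes:
    """Hypothetical helper: right-shift `data` read as a big-endian unsigned int."""
    value = int.from_bytes(data, "big") >> bits
    # The result keeps only the bits that survive the shift.
    width = max(len(data) * 8 - bits, 0)
    return value.to_bytes((width + 7) // 8, "big")


def lshift_big_endian(data: bytes, bits: int) -> bytes:
    """Hypothetical helper: left-shift `data` read as a big-endian unsigned int."""
    value = int.from_bytes(data, "big") << bits
    # The result grows by the number of bits shifted in.
    width = len(data) * 8 + bits
    return value.to_bytes((width + 7) // 8, "big")


assert rshift_big_endian(b"abcd", 8) == b"abc"
assert lshift_big_endian(b"abcd", 8) == b"abcd\0"
```

The choice of "big" in both helpers is the assumption that makes these results match the integer example.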
I'm not sure what other "obvious" meanings you could give them. Have I missed something?
Yes, byte order. It is not at all "obvious" whether the lowest-order byte is on the left or on the right. Your interpretation is big-endian. But mathematically speaking, a little-endian interpretation is somewhat easier, because the value in byte number i corresponds to that value multiplied by 256**i.

Another way to look at it is: is b'abc' supposed to be equal to b'\0abc' (big-endian) or b'abc\0' (little-endian)? I find ignoring trailing nulls more logical than ignoring leading nulls, since the indexes of the significant digits are the same in the little-endian case.

In the grander scheme of things, I worry that interpreting byte strings as integers and implementing bitwise operators on them is going to cause more confusion and isn't generally useful enough to warrant the extra code.

I'd be okay with a standard API to transform a byte array into an integer and vice versa -- there you can be explicit about byte order and what to do about negative numbers. I can't remember right now if we already have such an API for arbitrary sizes -- the struct module only handles sizes 2, 4 and 8. I can hack it by going via a hex representation:

    i = 10**100
    b = bytes.fromhex(hex(i)[2:])
    import binascii
    j = int(binascii.hexlify(b), 16)
    assert j == i

but this is a pretty gross hack. Still, it is most likely faster than writing out a loop in Python.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
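For illustration, here is a minimal sketch of a conversion that is explicit about byte order and sign. It assumes Python 3.2 or later, where int.to_bytes and int.from_bytes are available; whether these methods match the API envisioned in this message is an assumption, and the variable names are illustrative only.

```python
i = 10**100

# Number of bytes needed to hold i; the caller chooses both the length and
# the byte order explicitly.
size = (i.bit_length() + 7) // 8

big = i.to_bytes(size, "big")
little = i.to_bytes(size, "little")

assert int.from_bytes(big, "big") == i
assert int.from_bytes(little, "little") == i
assert big == little[::-1]

# Leading nulls are insignificant in big-endian, trailing nulls in little-endian.
assert int.from_bytes(b"\0abc", "big") == int.from_bytes(b"abc", "big")
assert int.from_bytes(b"abc\0", "little") == int.from_bytes(b"abc", "little")

# The same right shift drops a different byte depending on the byte order.
shifted = int.from_bytes(b"abcd", "little") >> 8
assert shifted.to_bytes(3, "little") == b"bcd"

# Negative numbers need an explicit decision as well (two's complement here).
neg = (-2).to_bytes(2, "big", signed=True)
assert int.from_bytes(neg, "big", signed=True) == -2
assert int.from_bytes(neg, "big") == 0xFFFE
```

Under the little-endian reading, shifting b"abcd" right by 8 bits yields b"bcd" rather than b"abc", which is the ambiguity described above.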