[Tutor] operators >> and &

Sun Feb 14 02:21:49 CET 2010

On Sun, 14 Feb 2010 10:58:10 am Alan Gauld wrote:
> "spir" <denis.spir at free.fr> wrote
>
> > PS: in "l>>24 & 255", the & operation is useless, since all 24
> > higher bits are already thrown away by the shift:
>
> They are not gone however there are still 32 bits in an integer so
> the top bits *should* be set to zero. 

No, Python ints are not 32-bit native ints. They're not even 64-bit 
ints. Python 3 has unified the old "int" type with "long", so ints 
automatically grow as needed. For example, in Python 3.1:

>>> (0).bit_length()
0
>>> (1).bit_length()
1
>>> (2).bit_length()
2
>>> (3).bit_length()
2
>>> (10**100).bit_length()
333

Consequently, if you have an arbitrary int and don't know where it came 
from, you can't make any assumptions about how many bits it uses.
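A minimal sketch of the contrast, assuming you want to emulate a fixed 32-bit unsigned register: Python's own << never discards high bits, so the wraparound a C programmer expects has to be written out explicitly with a mask. The helper name `shl32` is hypothetical, made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # all 32 low bits set

def shl32(x, k):
    """Hypothetical helper: left-shift x as if it lived in a 32-bit unsigned register."""
    return (x << k) & MASK32  # the mask throws away everything above bit 31

x = 0xFFFFFFFF
print((x << 8).bit_length())  # 40: Python keeps every bit
print(hex(shl32(x, 8)))       # 0xffffff00: the emulated register loses the top byte
```

There is no 32- or 64-bit boundary for the native operators to wrap at; any truncation happens only where you ask for it.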

> But glitches can occur from time to time...

If Python had a glitch of the magnitude of right-shifting non-zero bits 
into a number, that would be not just a bug but a HUGE bug. That would 
be as serious as having 1+1 return 374 instead of 2. Guarding against 
(say) 8 >> 1 returning anything other than 4 makes as much sense as 
guarding against 8//2 returning something other than 4: if you can't 
trust Python to get simple integer arithmetic right, then you can't 
trust it to do *anything*, and your guard (ANDing it with 255) can't be 
trusted either.
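The claim that >> is just simple integer arithmetic can be checked directly: for Python ints, n >> k is exactly floor division by 2**k, negative numbers included. A quick brute-force sketch over a small range:

```python
# For Python ints, a right shift by k is floor division by 2**k.
for n in range(-1000, 1001):
    for k in range(12):
        assert n >> k == n // 2**k, (n, k)

print(8 >> 1, 8 // 2)  # 4 4
```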

> It is good practice to restrict the range to the 8 bits needed by
> and'ing with 255
> even when you think you should be safe.

It is certainly good practice if you are dealing with numbers which 
might be more than 32 bits to start with, since then n >> 24 can still 
leave more than 8 bits behind:

>>> n = 5**25
>>> n >> 24
17763568394
>>> n >> 24 & 255
10
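When the input can be wider than 32 bits, the shift-then-mask idiom generalises to pulling out any byte. A small sketch; `get_byte` is a hypothetical name, and counting bytes from 0 at the least significant end is an assumed convention. Since & binds more loosely than >> in Python, n >> 24 & 255 already means (n >> 24) & 255; the parentheses below are only for readability.

```python
def get_byte(n, i):
    """Return byte i of non-negative int n, counting from 0 at the low end."""
    return (n >> (8 * i)) & 0xFF  # shift the wanted byte down, then keep 8 bits

n = 5**25
print(get_byte(n, 3))  # 10, the same as n >> 24 & 255
print([get_byte(0xDEADBEEF, i) for i in range(4)])  # [239, 190, 173, 222]
```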

But *if* you know the int is no more than 32 bits, then adding in a 
guard to protect against bugs in the >> operator is just wasting CPU 
cycles and needlessly complicating the code. The right way to guard 
against "this will never happen" scenarios is with assert:

assert n.bit_length() <= 32  # or "assert 0 <= n < 2**32", which also rejects negatives
print(n >> 24)
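Wrapped up as a function, the assert documents the 32-bit precondition right where it matters. A sketch under that assumption; `top_byte` is a hypothetical name:

```python
def top_byte(n):
    """Return bits 24-31 of n, assuming n is an unsigned 32-bit value."""
    # Precondition check, not a data mask: removed entirely under python -O.
    assert 0 <= n < 2**32, "n must fit in 32 unsigned bits"
    return n >> 24  # no & 255 needed: n < 2**32 implies n >> 24 < 256

print(hex(top_byte(0xDEADBEEF)))  # 0xde

try:
    top_byte(5**25)  # 59 bits: raises AssertionError unless run under -O
except AssertionError as e:
    print("caught:", e)
```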

This has two additional advantages:

(1) It clearly signals to the reader what your intention is ("I'm 
absolutely 100% sure that n will not be more than 32 bits, but since 
I'm a fallible human, I'd rather find out about an error in my logic as 
soon as possible").

(2) If the caller cares enough about speed to object to the tiny little 
cost of the assertion, he or she can disable assertions entirely by 
running Python with the -O (O for Optimise) switch.

(More realistically, the concern is cumulative: each assert is very 
cheap, but a big application might have many, many of them.)
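The effect of -O is easy to observe: it sets the built-in constant __debug__ to False and assert statements are compiled away. A small sketch that runs a one-line script in a fresh copy of the current interpreter, once plainly and once with -O (the `run` helper is hypothetical, written for this illustration):

```python
import subprocess
import sys

code = "assert False, 'assert ran'; print('asserts skipped, __debug__ =', __debug__)"

def run(flags):
    """Run `code` in a fresh interpreter with the given command-line flags."""
    return subprocess.run([sys.executable, *flags, "-c", code],
                          capture_output=True, text=True)

plain = run([])
optimised = run(["-O"])
print(plain.returncode)          # 1: the AssertionError traceback goes to stderr
print(optimised.stdout.strip())  # asserts skipped, __debug__ = False
```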

-- 
Steven D'Aprano