[Tutor] operators >> and &

Alan Gauld alan.gauld at btinternet.com
Sun Feb 14 10:16:18 CET 2010


"Steven D'Aprano" <steve at pearwood.info> wrote 

>> They are not gone however there are still 32 bits in an integer so
>> the top bits *should* be set to zero. 
> 
> No, Python ints are not 32 bit native ints. They're not even 64 bit 
> ints. Python has unified the old "int" type with "long", so that ints 
> automatically grow as needed. This is in Python 3.1:

Valid point, but irrelevant to the one I was making, which 
is that the number after shifting is longer than 8 bits.
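For example (a sketch with an arbitrary 32-bit value):

```python
n = 0x12345678               # an arbitrary 32-bit value
print(hex(n >> 8))           # 0x123456 - 24 bits, still wider than one byte
print(hex((n >> 8) & 255))   # 0x56 - the mask keeps just the low byte
```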

>> But glitches can occur from time to time...
> 
> If Python had a glitch of the magnitude of right-shifting non-zero bits 
> into a number, that would be not just a bug but a HUGE bug. 

Bit shifting is machine specific. Some CPUs (the DEC PDP 
range, from memory, is an example) will add the carry bit, for 
example; most will not. But you can never be sure unless you 
know exactly which architecture the program will run on.
And of course data can always be corrupted at any time, so it's
always wise to take as many precautions as possible to keep 
it clean (although corruption within the CPU itself is, I agree, 
extremely unlikely).

> be as serious as having 1+1 return 374 instead of 2. Guarding against 
> (say) 8 >> 1 returning anything other than 4 

Not if you have a 4-bit processor and the previous operation 
set the carry flag. In that case returning 12 would be eminently 
sensible... and used to be a common assembler trick for 
recovering from overflow errors.

> guarding against 8//2 returning something other than 4: if you can't 
> trust Python to get simple integer arithmetic right, 

But this is not simple integer arithmetic, it is bit manipulation. 
You can use bit manipulation to fake arithmetic, but they are 
fundamentally different operations and may not always 
produce the same results, depending on how the designer 
built it!
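To illustrate in Python terms (the C comparison is from my own reading, not this thread):

```python
# In Python, >> on a negative int is an arithmetic shift that floors,
# so it happens to agree with floor division:
print(-7 >> 1)   # -4
print(-7 // 2)   # -4
# In C, by contrast, integer division truncates toward zero (-7 / 2 == -3)
# and right-shifting a negative value is implementation-defined, so the
# "shift as divide" identity depends on how the designer built it.
```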

> trust it to do *anything*, and your guard (ANDing it with 255) can't be 
> trusted either.

Nothing can be trusted 100% on a computer because, as you 
say, the guard might itself be corrupted. It's all about risk management.
But when it comes to bit operations I'd always have at least one 
extra level of check, whether it be a mask or a checksum.

> It is certainly good practice if you are dealing with numbers which 
> might be more than 24 bits to start with:

It's more than good practice there, it's essential.

> But *if* you know the int is no more than 32 bits, then adding in a 
> guard to protect against bugs in the >> operator is just wasting CPU 

It may not be a bug, it may be a design feature.
Now all modern CPUs behave as you would expect, but if 
you are running on older equipment (or specialised 
hardware - but that's more unlikely to have Python onboard!) 
you can never be quite sure how bitwise operations will react 
at boundary cases. If you know for certain what the runtime 
environment will be then you can afford to take a chance.

In the case in point, the & 255 keeps the coding style consistent 
and provides an extra measure of protection against unexpected 
oddities, so I would keep it in there.
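A sketch of that consistent style (the function name is mine), applying the same mask at every step:

```python
def bytes_of(n):
    """Split a 32-bit value into its four bytes, least significant first."""
    return [(n >> shift) & 255 for shift in (0, 8, 16, 24)]

print(bytes_of(0x0A0B0C0D))   # [13, 12, 11, 10]
```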

> cycles and needlessly complicating the code. The right way to guard 
> against "this will never happen" scenarios is with assert:
> 
> assert n.bit_length() <= 32  # or "assert 0 <= n < 2**32"

I would accept the second condition, but the mask is much faster.
bit_length doesn't seem to work on any of my Pythons (2.5, 2.6 and 3.1).
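A sketch of the second condition used as a guard (the helper name is mine):

```python
def extract_byte(n, k):
    """Return byte k (0 = least significant) of an unsigned 32-bit value."""
    assert 0 <= n < 2**32, "expected an unsigned 32-bit value"
    return (n >> (8 * k)) & 255

print(extract_byte(0x12345678, 1))   # 86, i.e. 0x56
```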

> This has two additional advantages:
> 
> (1) It clearly signals to the reader what your intention is ("I'm 
> absolutely 100% sure than n will not be more than 32 bits, but since 
> I'm a fallible human, I'd rather find out about an error in my logic as 
> soon as possible").

The assert approach is perfectly valid, but since the mask is 
more consistent I'd still prefer to use it in this case.

Alan G.


