unsigned 32 bit arithmetic type?

Wed Oct 25 08:05:56 EDT 2006

Martin v. Löwis wrote:
> Robin Becker schrieb:
>> Hi, just trying to avoid wheel reinvention. I have need of an unsigned
>> 32 bit arithmetic type to carry out a checksum operation and wondered if
>> anyone had already defined such a beast.
>>
>> Our current code works with 32-bit CPUs, but is failing with 64-bit
>> comparisons; it's clearly wrong, as we are comparing a number with a
>> negated number; the bits might drop off in 32 bits, but not in 64.
> 
> Not sure what operations you are doing: In Python, bits never drop off
> (at least not in recent versions).
> 
> If you need to drop bits, you need to do so explicitly, by using the
> bit mask operations. I could tell you more if you'd tell us what
> the specific operations are.

This code is in a contribution to the reportlab toolkit that handles TTF fonts.
The fonts contain checksums computed using 32-bit arithmetic. The original 
C definition is as follows:

> ULONG CalcTableChecksum(ULONG *Table, ULONG Length)
> {
> ULONG Sum = 0L;
> ULONG *EndPtr = Table+((Length+3) & ~3) / sizeof(ULONG);
> 
> while (Table < EndPtr)
> 	Sum += *Table++;
> return Sum;
> }

so effectively we're doing only additions and letting bits roll off the end.
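In current Python (3.x, where `int` is arbitrary precision; this is an assumption about the reader's interpreter, not about the 2.x code below), letting bits roll off the end has to be done explicitly with a mask, as Martin suggests. A minimal sketch of a wrap-around addition; the name `add32_masked` is hypothetical:

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1: keeps only the low 32 bits


def add32_masked(x: int, y: int) -> int:
    """Return (x + y) modulo 2**32 as a non-negative int (hypothetical helper).

    Python ints never overflow, so the carry out of bit 31 must be
    discarded explicitly with the mask, mimicking C unsigned addition.
    """
    return (x + y) & MASK32


print(hex(add32_masked(0xFFFFFFFF, 1)))  # -> 0x0 (carry out of bit 31 dropped)
print(hex(add32_masked(0x7FFFFFFF, 1)))  # -> 0x80000000 (stays unsigned, no sign flip)
```

Because the result always lies in range(2**32), the same code behaves identically on 32-bit and 64-bit machines.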

Of course the actual semantics depend on what C unsigned arithmetic does, 
so we're relying on that being the same everywhere.
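Assuming `ULONG` is exactly 32 bits wide (the TrueType/OpenType specification defines it as a 32-bit unsigned integer), C's unsigned arithmetic wraps modulo 2^32, so the loop computes the following for the table's words T_0 ... T_{n-1}, with the length rounded up to whole words:

```latex
\mathrm{Sum} \;=\; \left( \sum_{i=0}^{n-1} T_i \right) \bmod 2^{32},
\qquad n = \left\lceil \frac{\mathrm{Length}}{4} \right\rceil
```

Since the reduction modulo 2^32 gives the same answer whether it is applied after every addition or once at the end, any implementation that masks correctly should agree with the C code.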

This algorithm was pretty simple in Python until 2.3, when shifts past the end of 
ints started going wrong. For some reason we didn't do the obvious thing, which is 
to do everything in longs and mask off the upper bits. Instead (probably 
my fault) we seem to have accumulated code like

import sys
from struct import pack, unpack

def _L2U32(L):
    '''convert a long to u32'''
    return unpack('l',pack('L',L))[0]

if sys.hexversion>=0x02030000:
    def add32(x, y):
        "Calculate (x + y) modulo 2**32"
        return _L2U32((long(x)+y) & 0xffffffffL)
else:
    def add32(x, y):
        "Calculate (x + y) modulo 2**32"
        lo = (x & 0xFFFF) + (y & 0xFFFF)
        hi = (x >> 16) + (y >> 16) + (lo >> 16)
        return (hi << 16) | (lo & 0xFFFF)

def calcChecksum(data):
    """Calculates TTF-style checksums"""
    if len(data)&3: data = data + (4-(len(data)&3))*"\0"
    sum = 0
    for n in unpack(">%dl" % (len(data)>>2), data):
        sum = add32(sum,n)
    return sum
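The 64-bit failure is consistent with how `struct` treats the native format codes used in `_L2U32`: without a byte-order prefix, `'l'` and `'L'` use the platform's C `long`, which is 8 bytes on LP64 systems such as 64-bit Linux, so the `pack('L', ...)`/`unpack('l', ...)` round trip no longer reinterprets bit 31 as a sign bit there. A sketch (Python 3; the helper names are hypothetical) that forces the standard 4-byte size with an explicit `'<'` prefix, so the conversion is identical on every platform:

```python
import struct


def to_signed32(u: int) -> int:
    """Reinterpret the low 32 bits of u as a signed 32-bit value (hypothetical helper)."""
    # '<' selects standard sizes: 'I' and 'i' are always 4 bytes, whatever the C long size
    return struct.unpack('<i', struct.pack('<I', u & 0xFFFFFFFF))[0]


def to_unsigned32(s: int) -> int:
    """Map any int (e.g. a signed 32-bit value) onto range(2**32) (hypothetical helper)."""
    return s & 0xFFFFFFFF


print(struct.calcsize('l'), struct.calcsize('<l'))  # native size varies; standard is 4
print(to_signed32(0x80000000))  # -> -2147483648
print(to_unsigned32(-1) == 0xFFFFFFFF)  # -> True
```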

and also silly stuff like

def testAdd32(self):
     "Test add32"
     self.assertEquals(add32(10, -6), 4)
     self.assertEquals(add32(6, -10), -4)
     self.assertEquals(add32(_L2U32(0x80000000L), -1), 0x7FFFFFFF)
     self.assertEquals(add32(0x7FFFFFFF, 1), _L2U32(0x80000000L))

def testChecksum(self):
     "Test calcChecksum function"
     self.assertEquals(calcChecksum(""), 0)
     self.assertEquals(calcChecksum("\1"), 0x01000000)
     self.assertEquals(calcChecksum("\x01\x02\x03\x04\x10\x20\x30\x40"), 0x11223344)
     self.assertEquals(calcChecksum("\x81"), _L2U32(0x81000000L))

While it's reasonable to have tests, these don't seem very sensible: e.g. what 
is -6 doing in a u32 test? This stuff just about works on a 32-bit machine, but 
fails miserably on a 64-bit AMD. As far as I can see, I just need to use masked 
longs throughout.
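Following that plan, here is a sketch of the checksum kept entirely in unsigned 32-bit terms (Python 3, operating on `bytes`; `calc_checksum_u32` is a hypothetical name, not reportlab's API). Unpacking with `'>%dI'` yields unsigned big-endian words directly, so no signed/unsigned conversion is needed, and the tests only use values in range(2**32):

```python
import struct

MASK32 = 0xFFFFFFFF


def calc_checksum_u32(data: bytes) -> int:
    """TTF-style table checksum as an unsigned 32-bit int (hypothetical name)."""
    # Pad with zero bytes to a multiple of 4, as the C code's rounding implies
    if len(data) % 4:
        data += b"\0" * (4 - len(data) % 4)
    total = 0
    # '>' = big-endian with standard sizes, so 'I' is exactly 4 bytes everywhere
    for word in struct.unpack(">%dI" % (len(data) // 4), data):
        total = (total + word) & MASK32  # drop the carry out of bit 31
    return total


# The original tests, restated with unsigned expectations only
assert calc_checksum_u32(b"") == 0
assert calc_checksum_u32(b"\x01") == 0x01000000
assert calc_checksum_u32(b"\x01\x02\x03\x04\x10\x20\x30\x40") == 0x11223344
assert calc_checksum_u32(b"\x81") == 0x81000000
# Wrap-around: 0xFFFFFFFF + 1 rolls over to 0
assert calc_checksum_u32(b"\xff\xff\xff\xff\x00\x00\x00\x01") == 0
```

In these terms the old `add32(10, -6)` case corresponds to adding 0xFFFFFFFA (the 32-bit pattern of -6) to 10, which still wraps around to 4.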

In a C extension I can still do the computation efficiently on a 32-bit machine, 
but I need to do masking for a 64-bit machine.
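Before writing a C extension, it may be worth trying a cheaper pure-Python variant (this is an assumption about where the time goes, not a measured result). Because reduction modulo 2**32 can be deferred, the words can be summed exactly with the built-in `sum` and masked once at the end. `calc_checksum_fast` is a hypothetical name:

```python
import struct


def calc_checksum_fast(data: bytes) -> int:
    """Same result as masking after every add, but masks only once (hypothetical name)."""
    # Zero-pad to a whole number of 32-bit words
    data = data + b"\0" * (-len(data) % 4)
    words = struct.unpack(">%dI" % (len(data) // 4), data)
    # Python ints don't overflow, so sum exactly, then keep the low 32 bits
    return sum(words) & 0xFFFFFFFF
```

For large tables the intermediate sum grows a few bits beyond 32; the single mask at the end discards them, which matches the result of the wrapping C loop on both 32-bit and 64-bit machines.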
-- 
Robin Becker