Fastest standard way to sum bytes (and their squares)?

Alexander Schmolck a.schmolck at gmail.com
Mon Aug 13 17:49:02 CEST 2007

Erik Max Francis <max at alcyone.com> writes:

> Alexander Schmolck wrote:
>> Is this any faster?
>>  ordSum, ordSumSq = (lambda c: (c.real, c.imag))(sum(complex(ord(x), ord(x)**2)
>> for x in data))
> That's pretty clever, but I neglected to mention that I need to accumulate the
> sums as ints/longs to avoid losing precision, so converting to floating point
> isn't an option.

I think you can safely use this trick (if it really makes things faster)
provided you sum blocks no larger than 2**(53-16) bytes: each squared byte is
below 2**16, and a double represents integers exactly up to 2**53. If your
files are really that big you'd certainly want to split the summing into
several blocks anyway, because otherwise you'll be doing *lots* of extra
bignum arithmetic instead of int32/int64 addition (I'd assume this will slow
things down noticeably even in python). Another trick you could try, again
using table lookup: work on words (i.e. 2 bytes) instead of single bytes, with
a table mapping word->(byte-sum,sq-byte-sum) tuples; this will halve the
number of function calls, and the table is hopefully still small enough not to
ruin your cache-hit rate (you might want to try array).
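
Here is a rough sketch of both ideas, in case it helps. It is not benchmarked,
so whether either one is actually faster is still an open question. It assumes
Python 3 and that data is a bytes object (so iterating yields ints and ord()
is not needed). The names sums_complex, sums_words, BLOCK, WORD_SUM and
WORD_SQ are made up for illustration, and I use two parallel arrays rather
than a table of tuples.

```python
from array import array

BLOCK = 1 << 20  # hypothetical block size, far below the point where doubles lose exactness


def sums_complex(data, block=BLOCK):
    """Sum bytes and squared bytes using one complex accumulator per block."""
    total = total_sq = 0  # exact Python ints carry the result across blocks
    for start in range(0, len(data), block):
        c = sum(complex(b, b * b) for b in data[start:start + block])
        total += int(c.real)    # exact, since the block sum is far below 2**53
        total_sq += int(c.imag)
    return total, total_sq


# Lookup tables indexed by a 16-bit word: the sum of its two bytes and the
# sum of their squares. Both are symmetric in the two bytes, so the byte
# order used to form the words does not matter.
WORD_SUM = array("H", ((w >> 8) + (w & 0xFF) for w in range(1 << 16)))
WORD_SQ = array("L", ((w >> 8) ** 2 + (w & 0xFF) ** 2 for w in range(1 << 16)))


def sums_words(data):
    """Sum bytes and squared bytes two bytes at a time via table lookup."""
    even = len(data) & ~1            # largest even prefix length
    words = array("H")               # assumes 2-byte items, as on common platforms
    words.frombytes(data[:even])
    total = sum(WORD_SUM[w] for w in words)
    total_sq = sum(WORD_SQ[w] for w in words)
    if len(data) & 1:                # handle a trailing odd byte separately
        last = data[-1]
        total += last
        total_sq += last * last
    return total, total_sq
```

Both functions return a (byte-sum, sq-byte-sum) pair of plain ints, so the
results can be added across files or blocks without any loss of precision.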

