[Numpy-discussion] How to efficiently reduce a 1-dim 100-10000 element array with user defined binary function

Sat Nov 15 17:29:21 EST 2008

Charles R Harris <charlesr.harris <at> gmail.com> writes:

> 
> 
> On Fri, Nov 14, 2008 at 6:22 PM, Kim Hansen <slaunger <at> gmail.com> wrote:
> Dear numpy-discussion,
> I am quite new to Python and numpy.
> I am working on a Python application using Scipy, where I need to
> unpack and pack quite large amounts of data (GBs) into data structures
> and convert them into other data structures. Until now the program has
> been running amazingly efficiently (processing 10-125 MB/s depending
> on the task) because I have managed to use only built-in array
> functions for all computationally intensive operations (and beause i
> have fairly good hardware to run it on).
> However, I have run into problems now, as I need to compute new
> checksums for modified data structures using a ones complement add on
> one-dimensional, uint16 arrayrs with 50-10000 elements.
> The best procedure I have come up with yet is the following implementation
> def complement_add_uint16(a, b):
>     """
>     Returns the complement ones addition of two unsigned 16 bit integers
>     The values are added and if the carry is set, the value is incremented.
>     It is assumed that a and b are both in the range [0; 0xFFFF].
>     """
>     c = a + b
>     c += c >> 16
>     return c & 0xFFFF
> def complement_add_checksum(ints):
>     """
>     Return a complement checksum based on a
>     one-dimensional array of dtype=uint16
>     """
>     return reduce(complement_add_uint16, ints, 0)
> 
> 
> Can't you add them up in a higher precision, then do a = (a & 0xffff ) + (a 
>> 16) until the high order bits are gone? The high order bits will count the 
>number ones you need to add. Something like
>def complement_add_uint16(x) :
>    y = sum(x, dtype=uint64)    
>    while y >= 2**16 :
>        y = (y & uint64(0xffff)) + (y >> uint64(16))
>    return y
>You might need to block the data to avoid overflow of uint64, but you get an 
>extra 48 bits in any case. Mind, I'm not sure this is exactly the same thing 
>as what you are trying to do, so it bears checking.

Charles, you are great!

That does seems to work alright! 

It passes my quite elaborate unit tests of the checksum function, where I use 
several real world examples and the performance boost is enourmous!

Whereas my previous method never got me any higher processing speeds than
413 kB/s, I am now at, hold on, 47400 kB/s checksum processing speed which is 
a x 100 performance boost. Once again I see that numpy rules! No need to do 
any C-inlining!

Fantastic! You saved me a lot of trouble there. I had sort of thought of the 
same idea, but somehow convinced myself that it would not work.

-- Slaunger