[Numpy-discussion] rebin (corrected)

Russell E Owen rowen at u.washington.edu
Mon Aug 30 11:28:16 EDT 2004

At 10:56 AM -0700 2004-08-30, Tim Hochberg wrote:
>>But I still agree with Perry that we ought to provide a built-in rebin
>>function.  It is particularly useful for large multi-dimensional arrays
>>where it is wasteful (in both CPU and memory) to create a full-size
>>copy of the array before resampling it down to the desired rebinned
>>size.  I appended the .copy() so that at least the big array is not
>>still hanging around in memory (remember that the slice creates a
>>view rather than a copy.)
>>				Rick
>A reasonable facsimile of this should be doable without dropping 
>into C. Something like:
>def rebin_sum(a, (m, n)):
>    M, N = a.shape
>    a = na.reshape(a, (M/m,m,N/n,n))
>    return na.sum(na.sum(a, 3), 1) / float(m*n)
>This does create some temps, but they're smaller than in the boxcar 
>case and it doesn't do all the extra calculation. This doesn't 
>handle the case where a.shape isn't an exact multiple of (m,n). 
>However, I don't think that would be all that hard to implement, if 
>there is a consensus on what should happen then.
>    I can think of at least two different ways this might be done: 
>tacking on values that match the last value as already proposed and 
>tacking on zeros. There may be others as well. It should probably 
>get a boundary condition argument like convolve and friends.
>    Personally, I'd find rebin a little surprising if it resulted 
>in an average, as all the implementations thus far have done, rather 
>than a simple sum over the stencil.  When I think of rebinning I'm 
>thinking of number of occurrences per bin, and rebinning should keep 
>the total occurrences the same, not change them by the inverse of 
>the stencil size.
>My 2.3 cents anyway

I agree that it would be nice to avoid the extra calculation involved 
in convolution or boxcar averaging, and the extra temp storage.

Your algorithm certainly looks promising, but I'm not sure there's 
any space saving when the array shape is not an exact multiple of the 
bin factor. Duplicating the last value is probably the most 
reasonable alternative for my own applications (imaging). To use your 
algorithm, I guess one has to enlarge the array first, creating a 
new temporary array that is the same as the original except padded 
to an exact multiple of the bin factor. In theory one could avoid 
duplication, but I suspect to do this efficiently one really needs to 
use C code.
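
For illustration, here is a minimal sketch of the pad-then-reshape idea, 
written against present-day NumPy rather than numarray. The function 
name rebin_padded and its arguments are made up for this example, and 
np.pad with mode="edge" is used as one way to duplicate the last value. 
It still allocates the padded temporary described above, so it does not 
address the space concern.

```python
import numpy as np

def rebin_padded(a, m, n, average=False):
    """Rebin a 2-D array by (m, n), repeating edge values when the shape
    is not an exact multiple of the bin factor.

    Hypothetical helper for illustration; not part of any library.
    """
    a = np.asarray(a)
    M, N = a.shape
    # Number of extra rows/columns needed to reach the next multiple.
    pad_rows = (-M) % m
    pad_cols = (-N) % n
    if pad_rows or pad_cols:
        # This creates a temporary slightly larger than the input.
        a = np.pad(a, ((0, pad_rows), (0, pad_cols)), mode="edge")
    M2, N2 = a.shape
    # Split each axis into (number of bins, bin size), then sum each bin.
    blocks = a.reshape(M2 // m, m, N2 // n, n)
    out = blocks.sum(axis=(1, 3))
    return out / float(m * n) if average else out
```

With a 3x5 input and a bin factor of (2, 2), the array is padded to 4x6 
and the result has shape (2, 3).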

I personally have no strong opinion on averaging vs summing. Summing 
retains precision but risks overflow. Averaging has the opposite 
trade-off (less overflow risk, some loss of precision), though avoiding 
overflow in the intermediate sum is tricky. Note that 
Nadav Horesh's suggested solution (convolution with a mask of 1s 
instead of boxcar averaging) computed the sum.
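
To make the overflow point concrete, here is a small sketch in 
present-day NumPy (not numarray). The helper rebin_blocks and its 
acc_dtype argument are invented for this example. It assumes the array 
shape is an exact multiple of the bin factor, and it shows that the 
result depends on the type used to accumulate the sum.

```python
import numpy as np

def rebin_blocks(a, m, n, acc_dtype=None):
    """Sum over (m, n) bins, accumulating in acc_dtype.

    Hypothetical helper; assumes a.shape is an exact multiple of (m, n).
    """
    M, N = a.shape
    blocks = a.reshape(M // m, m, N // n, n)
    return blocks.sum(axis=(1, 3), dtype=acc_dtype)

# An 8-bit image patch whose bin total (800) does not fit in 8 bits.
img = np.full((2, 2), 200, dtype=np.uint8)

# Accumulating in the input type wraps around silently (800 mod 256).
wrapped = rebin_blocks(img, 2, 2, acc_dtype=np.uint8)

# Accumulating in a wider type keeps the exact total.
wide = rebin_blocks(img, 2, 2, acc_dtype=np.int64)

# The average fits the original range again, but is computed from the
# wide sum, so the overflow question is moved rather than removed.
mean = wide / 4.0
```

Dividing before summing would avoid the wide accumulator, but with 
integer data it discards the fractional parts, which is the precision 
loss mentioned above.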

-- Russell
