[Numpy-discussion] rebin (corrected)

Mon Aug 30 09:44:06 EDT 2004

At 9:06 AM -0700 2004-08-30, Russell E Owen wrote:
>At 9:14 AM -0400 2004-08-30, Perry Greenfield wrote:
>>...
>>Note that a boxcar smoothing costs no more than doing the above averaging.
>>So in numarray, you could do the following:
>>
>>from numarray.convolve import boxcar
>>b = boxcar(a, (n,n))
>>rebinnedarray = b[::n,::n].copy()
>>
>>or something like this (I haven't tried to figure out the correct offset
>>for the slice) where one wants to rebin by a factor of n in both dimensions.
>>We should probably add a built in function to do this.
>...
>I think the polished version (for nxm binning) is:
>
>from numarray.convolve import boxcar
>b = boxcar(a, (n,m), mode='constant', cval=0)
>rebinnedarray = b[n//2::n,m//2::m].copy()
>
>A rebin function would be handy since using boxcar is a bit tricky.

I made several mistakes, one of them very serious: the convolve 
boxcar cannot do the job unless the array size is an exact multiple 
of the bin factor. The problem is that boxcar starts in the "wrong" 
place. Here's an example:

Problem: rebin [0, 1, 2, 3, 4] by 2 to yield [0.5, 2.5, 4.0],
where the last point (value 4) is averaged with the next value off the end,
which we approximate by extending the data
(note: my proposed "polished" version messed that up; Perry had it right).
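
For reference, the target behavior stated above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses NumPy rather than the numarray discussed in this thread, rebin_1d is a hypothetical helper name, and "extending the data" is taken to mean repeating the last value.

```python
import numpy as np

def rebin_1d(a, n):
    """Average consecutive groups of n samples (hypothetical helper).

    If len(a) is not a multiple of n, the tail is extended by
    repeating the last value, as described in the problem statement.
    """
    a = np.asarray(a, dtype=float)
    pad = (-len(a)) % n                         # samples needed to reach a multiple of n
    padded = np.pad(a, (0, pad), mode="edge")   # repeat the last value
    return padded.reshape(-1, n).mean(axis=1)   # one row per bin, then average

result = rebin_1d(np.arange(5), 2)
print(result)   # values 0.5, 2.5, 4.0
```

The last bin averages 4 with a repeated 4, which gives the 4.0 in the problem statement.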

Using boxcar almost works:

>>>  import numarray as num
>>>  from numarray.convolve import boxcar
>>>  a = num.arange(5)
>>>  a
array([0, 1, 2, 3, 4])
>>>  boxcar(a, (2,))
array([ 0. ,  0.5,  1.5,  2.5,  3.5])
>>>  b = boxcar(a, (2,))
>>>  b
array([ 0. ,  0.5,  1.5,  2.5,  3.5])
>>>  b[1::2]
array([ 0.5,  2.5])

but oops: the last point is missing!!!

What is needed is some way to make the boxcar average start later, so 
it finishes by averaging 4 plus the next value off the edge of the 
array, e.g. a hypothetical version of boxcar with a start argument:

>>>  b2 = nonexistent_boxcar(a, (2,), start=1)
>>>  b2
[0.5, 1.5, 2.5, 3.5, 4.0]
>>>  b2[0::2]
[0.5, 2.5, 4.0]
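
A boxcar that starts at each sample and looks forward, as in the hypothetical session above, could look roughly like the following. Assumptions: NumPy stands in for numarray, forward_boxcar is a made-up name (not a numarray or nd_image function), and the array is extended by repeating its last value.

```python
import numpy as np

def forward_boxcar(a, n):
    """Mean of a[i], ..., a[i+n-1] for every i (hypothetical helper).

    The end of the array is extended by repeating the last value so
    the output has the same length as the input.
    """
    a = np.asarray(a, dtype=float)
    padded = np.pad(a, (0, n - 1), mode="edge")   # extend the right edge only
    kernel = np.ones(n) / n                       # uniform (boxcar) weights
    return np.convolve(padded, kernel, mode="valid")

b2 = forward_boxcar(np.arange(5), 2)   # 0.5, 1.5, 2.5, 3.5, 4.0
rebinned = b2[0::2]                    # 0.5, 2.5, 4.0
```

Slicing from index 0 with step n then picks out one average per bin, including the partial bin at the end.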

nd_image.boxcar_filter has an origin argument that *might* be for 
this purpose. Unfortunately, it is not documented and my attempt to 
use it as desired failed. I have no idea if this is a bug in nd_image 
or a misunderstanding on my part:

>>>  from numarray.nd_image import boxcar_filter
>>>  # first the usual answer; omitting the origin argument yields the same answer
>>>  boxcar_filter(a, (2,), origin=(0,), output_type=num.Float32)
array([ 0. ,  0.5,  1.5,  2.5,  3.5], type=Float32)
>>>  # now try the origin argument and get a traceback; origin=1 gives the same error:
>>>  b = boxcar_filter(a, (2,), origin=(1,), output_type=num.Float32)
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "/usr/local/lib/python2.3/site-packages/numarray/nd_image/filters.py", line 339, in boxcar_filter
     output_type = output_type)
   File "/usr/local/lib/python2.3/site-packages/numarray/nd_image/filters.py", line 280, in boxcar_filter1d
     cval, origin, output_type)
RuntimeError: shift not within filter extent

So, yes, a rebin function that actually worked would be a real godsend!
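
For the n x m case from the earlier messages, a rebin along these lines might be enough. This is a sketch, not a tested replacement for IDL's rebin: it assumes NumPy rather than numarray, rebin_2d is a hypothetical name, and both axes are extended by repeating the edge values when the shape is not a multiple of the bin factors.

```python
import numpy as np

def rebin_2d(a, n, m):
    """Average non-overlapping n x m blocks of a 2-d array (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    rows, cols = a.shape
    pad_r = (-rows) % n                     # extra rows needed
    pad_c = (-cols) % m                     # extra columns needed
    padded = np.pad(a, ((0, pad_r), (0, pad_c)), mode="edge")
    nr, nc = padded.shape
    # Split each axis into (bin index, position within bin), then
    # average over the two within-bin axes.
    blocks = padded.reshape(nr // n, n, nc // m, m)
    return blocks.mean(axis=(1, 3))

image = np.arange(15).reshape(3, 5)
binned = rebin_2d(image, 2, 2)   # shape (2, 3)
```

Because each output value comes from exactly one block, there is no offset to work out, which was the tricky part of the boxcar approach.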

Meanwhile, any other suggestions? Fortunately in our application we 
*can* call out to IDL, but it seems a shame to have to do that.

-- Russell