# [Numpy-discussion] Rebinning numpy array

Robert Kern robert.kern at gmail.com
Sun Nov 13 12:27:18 EST 2011

```
On Sun, Nov 13, 2011 at 16:04, Johannes Bauer <dfnsonfsduifb at gmx.de> wrote:
> Hi group,
>
> I have a rather simple problem, or so it would seem. However I cannot
> seem to find the right solution. Here's the problem:
>
> A Geiger counter measures counts in distinct time intervals. The time
> intervals are not of constant length. Imagine, for example, that the
> counter would always create a table entry when the counts reach 10. Then
> we would have the following bins (made-up data for illustration):
>
> Seconds         Counts  Len     CPS
> 0 - 44          10      44      0.23
> 44 - 120        10      76      0.13
> 120 - 140       10      20      0.5
> 140 - 200       10      60      0.17
>
> So we have n bins (in this example 4), but they're not equidistant. I
> want to rebin the samples to make them equidistant. For example, I would
> like to rebin into 5 bins of 40 seconds each. Then the rebinned
> example would be (calculated by hand, so this might contain errors):
>
> 0-40            9.09
> 40-80           5.65
> 80-120          5.26
> 120-160         13.33
> 160-200         6.66
>
> That means, if a destination bin completely overlaps a source bin, its
> complete value is taken. If it overlaps partially, linear interpolation
> of bin sizes should be used.

What you want to do is set up a linear interpolation of the cumulative
counts at the boundaries of the uneven bins.

Seconds  Cumulative counts
0        0
44       10
120      20
140      30
200      40

Then evaluate that linear interpolation on the boundaries of the uniform bins.

[~]
|18> bin_bounds = np.array([0.0, 44.0, 120, 140, 200])

[~]
|19> bin_values = np.array([0.0, 10, 10, 10, 10])

[~]
|20> cum_bin_values = bin_values.cumsum()

[~]
|21> new_bounds = np.array([0.0, 40, 80, 120, 160, 200])

[~]
|22> ecdf = np.interp(new_bounds, bin_bounds, cum_bin_values)

[~]
|23> ecdf
array([  0.        ,   9.09090909,  14.73684211,  20.        ,
33.33333333,  40.        ])

[~]
|24> uniform_histogram = np.diff(ecdf)

[~]
|25> uniform_histogram
array([  9.09090909,   5.64593301,   5.26315789,  13.33333333,   6.66666667])

This may be what you are doing already. I'm not sure what is in your
getx() and gety() methods. If so, then I think you are on the right
track. If you still have problems, then we might need to see some of
the problematic data and results.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
-- Umberto Eco

```
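
The steps in the session above can be collected into one function. This is only a minimal sketch: the name `rebin_counts` and its parameters are hypothetical and do not appear in the original post. It assumes the counts are spread uniformly within each source bin, which is what linear interpolation of the cumulative counts implies.

```python
import numpy as np


def rebin_counts(old_edges, old_counts, new_edges):
    """Redistribute binned counts onto new bin edges.

    Hypothetical helper: assumes counts are uniformly distributed
    within each old bin and that old_edges is strictly increasing.
    """
    old_edges = np.asarray(old_edges, dtype=float)
    old_counts = np.asarray(old_counts, dtype=float)
    new_edges = np.asarray(new_edges, dtype=float)
    if old_edges.size != old_counts.size + 1:
        raise ValueError("need exactly one more edge than counts")

    # Cumulative counts at each old edge, starting from 0 at the first edge.
    cumulative = np.concatenate(([0.0], np.cumsum(old_counts)))

    # Linearly interpolate the cumulative counts at the new edges.
    ecdf = np.interp(new_edges, old_edges, cumulative)

    # Differences between consecutive new edges are the rebinned counts.
    return np.diff(ecdf)


new_edges = np.arange(0, 201, 40)  # 0, 40, ..., 200
rebinned = rebin_counts([0, 44, 120, 140, 200], [10, 10, 10, 10], new_edges)
rates = rebinned / np.diff(new_edges)  # counts per second in each new bin
print(rebinned)
```

Because `np.interp` holds the end values constant outside the range of `old_edges`, any part of a new bin that lies beyond the original data receives zero counts in this sketch. When the new edges span the full original range, the total number of counts is preserved.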