[Numpy-discussion] Rebinning numpy array

Johannes Bauer dfnsonfsduifb at gmx.de
Sun Nov 13 11:04:07 EST 2011


Hi group,

I have a rather simple problem, or so it would seem. However I cannot
seem to find the right solution. Here's the problem:

A Geiger counter measures counts in distinct time intervals. The time
intervals are not of constant length. Imagine, for example, that the
counter always creates a table entry when the count reaches 10. Then
we would have the following bins (made-up data for illustration):

Seconds      Counts   Len (s)   CPS
0 - 44       10       44        0.23
44 - 120     10       76        0.13
120 - 140    10       20        0.50
140 - 200    10       60        0.17

So we have n bins (in this example 4), but they're not equidistant. I
want to rebin the samples to make them equidistant. For example, I would
like to rebin into 5 bins of 40 seconds each. The rebinned example would
then be (calculated by hand, so it might contain errors):

Seconds      Counts
0 - 40       9.09
40 - 80      5.65
80 - 120     5.26
120 - 160    13.33
160 - 200    6.67

That means, if a destination bin completely covers a source bin, the
source bin's complete value is taken. If it covers the source bin only
partially, the counts should be split in proportion to the overlapping
length (i.e. linear interpolation over the bin size).

It is very important that the total count stays the same (in this case
40; I checked that my numbers add up to that). In this example I
increased the bin size, but usually I will want to decrease the bin size
(even dramatically).
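
A minimal sketch of one way to get this behaviour, assuming the counts
are spread uniformly within each source bin: interpolate the cumulative
count at the new bin edges and take differences, so the total is conserved
by construction. The names rebin_counts, src_edges, src_counts and
dst_edges are made up for this illustration; the data is the made-up
example from above.

```python
import numpy as np

def rebin_counts(src_edges, src_counts, dst_edges):
    """Redistribute counts from source bins onto destination bins.

    Assumes counts are uniformly distributed inside each source bin.
    """
    src_edges = np.asarray(src_edges, dtype=float)
    dst_edges = np.asarray(dst_edges, dtype=float)
    # Cumulative count at every source edge, starting at zero.
    cumulative = np.concatenate(([0.0], np.cumsum(src_counts)))
    # Piecewise-linear cumulative count evaluated at the new edges.
    cumulative_at_dst = np.interp(dst_edges, src_edges, cumulative)
    # Counts per destination bin are differences of the cumulative count.
    return np.diff(cumulative_at_dst)

src_edges = [0, 44, 120, 140, 200]    # 4 source bins of unequal length
src_counts = [10, 10, 10, 10]
dst_edges = np.linspace(0, 200, 6)    # 5 destination bins of 40 s each

rebinned = rebin_counts(src_edges, src_counts, dst_edges)
print(rebinned.round(2))
print(rebinned.sum())
```

The same function should also work when the destination bins are much
narrower than the source bins. Destination edges outside the source range
are clamped by numpy.interp, so bins lying entirely outside it receive
zero counts.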

Now my pathetic attempts look something like this:

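import datetime
import time

import numpy

# Fragment from inside a method: self.getx() yields objects that have a
# timetuple() method (e.g. datetimes), self.gety() the matching values.
interpolation_points = 4000
xpts = [ time.mktime(x.timetuple()) for x in self.getx() ]

# Sample the piecewise-linear curve through the data at equidistant times.
interpolatedx = numpy.linspace(xpts[0], xpts[-1], interpolation_points)
interpolatedy = numpy.interp(interpolatedx, xpts, self.gety())

self._xreformatted = [ datetime.datetime.fromtimestamp(x) for x in interpolatedx ]
self._yreformatted = interpolatedy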

This works somewhat, however I see artifacts depending on the
destination sample size: for example, when I have a spike in the sample
input and slowly reduce the number of interpolation points (i.e. increase
the destination bin size), the spike gets smaller and smaller (expected
behaviour). After reducing the number further, however, the spike
"magically" reappears. I believe this to be an interpolation artifact.
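
A small sketch that reproduces this kind of effect with made-up data. It
only illustrates one plausible cause, namely that numpy.interp evaluates
the curve at isolated points, so whether a narrow spike shows up depends on
where the sample points happen to fall. The series and the helper name
peak_after_resampling are invented for this illustration.

```python
import numpy as np

# Made-up data: a flat level of 1 with a single narrow spike at x = 21.
xpts = np.array([0.0, 10.0, 20.0, 21.0, 22.0, 32.0, 42.0])
ypts = np.array([1.0, 1.0, 1.0, 50.0, 1.0, 1.0, 1.0])

def peak_after_resampling(n_points):
    """Largest value left after point-sampling the series on a uniform grid."""
    grid = np.linspace(xpts[0], xpts[-1], n_points)
    return np.interp(grid, xpts, ypts).max()

# Fewer points means wider spacing between the sample positions.
for n_points in (43, 22, 7):
    print(n_points, peak_after_resampling(n_points))
```

With these numbers the spike is fully visible with 43 points (spacing 1),
vanishes with 22 points (spacing 2, samples at 20 and 22) and is back with
7 points (spacing 7, one sample exactly at 21).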

Is there some standard way to get from a non-uniformly distributed set
of bins to a uniformly distributed set of bins of arbitrary bin width?

Best regards,
Joe


