[Numpy-discussion] Rebinning numpy array

Olivier Delalleau shish at keba.be
Sun Nov 13 12:48:10 EST 2011


Also: it seems like you are using values at the boundaries of the bins,
while I think it would make more sense to compute interpolated values at
the middle point of a bin. I'm not sure it'll make a big difference
visually, but it may be more appropriate.
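
A minimal sketch of what I mean, using the made-up bins from the original message. The names `edges`, `src_mid`, `new_edges` and `new_mid` are only for illustration and do not come from the original code:

```python
import numpy as np

# Source bins from the made-up example: edges in seconds, counts per bin.
edges = np.array([0.0, 44.0, 120.0, 140.0, 200.0])
counts = np.array([10.0, 10.0, 10.0, 10.0])
cps = counts / np.diff(edges)

# Attach each rate to the middle of its bin rather than to a boundary.
src_mid = 0.5 * (edges[:-1] + edges[1:])

# Destination bins of 40 s each; evaluate at their midpoints as well.
new_edges = np.linspace(edges[0], edges[-1], 6)
new_mid = 0.5 * (new_edges[:-1] + new_edges[1:])

# numpy.interp does not check that the x coordinates are increasing,
# so verify it here before interpolating.
assert np.all(np.diff(src_mid) > 0)
new_cps = np.interp(new_mid, src_mid, cps)
```

Note that this still interpolates the rate pointwise, so by itself it does not guarantee that the total count is conserved.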

-=- Olivier

2011/11/13 Olivier Delalleau <shish at keba.be>

> Just one thing: numpy.interp says it doesn't check that the x coordinates
> are increasing, so make sure it's the case.
>
> Assuming this is ok, I could still see how you may get some non-smooth
> behavior: this may be because your spike can either be split between two
> bins (which "dilutes" it somehow), or be included in a single bin (which
> would make it stand out more). And as you increase your bin size, you will
> switch between these two situations.
>
> -=- Olivier
>
>
> 2011/11/13 Johannes Bauer <dfnsonfsduifb at gmx.de>
>
>> Hi group,
>>
>> I have a rather simple problem, or so it would seem. However I cannot
>> seem to find the right solution. Here's the problem:
>>
>> A Geiger counter measures counts in distinct time intervals. The time
>> intervals are not of constant length. Imagine for example that the
>> counter would always create a table entry when the counts reach 10. Then
>> we would have the following bins (made-up data for illustration):
>>
>> Seconds         Counts  Len     CPS
>> 0 - 44          10      44      0.23
>> 44 - 120        10      76      0.13
>> 120 - 140       10      20      0.5
>> 140 - 200       10      60      0.17
>>
>> So we have n bins (in this example 4), but they're not equidistant. I
>> want to rebin samples to make them equidistant. For example, I would
>> like to rebin into 5 bins of 40 seconds each. The rebinned example would
>> then be (calculated by hand, so this might contain errors):
>>
>> 0-40            9.09
>> 40-80           5.65
>> 80-120          5.26
>> 120-160         13.33
>> 160-200         6.67
>>
>> That means, if a destination bin completely overlaps a source bin, its
>> complete value is taken. If it overlaps partially, linear interpolation
>> of bin sizes should be used.
>>
>> It is very important that the overall count amount stays the same (in
>> this case 40, so my numbers seem to be correct, I checked that). In this
>> example I increased the bin size, but usually I will want to decrease
>> bin size (even dramatically).
>>
>> Now my pathetic attempts look something like this:
>>
>> import datetime
>> import time
>> import numpy
>>
>> interpolation_points = 4000
>> xpts = [ time.mktime(x.timetuple()) for x in self.getx() ]
>>
>> interpolatedx = numpy.linspace(xpts[0], xpts[-1], interpolation_points)
>> interpolatedy = numpy.interp(interpolatedx, xpts, self.gety())
>>
>> self._xreformatted = [ datetime.datetime.fromtimestamp(x) for x in
>> interpolatedx ]
>> self._yreformatted = interpolatedy
>>
>> This works somewhat, however I see artifacts depending on the
>> destination sample size: for example when I have a spike in the sample
>> input and reduce the number of interpolation points (i.e. increase
>> destination bin size) slowly, the spike will get smaller and smaller
>> (expected behaviour). After some amount of increasing, the spike however
>> will "magically" reappear. I believe this to be an interpolation artifact.
>>
>> Is there some standard way to get from a non-uniformly distributed bin
>> distribution to a uniformly distributed bin distribution of arbitrary
>> bin width?
>>
>> Best regards,
>> Joe
>
>
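
On the count-conserving rebinning described in the quoted message: one possible approach is to interpolate the cumulative counts at the new bin edges and take differences. This is only a sketch, and it assumes the counts are spread uniformly within each source bin. The helper name `rebin_counts` is made up for this example and is not a NumPy function:

```python
import numpy as np

def rebin_counts(src_edges, src_counts, dst_edges):
    """Redistribute counts onto new bins (hypothetical helper).

    Assumes a uniform rate within each source bin and that dst_edges
    lie within the range covered by src_edges.
    """
    src_edges = np.asarray(src_edges, dtype=float)
    dst_edges = np.asarray(dst_edges, dtype=float)
    # numpy.interp does not validate ordering, so check it explicitly.
    if np.any(np.diff(src_edges) <= 0):
        raise ValueError("src_edges must be strictly increasing")
    # Cumulative counts at each source edge, starting from zero.
    cumulative = np.concatenate(([0.0], np.cumsum(src_counts)))
    # Linear interpolation of the cumulative curve splits each source
    # bin in proportion to its overlap with the destination bin.
    return np.diff(np.interp(dst_edges, src_edges, cumulative))

edges = [0, 44, 120, 140, 200]
counts = [10, 10, 10, 10]
new_edges = np.linspace(0, 200, 6)  # 5 bins of 40 s each
rebinned = rebin_counts(edges, counts, new_edges)
# Roughly 9.09, 5.65, 5.26, 13.33, 6.67, summing to 40.
print(rebinned.round(2), rebinned.sum())
```

Because only differences of the cumulative curve are taken, the total is preserved whenever the new edges span the same range as the old ones. This holds for any destination bin width, larger or smaller than the source bins.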