[Numpy-discussion] Fast histogram

Charles R Harris charlesr.harris at gmail.com
Thu Apr 17 12:18:44 EDT 2008


On Thu, Apr 17, 2008 at 10:02 AM, Zachary Pincus <zachary.pincus at yale.edu>
wrote:

> Hi folks,
>
> I'm working on a live-video display for some microscope control tools
> I'm building. For this, I need a  fast histogram function to work on
> large-ish images (1000x2000 or so) at video rate, with cycles left
> over for more interesting calculations (like autofocus).
>
> Now, numpy.histogram is a bit slower than I'd like, probably because
> it's pretty general (and of course cf. the recent discussion about its
> speed). I just need even bins within a set range. This is easy enough
> to do with a C-extension, or perhaps even cython, but before I go
> there, I was wondering if there's a numpy function that can help.
>
> Here's what I have in mind:
>
> def histogram(arr, bins, range):
>   min, max = range
>   indices = numpy.clip(((arr.astype(float) - min) * bins / (max -
> min)).astype(int), 0, bins-1)
>   histogram = numpy.zeros(bins, numpy.uint32)
>   for i in indices:
>     histogram[i] += 1
>
> Now, clearly, the last loop is what needs speeding up. Are there any
> numpy functions that can do this kind of operation? Also perhaps
> unnecessarily slow is the conversion of 'arr' to a float -- I do this
> to avoid overflow issues with integer arrays.
>

How about a combination of sort, followed by searchsorted right/left using
the bin boundaries as keys? The difference of the two resulting vectors is
the bin value. Something like:

In [1]: data = arange(100)

In [2]: bins = [0,10,50,70,100]

In [3]: lind = data.searchsorted(bins)

In [4]: print lind[1:] - lind[:-1]
[10 40 20 30]

This won't be as fast as a c implementation, but at least avoids the loop.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20080417/b38d2d4b/attachment.html>


More information about the NumPy-Discussion mailing list