[Numpy-discussion] Rewrite np.histogram in c?

Jaime Fernández del Río jaime.frio at gmail.com
Mon Mar 16 02:00:33 EDT 2015


On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon <rmcgibbo at gmail.com> wrote:

> Hi,
>
> numpy.histogram is implemented in Python, and is a little sluggish. This
> has been discussed previously on the mailing list [1, 2]. It came up in a
> project that I maintain, where a new feature is bottlenecked by
> numpy.histogram, and one developer suggested a faster implementation in
> Cython [3].
>
> Would it make sense to reimplement this function in C or Cython? Is
> moving functions like this from Python to C to improve performance within
> the scope of the development roadmap for NumPy? I started implementing this
> a little bit in C [4], but I figured I should check in here first.
>

Where do you think the performance gains will come from? The PR in your
project that claims a 10x speed-up uses a method that is only fit for
equally spaced bins. I would like to think that implementing that exact same
algorithm in Python with NumPy would be comparably fast, say within 2x.
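
As a rough illustration of the equally-spaced-bins idea in plain NumPy, here
is a minimal sketch. The function name uniform_histogram and its arguments
are hypothetical, and this is not the code from the PR mentioned above. It
assumes finite input and hi > lo, and values that sit within rounding error
of a bin edge may be counted differently than by np.histogram.

```python
import numpy as np

def uniform_histogram(a, bins, lo, hi):
    """Count the values of `a` in `bins` equal-width bins spanning [lo, hi].

    Hypothetical helper for illustration only; assumes hi > lo.
    """
    a = np.asarray(a, dtype=float).ravel()
    # Keep only values inside the range. The right edge is inclusive,
    # matching the closed last bin of np.histogram.
    a = a[(a >= lo) & (a <= hi)]
    # Map each value straight to a bin index: no sorting, no searching.
    idx = ((a - lo) * (bins / (hi - lo))).astype(np.intp)
    # Values equal to `hi` land on index `bins`; fold them into the last bin.
    np.minimum(idx, bins - 1, out=idx)
    return np.bincount(idx, minlength=bins)
```

The work per element is one multiply, one cast and one increment, which is
why this only applies when the bin width is constant.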

For the general case, NumPy is already doing most of the heavy lifting (the
sorting and the searching) in C: simply replicating the same algorithmic
approach entirely in C is unlikely to provide any major speed-up. And if
the change is to the algorithm, then we should first try it out in Python.
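
To make the "sorting and searching" concrete, the general approach for
arbitrary bin edges can be sketched as below. This is a simplified
illustration, not the actual NumPy source: the name sorted_histogram is
hypothetical, and the sketch assumes monotonically increasing edges and
input without NaNs. Both np.sort and np.searchsorted run in C.

```python
import numpy as np

def sorted_histogram(a, edges):
    """Histogram of `a` for arbitrary increasing `edges` via sort + search.

    Hypothetical helper for illustration only.
    """
    a = np.sort(np.asarray(a, dtype=float).ravel())
    edges = np.asarray(edges, dtype=float)
    # Locate every edge in the sorted data. All edges except the last use
    # side="left", so each bin is half-open [e_i, e_{i+1}); the last edge
    # uses side="right", so the final bin also includes its right edge.
    cum = np.concatenate([
        np.searchsorted(a, edges[:-1], side="left"),
        np.searchsorted(a, edges[-1:], side="right"),
    ])
    # Differences of the cumulative positions are the per-bin counts.
    return np.diff(cum)
```

The Python-level code here is only a handful of calls, so the time is
dominated by the sort and the binary searches.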

That said, if you can speed things up 10x, I don't think there is going to
be much opposition to moving it to C!

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.