
It might make sense to dispatch to different C implementations depending on whether the bins are equally spaced (as created by passing an integer as the bins argument of np.histogram) or not. In that case, getting the bigger speedup may be easier, at least for one common use case.

-Robert

On Sun, Mar 15, 2015 at 11:00 PM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
Hi,
numpy.histogram is implemented in Python, and is a little sluggish. This has been discussed previously on the mailing list [1, 2]. It came up in a project that I maintain, where a new feature is bottlenecked by numpy.histogram, and one developer suggested a faster implementation in Cython [3].
Would it make sense to reimplement this function in C, or in Cython? Is moving functions like this from Python to C to improve performance within the scope of the development roadmap for NumPy? I started implementing this a little bit in C [4], but I figured I should check in here first.
Where do you think the performance gains will come from? The PR in your project that claims a 10x speed-up uses a method that is only suitable for equally spaced bins. I would expect that implementing that exact same algorithm in Python with NumPy would be comparably fast, say within 2x.
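A minimal sketch of the equal-width approach in plain NumPy, combined with a dispatcher along the lines Robert suggests: an integer bins argument takes an arithmetic fast path, while explicit edges fall back to np.histogram. The names uniform_histogram and histogram_dispatch are hypothetical, and this is not the Cython code from the PR. It ignores weights and density, and a production version would also need to handle values that land within floating-point rounding of an interior edge.

```python
import numpy as np

def uniform_histogram(a, nbins, range_=None):
    """Counts and edges for `nbins` equal-width bins, via direct indexing.

    Sketch: assumes finite data, nbins >= 1, and a non-empty `a` when
    `range_` is None; weights and density are not handled.
    """
    a = np.asarray(a, dtype=float).ravel()
    lo, hi = (a.min(), a.max()) if range_ is None else range_
    if lo == hi:  # degenerate range: widen it, as np.histogram does
        lo, hi = lo - 0.5, hi + 0.5
    a = a[(a >= lo) & (a <= hi)]  # values outside the range are not counted
    # Arithmetic replaces the search: bin index = floor((x - lo) / width).
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx[idx == nbins] = nbins - 1  # x == hi belongs to the last (closed) bin
    counts = np.bincount(idx, minlength=nbins)
    return counts, np.linspace(lo, hi, nbins + 1)

def histogram_dispatch(a, bins=10, range_=None):
    """Hypothetical dispatcher: integer `bins` takes the equal-width path,
    explicit bin edges fall back to np.histogram."""
    if isinstance(bins, (int, np.integer)):
        return uniform_histogram(a, int(bins), range_)
    return np.histogram(a, bins=bins, range=range_)
```

The equal-width path is one pass over the data plus np.bincount, with no sort and no binary search, which is where a speed-up for this case would come from even without leaving Python.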
For the general case, NumPy is already doing most of the heavy lifting (the sorting and the searching) in C: simply replicating the same algorithmic approach entirely in C is unlikely to provide any major speed-up. And if the change is to the algorithm, then we should first try it out in Python.
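For arbitrary (non-uniform) edges, the sort-and-search technique described above can be sketched in a few lines: sort the data once, binary-search every edge into it with np.searchsorted, and take differences of the resulting positions. This illustrates the technique rather than reproducing NumPy's own source; the function name sorted_histogram is hypothetical.

```python
import numpy as np

def sorted_histogram(a, edges):
    """Counts for arbitrary increasing bin edges using sort + binary search.

    Bins are half-open [e[i], e[i+1]) except the last, which is closed,
    matching np.histogram; values outside [e[0], e[-1]] are not counted.
    """
    a = np.sort(np.asarray(a, dtype=float).ravel())  # O(n log n), runs in C
    edges = np.asarray(edges, dtype=float)
    if edges.ndim != 1 or edges.size < 2 or np.any(np.diff(edges) < 0):
        raise ValueError("edges must be a 1-D increasing sequence")
    # Number of data values strictly below each edge (binary search, in C)...
    below = np.searchsorted(a, edges[:-1], side="left")
    # ...and, for the final edge, values at or below it, closing the last bin.
    last = np.searchsorted(a, edges[-1:], side="right")
    return np.diff(np.concatenate((below, last)))
```

Both the sort and the searches already execute in compiled code, so a line-for-line C port of this same algorithm would mainly remove the remaining Python-level overhead rather than change its complexity.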
That said, if you can speed things up 10x, I don't think there is going to be much opposition to moving it to C!
Jaime
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion