[Numpy-discussion] Rewrite np.histogram in c?

Jaime Fernández del Río jaime.frio at gmail.com
Mon Mar 16 14:19:48 EDT 2015

On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer at esrf.fr>
wrote:

> On Mon, 16 Mar 2015 06:56:58 -0700
> Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
> > Dispatching to a different method seems like a no brainer indeed. The
> > question is whether we really need to do this in C.
> I need to do both unweighted and weighted histograms, and we got a factor of 5
> speed-up using (simple) Cython; it is in the proceedings of last year's
> EuroSciPy.
> http://arxiv.org/pdf/1412.6367.pdf

If I read your paper and code correctly, your 5x speed-up comes mostly from
combining the weighted and unweighted histograms into a single search over
the array, and from using an algorithm that only applies to equal-sized
bins, much like the 10x speed-up Robert was reporting.
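To make the equal-sized-bin idea concrete, here is a minimal NumPy sketch, assuming finite input, `bins >= 1` and `lo < hi`. The function name `uniform_histogram` is hypothetical, not an existing NumPy API. Instead of a binary search over the edges, each sample's bin is found with one subtraction, one multiply and a floor; the last bin is treated as closed on the right, as in `np.histogram`. Values that land within rounding distance of an interior edge could in principle be assigned differently than a search over `np.linspace` edges would; the sketch does not correct for that.

```python
import numpy as np

def uniform_histogram(a, bins, lo, hi, weights=None):
    """Hypothetical equal-width-bin fast path (sketch, not a NumPy API).

    Assumes finite input, bins >= 1 and lo < hi. Each sample's bin index
    is computed arithmetically in O(1) instead of by binary search.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    keep = (a >= lo) & (a <= hi)          # drop samples outside [lo, hi]
    a = a[keep]
    # Scale [lo, hi] onto [0, bins] and take the floor as the bin index.
    idx = np.floor((a - lo) * (bins / (hi - lo))).astype(np.intp)
    # x == hi (or anything rounding up to `bins`) goes in the last bin,
    # because the last bin is closed on the right.
    idx = np.minimum(idx, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    if weights is None:
        return counts
    w = np.asarray(weights, dtype=np.float64).ravel()[keep]
    return counts, np.bincount(idx, weights=w, minlength=bins)
```

For example, `uniform_histogram([0.1, 0.5, 1.2, 2.7, 3.0, 3.5, -1.0], 3, 0.0, 3.0)` should agree with `np.histogram(..., bins=3, range=(0, 3))[0]`, giving counts of 2, 1 and 2.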

I think having a special path for equal-sized bins is a great idea: let's
do it, PRs are always welcome! Similarly, computing the counts together with
the weighted sums seems like a very good idea.
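A rough sketch of getting counts and weighted sums from the same search, assuming sorted 1-D edges of arbitrary (not necessarily equal) widths. The helper name `counts_and_weights` is hypothetical. Each sample's bin is located once with `np.searchsorted`, and those indices are reused for both `np.bincount` calls. In pure NumPy this is still two passes over the index array; a compiled loop could fuse them into one, which is presumably where part of the reported gain comes from.

```python
import numpy as np

def counts_and_weights(a, edges, weights):
    """Hypothetical helper: return (counts, weighted_sums) for the same bins,
    locating each sample's bin only once. Assumes `edges` is sorted and 1-D."""
    a = np.asarray(a, dtype=np.float64)
    w = np.asarray(weights, dtype=np.float64)
    edges = np.asarray(edges, dtype=np.float64)
    nbins = edges.shape[0] - 1
    # side="right" puts x == edges[i] into bin i, i.e. bins are [e_i, e_{i+1}).
    idx = np.searchsorted(edges, a, side="right") - 1
    idx[a == edges[-1]] = nbins - 1           # last bin is closed on the right
    keep = (idx >= 0) & (idx < nbins)         # drop out-of-range samples
    idx, w = idx[keep], w[keep]
    counts = np.bincount(idx, minlength=nbins)
    sums = np.bincount(idx, weights=w, minlength=nbins)
    return counts, sums
```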

I also think that writing it in Python will get us 80% of the way there:
most of the improvements both of you have reported are likely to come not
from the choice of language, but from the algorithm used. And if C proves
sufficiently faster to warrant using it, it should be confined to the
number crunching: I don't think there is any point in rewriting argument
parsing in C.
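One way to picture that split, as a sketch only: a Python front end handles defaults and validation, and hands clean float64 arrays to a small kernel, which is the only part that would ever be a candidate for C or Cython. Both `histogram_frontend` and `_histogram_kernel` are hypothetical names, and the kernel here is plain NumPy standing in for compiled code. Note that the front end casts to float64 for simplicity, which is exactly the shortcut a real `np.histogram` cannot take for every dtype.

```python
import numpy as np

def _histogram_kernel(a, edges):
    """Hypothetical number-crunching kernel (the C/Cython candidate).
    Trusts its caller: `a` is 1-D float64, `edges` is sorted 1-D float64."""
    nbins = edges.shape[0] - 1
    idx = np.searchsorted(edges, a, side="right") - 1
    idx[a == edges[-1]] = nbins - 1          # last bin is closed on the right
    idx = idx[(idx >= 0) & (idx < nbins)]    # drop out-of-range samples
    return np.bincount(idx, minlength=nbins)


def histogram_frontend(a, bins=10, range=None):
    """Hypothetical Python front end: all argument parsing, defaults and
    validation live here, so the kernel never has to deal with them."""
    a = np.asarray(a, dtype=np.float64).ravel()
    if np.ndim(bins) == 0:
        nbins = int(bins)
        if nbins < 1:
            raise ValueError("bins must be a positive integer")
        if range is not None:
            lo, hi = map(float, range)
        elif a.size:
            lo, hi = float(a.min()), float(a.max())
        else:
            lo, hi = 0.0, 1.0
        if lo > hi:
            raise ValueError("range must satisfy min <= max")
        if lo == hi:                         # widen a degenerate range
            lo, hi = lo - 0.5, hi + 0.5
        edges = np.linspace(lo, hi, nbins + 1)
    else:
        edges = np.asarray(bins, dtype=np.float64)
        if edges.ndim != 1 or edges.size < 2 or np.any(np.diff(edges) < 0):
            raise ValueError("bins must be a 1-D, monotonically increasing array")
    return _histogram_kernel(a, edges), edges
```

With this layout, speeding things up means replacing only `_histogram_kernel`; the front end and its error handling stay in Python.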

Also, keep in mind that `np.histogram` can now handle arrays of just about
**any** dtype. Handling that complexity in C is not a walk in the park.
Other functions like `np.bincount` and `np.digitize` cheat by only handling
`double`-typed arrays, a luxury that histogram probably can't afford at
this point in time.


( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.