[Numpy-discussion] Rewrite np.histogram in c?

Robert McGibbon rmcgibbo at gmail.com
Mon Mar 16 14:35:45 EDT 2015


Hi,

It sounds like putting together a PR makes sense then. I'll try hacking on
this a bit.

-Robert
On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río" <jaime.frio at gmail.com>
wrote:

> On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer at esrf.fr>
> wrote:
>
>> On Mon, 16 Mar 2015 06:56:58 -0700
>> Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
>>
>> > Dispatching to a different method seems like a no-brainer indeed. The
>> > question is whether we really need to do this in C.
>>
>> I need to compute both unweighted and weighted histograms, and we got a
>> factor of 5 speed-up using (simple) Cython; the details are in last
>> year's EuroSciPy proceedings:
>> http://arxiv.org/pdf/1412.6367.pdf
>
>
> If I read your paper and code correctly, you got a 5x speed-up mostly
> because you combined the weighted and unweighted histograms into a single
> pass over the array, and because you used an algorithm that only applies to
> equal-sized bins, similar to the 10x speed-up Robert was reporting.
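The equal-sized-bin trick can be sketched in NumPy as follows. This is a minimal illustration, assuming finite float input (no NaN or inf) and a range [lo, hi]; `uniform_bin_indices` is a hypothetical helper, taken neither from the paper nor from NumPy. With equal widths, each element's bin comes from one subtraction, one multiplication and a truncation, instead of a binary search over the bin edges.

```python
import numpy as np


def uniform_bin_indices(a, lo, hi, nbins):
    """Hypothetical helper: map each value of `a` to a bin index in
    [0, nbins) for equal-width bins covering [lo, hi]; out-of-range
    values get -1. Assumes finite input (no NaN/inf)."""
    a = np.asarray(a, dtype=np.float64)
    # One subtract, one multiply and a truncation per element,
    # instead of a binary search over the bin edges.
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    # The last bin is closed on the right, as in np.histogram.
    idx[a == hi] = nbins - 1
    # Truncation maps small negative offsets to 0, so mask explicitly.
    idx[(a < lo) | (a > hi)] = -1
    return idx


a = np.array([-0.5, 0.05, 0.15, 0.15, 0.55, 0.95, 1.0, 1.5])
idx = uniform_bin_indices(a, 0.0, 1.0, 10)
# idx.tolist() == [-1, 0, 1, 1, 5, 9, 9, -1]
counts = np.bincount(idx[idx >= 0], minlength=10)
```

For values lying almost exactly on a bin edge, floating-point rounding can put this arithmetic one bin away from a search over explicit edges, so a production version would need a small correction step.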
>
> I think having a special path for equal-sized bins is a great idea:
> let's do it, PRs are always welcome!
> Similarly, getting the counts together with the weights seems like a very
> good idea.
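Getting the counts together with the weights amounts to locating each element's bin once and reusing that index for both reductions. A minimal NumPy sketch, assuming equal-width bins and finite float input; `counts_and_weights` is a hypothetical name, and a compiled version would fuse the two `np.bincount` calls into a single loop.

```python
import numpy as np


def counts_and_weights(a, weights, lo, hi, nbins):
    """Hypothetical helper: return (counts, weighted_sums) for equal-width
    bins on [lo, hi], computing each element's bin index only once."""
    a = np.asarray(a, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    # Locate every element's bin once (equal-width index arithmetic).
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx[a == hi] = nbins - 1  # right edge belongs to the last bin
    inside = (a >= lo) & (a <= hi)
    idx, w = idx[inside], weights[inside]
    # Reuse the same indices for both reductions.
    counts = np.bincount(idx, minlength=nbins)
    sums = np.bincount(idx, weights=w, minlength=nbins)
    return counts, sums


a = np.array([-0.5, 0.05, 0.15, 0.15, 0.55, 0.95, 1.0, 1.5])
w = np.array([10.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0])
counts, sums = counts_and_weights(a, w, 0.0, 1.0, 10)
# Per-bin mean weight; empty bins are left at 0.
means = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
```

Having both arrays from one pass is what makes per-bin averages like `means` cheap to form.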
>
> I also think that writing it in Python is going to take us 80% of the way
> there: most of the improvements both of you have reported are likely to
> come from the algorithm used, not from the language chosen. And if C proves
> to be sufficiently faster to warrant using it, it should be confined to the
> number crunching: I don't think there is any point in rewriting argument
> parsing in C.
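A sketch of that split: argument handling and dispatch stay in Python, and only two small kernels do the per-element work, so they would be the only candidates for C or Cython if profiling justified it. This is a hypothetical, simplified `simple_histogram` (float64 input only, no `weights`, no `density`, no NaN handling), not NumPy's implementation; the equal-width kernel uses index arithmetic and the general kernel uses `np.searchsorted` on explicit edges.

```python
import numpy as np


def _uniform_kernel(a, lo, hi, nbins):
    # Equal-width bins: direct index arithmetic, O(n).
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    idx[a == hi] = nbins - 1  # last bin is closed on the right
    keep = (a >= lo) & (a <= hi)
    return np.bincount(idx[keep], minlength=nbins)


def _edges_kernel(a, edges):
    # Arbitrary increasing edges: binary search, O(n log b).
    idx = np.searchsorted(edges, a, side="right") - 1
    idx[a == edges[-1]] = len(edges) - 2  # last bin is closed on the right
    keep = (idx >= 0) & (idx < len(edges) - 1)
    return np.bincount(idx[keep], minlength=len(edges) - 1)


def simple_histogram(a, bins=10, range=None):
    """Hypothetical simplified histogram: parsing and dispatch in Python,
    per-element work only in the kernels. (`range` shadows the builtin
    to mirror np.histogram's parameter name.)"""
    a = np.asarray(a, dtype=np.float64).ravel()
    if np.ndim(bins) == 0:
        # Integer bin count: equal-width bins, take the fast path.
        lo, hi = (a.min(), a.max()) if range is None else range
        if lo == hi:  # widen a zero-width range
            lo, hi = lo - 0.5, hi + 0.5
        counts = _uniform_kernel(a, lo, hi, int(bins))
        edges = np.linspace(lo, hi, int(bins) + 1)
    else:
        # Explicit edges: fall back to the general search.
        edges = np.asarray(bins, dtype=np.float64)
        counts = _edges_kernel(a, edges)
    return counts, edges


a = np.array([-0.5, 0.05, 0.15, 0.15, 0.55, 0.95, 1.0, 1.5])
uniform_counts, _ = simple_histogram(a, bins=10)
edge_counts, _ = simple_histogram(a, bins=[0.0, 0.1, 0.5, 1.0])
```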
>
> Also, keep in mind that `np.histogram` can now handle arrays of just about
> **any** dtype. Handling that complexity in C is not a walk in the park.
> Other functions like `np.bincount` and `np.digitize` cheat by only handling
> `double` typed arrays, a luxury that histogram probably can't afford at
> this point in time.
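One concrete hazard of a `double`-only shortcut: not every integer survives the conversion. A float64 has a 53-bit significand, so distinct int64 values above 2**53 can collapse to the same double, after which no bin search can tell them apart. The snippet only illustrates that precision loss; it makes no claim about how `np.histogram` itself treats such inputs.

```python
import numpy as np

# Two distinct int64 values, one apart, just above 2**53.
big = np.array([2**53, 2**53 + 1], dtype=np.int64)

# Casting to float64 (what a double-only kernel would do first) rounds
# 2**53 + 1 to the nearest representable double, which is 2**53.
as_double = big.astype(np.float64)

distinct_as_int = bool(big[0] != big[1])
distinct_as_double = bool(as_double[0] != as_double[1])
```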
>
> Jaime
>
> --
> (\__/)
> ( O.o)
> ( > <) This is Bunny. Copy Bunny into your signature to help him with his
> plans for world domination.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

