Hope this isn't too off-topic, but it would be very nice if np.histogram and np.histogram2d supported masked arrays. Is this out of scope outside the numpy.ma package?
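In the meantime, one possible workaround is to drop the masked entries before calling the plain functions. The sketch below uses hypothetical helper names (`masked_histogram`, `masked_histogram2d` are illustrative, not NumPy API). It assumes that a masked weight should also exclude its sample, and that in 2-D a sample is dropped if either coordinate is masked.

```python
import numpy as np


def masked_histogram(a, bins=10, range=None, weights=None):
    """Hypothetical helper: np.histogram over the unmasked entries of `a`.

    A sample is dropped if it is masked in `a` or, when given, in `weights`.
    """
    a = np.ma.asarray(a)
    keep = ~np.ma.getmaskarray(a)
    if weights is not None:
        weights = np.ma.asarray(weights)
        keep &= ~np.ma.getmaskarray(weights)
        weights = np.ma.getdata(weights)[keep]
    return np.histogram(np.ma.getdata(a)[keep], bins=bins, range=range,
                        weights=weights)


def masked_histogram2d(x, y, bins=10, range=None):
    """Hypothetical 2-D variant: keep a sample only if neither x nor y is masked."""
    x = np.ma.asarray(x)
    y = np.ma.asarray(y)
    keep = ~(np.ma.getmaskarray(x) | np.ma.getmaskarray(y))
    return np.histogram2d(np.ma.getdata(x)[keep], np.ma.getdata(y)[keep],
                          bins=bins, range=range)


data = np.ma.masked_array([0.5, 1.5, 1.5, 99.0],
                          mask=[False, False, False, True])
counts, edges = masked_histogram(data, bins=2, range=(0, 2))
print(counts)  # [1 2]; the masked 99.0 is ignored
```

Native support inside np.histogram would presumably do something equivalent, but without the extra copies made by the boolean indexing.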

On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:

Hi,

It sounds like putting together a PR makes sense then. I'll try hacking on this a bit.

-Robert

On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río" <jaime.frio@gmail.com> wrote:
On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer@esrf.fr> wrote:
On Mon, 16 Mar 2015 06:56:58 -0700
Jaime Fernández del Río <jaime.frio@gmail.com> wrote:

> Dispatching to a different method seems like a no brainer indeed. The
> question is whether we really need to do this in C.

I need to compute both unweighted and weighted histograms, and we got a factor of 5 speed-up using (simple) Cython;
it is in last year's EuroSciPy proceedings:
http://arxiv.org/pdf/1412.6367.pdf

If I read your paper and code correctly, you got 5x faster mostly because you combined the weighted and unweighted histograms into a single pass over the array, and because you used an algorithm that only applies to equal-sized bins, similar to the 10x speed-up Robert was reporting.
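To illustrate why equal-sized bins permit a faster algorithm: with uniform bins, the bin of a value follows from one multiply and a truncation, whereas arbitrary edges need a binary search (as in `np.searchsorted`) per value. This is a minimal sketch, not the code from the paper; `uniform_bin_indices` is a hypothetical name. It assumes finite input and ignores the ~1 ULP rounding disagreements right at bin edges that a production version would have to correct.

```python
import numpy as np


def uniform_bin_indices(a, nbins, lo, hi):
    """Hypothetical helper: bin index of each value for `nbins` equal-width
    bins spanning [lo, hi], or -1 for values outside that range.

    Cost is O(1) per value (multiply and floor) instead of the
    O(log nbins) binary search needed for arbitrary edges.
    Assumes finite input; no correction for rounding right at bin edges.
    """
    a = np.asarray(a, dtype=np.float64)
    idx = np.floor((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    # The last bin is closed on the right, as in np.histogram, so a == hi
    # (or a value that rounds up to nbins) goes into bin nbins - 1.
    idx = np.minimum(idx, nbins - 1)
    idx[(a < lo) | (a > hi)] = -1
    return idx


values = np.array([0.1, 0.25, 0.5, 0.99, 1.0, -0.5, 1.5])
idx = uniform_bin_indices(values, nbins=4, lo=0.0, hi=1.0)
counts = np.bincount(idx[idx >= 0], minlength=4)
print(idx)     # [ 0  1  2  3  3 -1 -1]
print(counts)  # [1 1 1 2]
```

The counts agree with `np.histogram(values, bins=4, range=(0.0, 1.0))`; the gain comes from skipping the per-value search, independent of the implementation language.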

I think that having a special path for equal-sized bins is a great idea: let's do it, PRs are always welcome!
Similarly, getting the counts together with the weights seems like a very good idea.
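Getting both histograms from one binning step could look like the following sketch: the bin index of every sample is located once (here with `np.searchsorted`, so arbitrary sorted edges work) and then reused by two `np.bincount` calls, one unweighted and one weighted. `counts_and_weights` is a hypothetical name, and the edges are assumed to be strictly increasing.

```python
import numpy as np


def counts_and_weights(a, weights, edges):
    """Hypothetical helper: unweighted counts and weighted sums per bin,
    computed from a single binning of `a`.

    `edges` must be increasing; bins are half-open [e_i, e_{i+1}) except the
    last, which also includes its right edge (the np.histogram convention).
    """
    a = np.asarray(a, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)
    edges = np.asarray(edges, dtype=np.float64)
    nbins = len(edges) - 1

    # Locate bins once (one binary search per sample); both histograms
    # below reuse these indices.
    idx = np.searchsorted(edges, a, side="right") - 1
    idx[a == edges[-1]] = nbins - 1           # close the last bin on the right
    inside = (idx >= 0) & (idx < nbins)       # drop out-of-range samples
    idx = idx[inside]

    counts = np.bincount(idx, minlength=nbins)
    sums = np.bincount(idx, weights=weights[inside], minlength=nbins)
    return counts, sums


a = [0.5, 1.0, 1.5, 3.9, 4.0, -1.0, 5.0]
w = [1, 2, 3, 4, 5, 6, 7]
counts, sums = counts_and_weights(a, w, edges=[0, 1, 2, 4])
print(counts, sums)  # [1 2 2] [1. 5. 9.]
```

Dividing `sums` by `counts` then gives the mean weight per non-empty bin, which is a common reason to want both at once.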

I also think that writing it in Python is going to take us 80% of the way there: most of the improvements both of you have reported are not likely to be coming from the language chosen, but from the algorithm used. And if C proves to be sufficiently faster to warrant using it, it should be confined to the number crunching: I don't think there is any point in rewriting argument parsing in C.

Also, keep in mind `np.histogram` can now handle arrays of just about **any** dtype. Handling that complexity in C is not a walk in the park. Other functions like `np.bincount` and `np.digitize` cheat by only handling `double`-typed arrays, a luxury that histogram probably can't afford at this point in time.
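A quick way to see the difference in dtype handling (this reflects current NumPy, which may differ from the release under discussion here): with integer weights, `np.histogram` returns a histogram in the weights' integer dtype, while `np.bincount` converts the weights to double and returns `float64`.

```python
import numpy as np

x = np.array([0, 1, 1, 2])
w = np.array([1, 2, 3, 4], dtype=np.int64)

# np.histogram keeps the dtype of the weights for its result.
hist, _ = np.histogram(x, bins=3, range=(0, 3), weights=w)

# np.bincount casts the weights to double, so its result is float64.
per_value = np.bincount(x, weights=w)

print(hist, hist.dtype)
print(per_value, per_value.dtype)
```

Any fast path for histogram would need to preserve that dtype behavior, for example by casting the result back as a final step.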

Jaime

--
(\__/)
( O.o)
( > <) This is Conejo. Copy Conejo into your signature and help him with his plans for world domination.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
