On Mon, Mar 23, 2015 at 2:59 PM, Daniel da Silva <var.mail.daniel@gmail.com> wrote:
Hope this isn't too off-topic, but it would be very nice if np.histogram and np.histogram2d supported masked arrays. Is this out of scope for code outside the numpy.ma package?

Right now it looks like there's no histogram function at all for masked arrays - would be good to improve that situation.

If it's as easy as adding to np.histogram something like:

    if isinstance(a, np.ma.MaskedArray):
        a = a.data[~a.mask]

then it makes sense to add that I think.
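
To make the idea concrete, here is a minimal sketch of such a wrapper. It is not code from NumPy: `histogram_masked` is a hypothetical name, and it assumes `weights`, if given, has the same shape as `a`. It uses `np.ma.getmaskarray` rather than `a.mask` so that the selection is a full boolean array even when the mask is `nomask`, and it drops the weights of masked entries as well.

```python
import numpy as np


def histogram_masked(a, bins=10, range=None, weights=None):
    """Hypothetical helper: histogram of only the unmasked values of `a`."""
    a = np.ma.asarray(a)
    # getmaskarray always returns a boolean array shaped like `a`,
    # even when a.mask is the scalar `nomask`.
    keep = ~np.ma.getmaskarray(a)
    if weights is not None:
        # Assumption: weights has the same shape as `a`.
        weights = np.asarray(weights)[keep]
    return np.histogram(a.data[keep], bins=bins, range=range, weights=weights)


data = np.ma.masked_array([1.0, 2.0, 3.0, 100.0],
                          mask=[False, False, False, True])
counts, edges = histogram_masked(data, bins=3, range=(0.0, 3.0))
```

With this sketch the masked value 100.0 never reaches `np.histogram`, so it contributes neither to the counts nor to an automatically chosen range.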

Ralf



On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:

Hi,

It sounds like putting together a PR makes sense then. I'll try hacking on this a bit.

-Robert

On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río" <jaime.frio@gmail.com> wrote:
On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer@esrf.fr> wrote:
On Mon, 16 Mar 2015 06:56:58 -0700
Jaime Fernández del Río <jaime.frio@gmail.com> wrote:

> Dispatching to a different method seems like a no brainer indeed. The
> question is whether we really need to do this in C.

I need to do both unweighted and weighted histograms, and we got a factor of 5 speed-up using (simple) Cython.
It is in the proceedings of last year's EuroSciPy:
http://arxiv.org/pdf/1412.6367.pdf

If I read your paper and code properly, you got 5x faster mostly for two reasons: you combined the weighted and unweighted histograms into a single pass over the array, and you used an algorithm that can only be applied to equal-sized bins, similar to the 10x speed-up Robert was reporting.

I think that having a special path for equal-sized bins is a great idea: let's do it, PRs are always welcome!
Similarly, getting the counts together with the weights seems like a very good idea.
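
As a rough illustration of what such a special path could look like in pure Python, here is a sketch. It is not the code from the paper or from any PR: `histogram_uniform` is a hypothetical name, it assumes `bins` is an integer and `range` is given explicitly, and it converts the input to `float`, so it sidesteps the dtype question entirely. Bin indices are computed arithmetically instead of by searching the edges, and the counts and the weighted sums come out of the same index array.

```python
import numpy as np


def histogram_uniform(a, bins, range, weights=None):
    """Hypothetical fast path for equal-width bins.

    Returns (counts, weighted, edges); `weighted` is None without weights.
    """
    a = np.asarray(a, dtype=float).ravel()
    lo, hi = float(range[0]), float(range[1])
    # Keep only values inside the closed range [lo, hi]; NaNs drop out too.
    inside = (a >= lo) & (a <= hi)
    a = a[inside]
    # Bin index from arithmetic, no search over the bin edges.
    idx = ((a - lo) * (bins / (hi - lo))).astype(np.intp)
    # Values equal to `hi` belong to the last bin, as in np.histogram.
    idx = np.minimum(idx, bins - 1)
    counts = np.bincount(idx, minlength=bins)
    weighted = None
    if weights is not None:
        # Assumption: weights has the same number of elements as `a`.
        w = np.asarray(weights, dtype=float).ravel()[inside]
        weighted = np.bincount(idx, weights=w, minlength=bins)
    edges = np.linspace(lo, hi, bins + 1)
    return counts, weighted, edges


values = np.array([0.0, 0.5, 1.0, 2.5, 3.0, 7.0])
counts, weighted, edges = histogram_uniform(
    values, bins=3, range=(0.0, 3.0), weights=np.full(values.shape, 2.0)
)
```

Because the index comes from floating-point arithmetic, values lying extremely close to a bin edge may land in a different bin than `np.histogram` would choose; a real implementation would need to decide how to handle that.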

I also think that writing it in Python is going to take us 80% of the way there: most of the improvements both of you have reported are not likely to be coming from the language chosen, but from the algorithm used. And if C proves to be sufficiently faster to warrant using it, it should be confined to the number crunching: I don't think there is any point in rewriting argument parsing in C.

Also, keep in mind `np.histogram` can now handle arrays of just about **any** dtype. Handling that complexity in C is no walk in the park. Other functions like `np.bincount` and `np.digitize` cheat by only handling `double`-typed arrays, a luxury that histogram probably can't afford at this point in time.

Jaime

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

