Re: [Numpy-discussion] Rewrite np.histogram in c?

March 23, 2015

On Mon, Mar 23, 2015 at 2:59 PM, Daniel da Silva <var.mail.daniel@gmail.com>
wrote:
...
Hope this isn't too off-topic: but it would be very nice if np.histogram
and np.histogram2d supported masked arrays. Is this out of scope outside
the numpy.ma package?
Right now it looks like there's no histogram function at all for masked
arrays - would be good to improve that situation.

If it's as easy as adding to np.histogram something like:

    if isinstance(a, np.ma.MaskedArray):
        a = a.data[~a.mask]

then I think it makes sense to add that.
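
A minimal sketch of how that check could look as a standalone wrapper, not
as a patch to np.histogram itself. The name histogram_masked is hypothetical.
It assumes that masked entries should simply be dropped, and that any weights
must be filtered with the same mask. It uses np.ma.getmaskarray rather than
a.mask directly, because a.mask may be the scalar np.ma.nomask when nothing
is masked.

```python
import numpy as np


def histogram_masked(a, bins=10, range=None, weights=None):
    """Hypothetical wrapper: drop masked entries, then call np.histogram."""
    if isinstance(a, np.ma.MaskedArray):
        # getmaskarray always returns a full boolean array, even when
        # a.mask is the scalar np.ma.nomask.
        keep = ~np.ma.getmaskarray(a).ravel()
        if weights is not None:
            # Keep the weights aligned with the surviving data points.
            weights = np.asarray(weights).ravel()[keep]
        a = a.data.ravel()[keep]
    return np.histogram(a, bins=bins, range=range, weights=weights)


# The masked outlier (100) no longer stretches the bin range.
data = np.ma.masked_array([1, 2, 3, 4, 100], mask=[0, 0, 0, 0, 1])
counts, edges = histogram_masked(data, bins=4)
```

For plain ndarrays the wrapper just forwards to np.histogram unchanged.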

Ralf
...
On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon <rmcgibbo@gmail.com>
wrote:
...
Hi,
It sounds like putting together a PR makes sense then. I'll try hacking
on this a bit.
-Robert
On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río" <jaime.frio@gmail.com>
wrote:
...
On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer@esrf.fr>
wrote:
...
On Mon, 16 Mar 2015 06:56:58 -0700
Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
...
Dispatching to a different method seems like a no-brainer indeed. The
question is whether we really need to do this in C.
I need to do both unweighted & weighted histograms, and we got a factor of
5 using (simple) Cython; it is in last year's EuroSciPy proceedings:
http://arxiv.org/pdf/1412.6367.pdf
If I read your paper and code properly, you got 5x faster, mostly
because you combined the weighted and unweighted histograms into a single
search of the array, and because you used an algorithm that can only be
applied to equal-sized bins, similarly to the 10x speed-up Robert was
reporting.
I think that having a special path for equal-sized bins is a great idea:
let's do it, PRs are always welcome!
Similarly, getting the counts together with the weights seems like a
very good idea.
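
As a rough illustration of both ideas in pure Python, here is a sketch that
computes each bin index arithmetically (no search over the edges) and reuses
the indices for the counts and the weighted sums. The helper name
histogram_uniform and its signature are hypothetical; this is not the code
from the paper or from any PR. It assumes hi > lo, treats the last bin as
closed on the right like np.histogram does, and may disagree with
np.histogram for values that sit within rounding error of a bin edge.

```python
import numpy as np


def histogram_uniform(a, weights, nbins, lo, hi):
    """Hypothetical helper: counts and weighted sums over equal-width bins."""
    a = np.asarray(a, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    # Discard values outside [lo, hi], as np.histogram does with range=.
    inside = (a >= lo) & (a <= hi)
    a, weights = a[inside], weights[inside]
    # Equal-width bins: the index follows from one multiply per element.
    idx = ((a - lo) * (nbins / (hi - lo))).astype(np.intp)
    # Values equal to hi (or rounded up to it) go into the last bin.
    idx = np.minimum(idx, nbins - 1)
    # The same indices serve both the plain and the weighted histogram.
    counts = np.bincount(idx, minlength=nbins)
    sums = np.bincount(idx, weights=weights, minlength=nbins)
    return counts, sums


values = [0.1, 0.2, 0.6, 0.9, 1.0]
counts, sums = histogram_uniform(values, [1, 2, 3, 4, 5], nbins=2, lo=0.0, hi=1.0)
```

Whether this is fast enough without C or Cython would need to be measured.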
I also think that writing it in Python is going to take us 80% of the
way there: most of the improvements both of you have reported are not
likely to be coming from the language chosen, but from the algorithm used.
And if C proves to be sufficiently faster to warrant using it, it should be
confined to the number crunching: I don't think there is any point in
rewriting argument parsing in C.
Also, keep in mind `np.histogram` can now handle arrays of just about
**any** dtype. Handling that complexity in C is not a ride in the park.
Other functions like `np.bincount` and `np.digitize` cheat by only handling
`double` typed arrays, a luxury that histogram probably can't afford at
this point in time.
Jaime
--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus
planes de dominación mundial.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Ralf Gommers