[Numpy-discussion] Rewrite np.histogram in c?

Eric Firing efiring at hawaii.edu
Mon Mar 23 14:38:29 EDT 2015


On 2015/03/23 7:36 AM, Ralf Gommers wrote:
>
>
> On Mon, Mar 23, 2015 at 2:59 PM, Daniel da Silva
> <var.mail.daniel at gmail.com> wrote:
>
>     Hope this isn't too off-topic, but it would be very nice if
>     np.histogram and np.histogram2d supported masked arrays. Is this out
>     of scope for code outside the numpy.ma package?
>
>
> Right now it looks like there's no histogram function at all for masked
> arrays - would be good to improve that situation.
>
> If it's as easy as adding to np.histogram something like:
>
>      if isinstance(a, np.ma.MaskedArray):
>          a = a.data[~a.mask]

It looks like it requires a little more than that, but not much.  For 
full support a new mask would need to be made from the logical_or of the 
"a" mask and the weights mask, and then used to compress both "a" and 
weights.
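
A minimal sketch of that idea, assuming "a" and weights have the same shape. The name masked_histogram is hypothetical and not an existing NumPy function; it only illustrates combining the two masks before calling np.histogram:

```python
import numpy as np


def masked_histogram(a, weights=None, **kwargs):
    """Hypothetical helper: histogram that skips masked entries.

    Assumes `a` and `weights` (if given) have the same shape.
    Extra keyword arguments (bins, range, ...) go to np.histogram.
    """
    a = np.ma.asarray(a)
    # getmaskarray always returns a full boolean array, even for nomask.
    mask = np.ma.getmaskarray(a)
    if weights is not None:
        weights = np.ma.asarray(weights)
        # An entry is dropped if it is masked in either array.
        mask = np.logical_or(mask, np.ma.getmaskarray(weights))
        weights = weights.data[~mask]
    # Compress "a" with the combined mask, then use the ordinary histogram.
    return np.histogram(a.data[~mask], weights=weights, **kwargs)
```

For example, with a = np.ma.array([1, 2, 3, 4], mask=[0, 1, 0, 0]) and weights masked at the third entry, only the first and last samples contribute to the weighted histogram.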

Eric

>
> then it makes sense to add that I think.
>
> Ralf
>
>
>
>     On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon
>     <rmcgibbo at gmail.com> wrote:
>
>         Hi,
>
>         It sounds like putting together a PR makes sense then. I'll try
>         hacking on this a bit.
>
>         -Robert
>
>         On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río"
>         <jaime.frio at gmail.com> wrote:
>
>             On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer
>             <Jerome.Kieffer at esrf.fr> wrote:
>
>                 On Mon, 16 Mar 2015 06:56:58 -0700
>                 Jaime Fernández del Río <jaime.frio at gmail.com>
>                 wrote:
>
>                 > Dispatching to a different method seems like a no-brainer indeed. The
>                 > question is whether we really need to do this in C.
>
>                 I need to do both unweighted & weighted histograms, and
>                 we got a factor of 5 speed-up using (simple) Cython;
>                 it is in last year's EuroSciPy proceedings:
>                 http://arxiv.org/pdf/1412.6367.pdf
>
>
>             If I read your paper and code properly, you got 5x faster,
>             mostly because you combined the weighted and unweighted
>             histograms into a single search of the array, and because
>             you used an algorithm that can only be applied to equal-
>             sized bins, similarly to the 10x speed-up Robert was reporting.
>
>             I think that having a special path for equal sized bins is a
>             great idea: let's do it, PRs are always welcome!
>             Similarly, getting the counts together with the weights
>             seems like a very good idea.
>
>             I also think that writing it in Python is going to take us
>             80% of the way there: most of the improvements both of you
>             have reported are not likely to be coming from the language
>             chosen, but from the algorithm used. And if C proves to be
>             sufficiently faster to warrant using it, it should be
>             confined to the number crunching: I don't think there is any
>             point in rewriting argument parsing in C.
>
>             Also, keep in mind `np.histogram` can now handle arrays of
>             just about **any** dtype. Handling that complexity in C is
>             not a walk in the park. Other functions like `np.bincount`
>             and `np.digitize` cheat by only handling `double` typed
>             arrays, a luxury that histogram probably can't afford at
>             this point in time.
>
>             Jaime
>
>             --
>             (\__/)
>             ( O.o)
>             ( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale
>             en sus planes de dominación mundial.
>
>             _______________________________________________
>             NumPy-Discussion mailing list
>             NumPy-Discussion at scipy.org
>             http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
