Re: [Numpy-discussion] Rewrite np.histogram in c?

March 23, 2015


      On 2015/03/23 7:36 AM, Ralf Gommers wrote:
...
On Mon, Mar 23, 2015 at 2:59 PM, Daniel da Silva
<var.mail.daniel@gmail.com> wrote:
Hope this isn't too off-topic, but it would be very nice if
    np.histogram and np.histogram2d supported masked arrays. Is this out
    of scope outside the numpy.ma package?
Right now it looks like there's no histogram function at all for masked
arrays - would be good to improve that situation.
If it's as easy as adding to np.histogram something like:
    if isinstance(a, np.ma.MaskedArray):
        a = a.data[~a.mask]
It looks like it requires a little more than that, but not much.  For 
full support a new mask would need to be made from the logical_or of the 
"a" mask and the weights mask, and then used to compress both "a" and 
weights.
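
A minimal sketch of the combined-mask approach described above, not a
tested patch against numpy. The wrapper name masked_histogram is
hypothetical, and it assumes "a" and weights have the same shape.
np.ma.getmaskarray is used so that an array without any mask still yields
a full boolean mask.

```python
import numpy as np


def masked_histogram(a, bins=10, range=None, weights=None):
    """Hypothetical wrapper: histogram over the unmasked entries of `a`.

    An entry is dropped if it is masked in `a` or, when weights are
    given, masked in `weights`.
    """
    a = np.ma.asarray(a)
    # getmaskarray always returns a full boolean array, even for nomask.
    mask = np.ma.getmaskarray(a)
    if weights is not None:
        weights = np.ma.asarray(weights)
        if weights.shape != a.shape:
            raise ValueError("weights must have the same shape as a")
        # Combine both masks, then compress data and weights identically.
        mask = np.logical_or(mask, np.ma.getmaskarray(weights))
        weights = weights.data[~mask]
    data = a.data[~mask]
    return np.histogram(data, bins=bins, range=range, weights=weights)
```

Plain ndarrays pass through unchanged, because np.ma.asarray gives them
an empty mask.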

Eric
...
then it makes sense to add that I think.
Ralf
On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon <rmcgibbo@gmail.com> wrote:
Hi,
It sounds like putting together a PR makes sense then. I'll try
        hacking on this a bit.
-Robert
On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río" <jaime.frio@gmail.com> wrote:
On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer@esrf.fr> wrote:
On Mon, 16 Mar 2015 06:56:58 -0700 Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
> Dispatching to a different method seems like a no brainer indeed. The
                > question is whether we really need to do this in C.
I need to do both unweighted & weighted histograms, and
                we got a factor of 5 speed-up using (simple) Cython;
                it is described in last year's EuroSciPy proceedings:
                http://arxiv.org/pdf/1412.6367.pdf
If I read your paper and code properly, you got 5x faster,
            mostly because you combined the weighted and unweighted
            histograms into a single search of the array, and because
            you used an algorithm that can only be applied to equal-sized
            bins, similarly to the 10x speed-up Robert was reporting.
I think that having a special path for equal-sized bins is a
            great idea: let's do it, PRs are always welcome!
            Similarly, getting the counts together with the weights
            seems like a very good idea.
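
As a rough illustration of the two ideas above (a fast path for
equal-sized bins, and counts computed together with the weighted sums),
here is a pure-Python sketch. The function name uniform_histogram and its
signature are hypothetical, not an existing numpy API. It replaces the
search over bin edges with a direct index computation, so results for
values lying exactly on an interior edge may differ from np.histogram by
floating-point rounding.

```python
import numpy as np


def uniform_histogram(a, weights, bins, lo, hi):
    """Hypothetical helper: counts and weighted sums over equal-width bins.

    Covers the closed interval [lo, hi] with `bins` bins of equal width.
    """
    a = np.asarray(a, dtype=float).ravel()
    weights = np.asarray(weights, dtype=float).ravel()
    # Drop out-of-range values (NaN also fails both comparisons).
    keep = (a >= lo) & (a <= hi)
    a, weights = a[keep], weights[keep]
    # Direct index computation instead of searching the bin edges.
    idx = ((a - lo) * (bins / (hi - lo))).astype(np.intp)
    # Values equal to `hi` land in the last bin, as in np.histogram.
    idx = np.minimum(idx, bins - 1)
    # One index array serves both the counts and the weighted sums.
    counts = np.bincount(idx, minlength=bins)
    sums = np.bincount(idx, weights=weights, minlength=bins)
    return counts, sums
```

The bin index is computed once and reused for both outputs, which is
where the saving over two separate np.histogram calls would come from.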
I also think that writing it in Python is going to take us
            80% of the way there: most of the improvements both of you
            have reported are not likely to be coming from the language
            chosen, but from the algorithm used. And if C proves to be
            sufficiently faster to warrant using it, it should be
            confined to the number crunching: I don't think there is any
            point in rewriting argument parsing in C.
Also, keep in mind `np.histogram` can now handle arrays of
            just about **any** dtype. Handling that complexity in C is
            not a ride in the park. Other functions like `np.bincount`
            and `np.digitize` cheat by only handling `double` typed
            arrays, a luxury that histogram probably can't afford at
            this point in time.
Jaime
--
            (\__/)
            ( O.o)
            ( > <) This is Bunny. Copy Bunny into your signature and help
            him with his plans for world domination.
_______________________________________________
            NumPy-Discussion mailing list
            NumPy-Discussion@scipy.org <mailto:NumPy-Discussion@scipy.org>
            http://mail.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Rewrite np.histogram in c?

Eric Firing