[Numpy-discussion] Rewrite np.histogram in c?
ralf.gommers at gmail.com
Mon Mar 23 13:36:14 EDT 2015
On Mon, Mar 23, 2015 at 2:59 PM, Daniel da Silva <var.mail.daniel at gmail.com>
> Hope this isn't too off-topic: but it would be very nice if np.histogram
> and np.histogram2d supported masked arrays. Is this out of scope for
> code outside the numpy.ma package?
Right now it looks like there's no histogram function at all for masked
arrays - would be good to improve that situation.
If it's as easy as adding something like this to np.histogram:

    if isinstance(a, np.ma.MaskedArray):
        a = a.data[~a.mask]

then I think it makes sense to add that.
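
A slightly fuller sketch of that idea, written as a standalone wrapper rather than as a patch to np.histogram. The name histogram_masked is hypothetical, and the use of np.ma.getmaskarray (so that an array with no mask set still yields a full boolean mask) is my own assumption about how one might harden the two-line version; it is not tested against the real np.histogram code paths.

```python
import numpy as np

def histogram_masked(a, bins=10, range=None, weights=None):
    """Hypothetical wrapper: drop masked entries, then defer to np.histogram."""
    if isinstance(a, np.ma.MaskedArray):
        # getmaskarray always returns a boolean array of a's shape,
        # even when a.mask is the scalar np.ma.nomask.
        mask = np.ma.getmaskarray(a)
        if weights is not None:
            # Assumes weights has the same shape as a.
            weights = np.asarray(weights)[~mask]
        a = a.data[~mask]
    return np.histogram(a, bins=bins, range=range, weights=weights)

data = np.ma.masked_array([1.0, 2.0, 3.0, 100.0],
                          mask=[False, False, False, True])
counts, edges = histogram_masked(data, bins=3)
print(counts, edges)
```

Without the mask handling, the masked value 100.0 would stretch the automatic bin range; with it, only the three valid values are binned.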
> On Mon, Mar 16, 2015 at 2:35 PM, Robert McGibbon <rmcgibbo at gmail.com>
>> It sounds like putting together a PR makes sense then. I'll try hacking
>> on this a bit.
>> On Mar 16, 2015 11:20 AM, "Jaime Fernández del Río" <jaime.frio at gmail.com>
>>> On Mon, Mar 16, 2015 at 9:28 AM, Jerome Kieffer <Jerome.Kieffer at esrf.fr>
>>>> On Mon, 16 Mar 2015 06:56:58 -0700
>>>> Jaime Fernández del Río <jaime.frio at gmail.com> wrote:
>>>> > Dispatching to a different method seems like a no brainer indeed. The
>>>> > question is whether we really need to do this in C.
>>>> I need to do both unweighted & weighted histograms and we got a factor
>>>> of 5 using (simple) Cython:
>>>> it is in the proceedings of Euroscipy, last year.
>>> If I read your paper and code properly, you got 5x faster, mostly
>>> because you combined the weighted and unweighted histograms into a single
>>> search of the array, and because you used an algorithm that can only be
>>> applied to equal-sized bins, similarly to the 10x speed-up Robert was
>>> reporting.
>>> I think that having a special path for equal-sized bins is a great idea:
>>> let's do it, PRs are always welcome!
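
To make the equal-sized-bins idea concrete, here is a minimal pure-Python sketch, not the code from any of the PRs or papers mentioned in this thread. With uniform bins the bin index can be computed arithmetically instead of by searching sorted edges. The function name histogram_uniform and its signature are hypothetical, and values that fall extremely close to a bin edge may land in a different bin than np.histogram would choose because of floating-point rounding.

```python
import numpy as np

def histogram_uniform(a, bins, lo, hi):
    """Hypothetical fast path: count values into `bins` equal-width bins on [lo, hi]."""
    a = np.asarray(a, dtype=float).ravel()
    # Discard out-of-range values (NaN fails both comparisons, so it is dropped too).
    a = a[(a >= lo) & (a <= hi)]
    # Arithmetic bin index: no sort and no search over bin edges.
    idx = ((a - lo) * (bins / (hi - lo))).astype(np.intp)
    # Values equal to hi belong in the last bin (closed right edge).
    idx[idx == bins] = bins - 1
    return np.bincount(idx, minlength=bins)

values = [0.0, 0.1, 0.5, 0.9, 1.0, 1.5, -0.2]
counts = histogram_uniform(values, bins=2, lo=0.0, hi=1.0)
reference, _ = np.histogram(values, bins=2, range=(0.0, 1.0))
print(counts, reference)
```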
>>> Similarly, getting the counts together with the weights seems like a
>>> very good idea.
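
A rough sketch of getting counts and weighted sums together, assuming equal-sized bins: the bin indices are computed once and reused for both results. The function name and signature are hypothetical, and this is only an illustration of the idea, not the Cython implementation discussed above.

```python
import numpy as np

def histogram_counts_and_sums(a, weights, bins, lo, hi):
    """Hypothetical helper: return (counts, weighted sums) from one index computation."""
    a = np.asarray(a, dtype=float).ravel()
    w = np.asarray(weights, dtype=float).ravel()
    keep = (a >= lo) & (a <= hi)
    # Bin indices are computed once and shared by both histograms.
    idx = ((a[keep] - lo) * (bins / (hi - lo))).astype(np.intp)
    idx[idx == bins] = bins - 1  # values equal to hi go in the last bin
    counts = np.bincount(idx, minlength=bins)
    sums = np.bincount(idx, weights=w[keep], minlength=bins)
    return counts, sums

counts, sums = histogram_counts_and_sums(
    [0.1, 0.2, 0.7], [1.0, 2.0, 4.0], bins=2, lo=0.0, hi=1.0)
print(counts, sums)
```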
>>> I also think that writing it in Python is going to take us 80% of the
>>> way there: most of the improvements both of you have reported are not
>>> likely to be coming from the language chosen, but from the algorithm used.
>>> And if C proves to be sufficiently faster to warrant using it, it should be
>>> confined to the number crunching: I don't think there is any point in
>>> rewriting argument parsing in C.
>>> Also, keep in mind `np.histogram` can now handle arrays of just about
>>> **any** dtype. Handling that complexity in C is not a ride in the park.
>>> Other functions like `np.bincount` and `np.digitize` cheat by only handling
>>> `double` typed arrays, a luxury that histogram probably can't afford at
>>> this point in time.
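
A small illustration of the `double` point for `np.bincount`, as far as I understand its behaviour: weights are converted to double precision, so the result comes back as float64 whatever floating-point dtype went in. This only demonstrates `np.bincount`; it makes no claim about how `np.digitize` or `np.histogram` behave internally.

```python
import numpy as np

idx = np.array([0, 1, 1])
w32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)

# The float32 weights are summed per index; the result is float64.
sums = np.bincount(idx, weights=w32)
print(sums, sums.dtype)
```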
>>> ( O.o)
>>> ( > <) This is Bunny. Copy Bunny into your signature and help him with his
>>> plans for world domination.
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org