[Numpy-discussion] [ANN] Nanny, faster NaN functions

Wes McKinney wesmckinn at gmail.com
Sat Nov 20 18:54:32 EST 2010


On Sat, Nov 20, 2010 at 6:39 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Fri, Nov 19, 2010 at 7:42 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> I should make a benchmark suite.
>
>>> ny.benchit(verbose=False)
> Nanny performance benchmark
>    Nanny 0.0.1dev
>    Numpy 1.4.1
>    Speed is numpy time divided by nanny time
>    NaN means all NaNs
>   Speed   Test                Shape        dtype    NaN?
>   6.6770  nansum(a, axis=-1)  (500,500)    int64
>   4.6612  nansum(a, axis=-1)  (10000,)     float64
>   9.0351  nansum(a, axis=-1)  (500,500)    int32
>   3.0746  nansum(a, axis=-1)  (500,500)    float64
>  11.5740  nansum(a, axis=-1)  (10000,)     int32
>   6.4484  nansum(a, axis=-1)  (10000,)     int64
>  51.3917  nansum(a, axis=-1)  (500,500)    float64  NaN
>  13.8692  nansum(a, axis=-1)  (10000,)     float64  NaN
>   6.5327  nanmax(a, axis=-1)  (500,500)    int64
>   8.8222  nanmax(a, axis=-1)  (10000,)     float64
>   0.2059  nanmax(a, axis=-1)  (500,500)    int32
>   6.9262  nanmax(a, axis=-1)  (500,500)    float64
>   5.0688  nanmax(a, axis=-1)  (10000,)     int32
>   6.5605  nanmax(a, axis=-1)  (10000,)     int64
>  48.4850  nanmax(a, axis=-1)  (500,500)    float64  NaN
>  14.6289  nanmax(a, axis=-1)  (10000,)     float64  NaN
>
> You can also use the makefile to run the benchmark: make bench

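A rough illustration of how a "Speed" figure like the ones in the quoted table could be computed (numpy time divided by the alternative's time). This is only a sketch, not Nanny's actual benchit code: `nansum_alt` and `speed_ratio` are hypothetical names made up for this example, and `nansum_alt` is a plain-NumPy stand-in rather than a Cython function.

```python
import timeit

import numpy as np


def nansum_alt(a, axis=-1):
    """Hypothetical stand-in for a faster nansum (not Nanny's code)."""
    # Replace NaNs with zero, then use the ordinary sum.
    return np.where(np.isnan(a), 0.0, a).sum(axis=axis)


def speed_ratio(baseline, candidate, arr, number=20, repeat=3):
    """Return baseline time / candidate time; > 1 means candidate is faster."""
    t_base = min(timeit.repeat(lambda: baseline(arr, axis=-1),
                               number=number, repeat=repeat))
    t_cand = min(timeit.repeat(lambda: candidate(arr, axis=-1),
                               number=number, repeat=repeat))
    return t_base / t_cand


rng = np.random.default_rng(0)
arr = rng.standard_normal((500, 500))
arr[arr > 1.5] = np.nan  # sprinkle in some NaNs

ratio = speed_ratio(np.nansum, nansum_alt, arr)
print(f"{ratio:8.4f}  nansum(a, axis=-1)  {arr.shape}  {arr.dtype}")
```

The number printed depends on the machine and the NumPy version, so it should not be compared with the figures in the table above.
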
Keith (and others),

What would you think about creating a library of mostly Cython-based
"domain-specific functions"? I mean things like rolling statistical
moments and nan* functions like the ones you have here: functions that
operate only on NumPy arrays and don't necessarily belong in NumPy or
SciPy (but could be included down the road). You were already talking
about this on the statsmodels mailing list for larry. I've spent a lot
of time over the last couple of years writing a bunch of these for
pandas, and I would have relatively few qualms about moving them out of
pandas and introducing a dependency. You could do the same for larry;
then we'd all be relying on the same well-vetted and tested codebase.
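
To make "rolling statistical moments" concrete, here is a minimal sketch of a NaN-aware rolling mean over a 1-D array, written in plain NumPy. It is not taken from pandas, larry, or Nanny; the name `rolling_nanmean` and its signature are assumptions for illustration, and a shared library would presumably implement the same idea as a Cython loop.

```python
import numpy as np


def rolling_nanmean(a, window):
    """Mean over each length-`window` slice of a 1-D array, ignoring NaNs.

    Hypothetical helper for illustration. Windows that contain only
    NaNs yield NaN.
    """
    a = np.asarray(a, dtype=float)
    if a.ndim != 1:
        raise ValueError("only 1-D input is handled in this sketch")
    if window < 1 or window > a.size:
        raise ValueError("window must be between 1 and len(a)")

    valid = ~np.isnan(a)
    # Running totals of the non-NaN values and of how many there are,
    # each with a leading zero so that differences give window totals.
    csum = np.concatenate(([0.0], np.cumsum(np.where(valid, a, 0.0))))
    ccnt = np.concatenate(([0], np.cumsum(valid)))

    sums = csum[window:] - csum[:-window]
    counts = ccnt[window:] - ccnt[:-window]

    out = np.full(sums.shape, np.nan)
    # Divide only where the window has at least one valid value.
    np.divide(sums, counts, out=out, where=counts > 0)
    return out


result = rolling_nanmean([1.0, 2.0, np.nan, 4.0], window=2)
print(result)
```

The cumulative-sum trick keeps the sketch short, but it can lose precision on long arrays with large values; a compiled loop that updates each window incrementally would avoid that.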

- Wes


