[Numpy-discussion] [ANN] Nanny, faster NaN functions

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Nov 19 22:51:12 EST 2010


On Fri, Nov 19, 2010 at 10:42 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
> On Fri, Nov 19, 2010 at 7:19 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>> On Fri, Nov 19, 2010 at 1:50 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>
>>> On Fri, Nov 19, 2010 at 12:29 PM, Keith Goodman <kwgoodman at gmail.com>
>>> wrote:
>>> > On Fri, Nov 19, 2010 at 12:19 PM, Pauli Virtanen <pav at iki.fi> wrote:
>>> >> Fri, 19 Nov 2010 11:19:57 -0800, Keith Goodman wrote:
>>> >> [clip]
>>> >>> My guess is that having separate underlying functions for each dtype,
>>> >>> ndim, and axis would be a nightmare for a large project like Numpy.
>>> >>> But
>>> >>> manageable for a focused project like nanny.
>>> >>
>>> >> Might be easier to migrate the nan* functions to using Ufuncs.
>>> >>
>>> >> Unless I'm missing something,
>>> >>
>>> >>        np.nanmax -> np.fmax.reduce
>>> >>        np.nanmin -> np.fmin.reduce
>>> >>
>>> >> For `nansum`, we'd need to add an ufunc `nanadd`, and for
>>> >> `nanargmax/min`, we'd need `argfmin/fmax'.
>>> >
>>> > How about that! I wasn't aware of fmax/fmin. Yes, I'd like a nanadd,
>>> > please.
>>> >
>>> >>> arr = np.random.rand(1000, 1000)
>>> >>> arr[arr > 0.5] = np.nan
>>> >>> np.nanmax(arr)
>>> >   0.49999625409581072
>>> >>> np.fmax.reduce(arr, axis=None)
>>> > <snip>
>>> > TypeError: an integer is required
>>> >>> np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
>>> >   0.49999625409581072
>>> >
>>> >>> timeit np.fmax.reduce(np.fmax.reduce(arr, axis=0), axis=0)
>>> > 100 loops, best of 3: 12.7 ms per loop
>>> >>> timeit np.nanmax(arr)
>>> > 10 loops, best of 3: 39.6 ms per loop
>>> >
>>> >>> timeit np.nanmax(arr, axis=0)
>>> > 10 loops, best of 3: 46.5 ms per loop
>>> >>> timeit np.fmax.reduce(arr, axis=0)
>>> > 100 loops, best of 3: 12.7 ms per loop
>>>
>>> Cython is faster than np.fmax.reduce.
>>>
>>> I wrote a cython version of np.nanmax, called nanmax below. (It only
>>> handles the 2d, float64, axis=None case, but since the array is large
>>> I don't think that explains the time difference).
>>>
>>> Note that fmax.reduce is slower than np.nanmax when there are no NaNs:
>>>
>>> >> arr = np.random.rand(1000, 1000)
>>> >> timeit np.nanmax(arr)
>>> 100 loops, best of 3: 5.82 ms per loop
>>> >> timeit np.fmax.reduce(np.fmax.reduce(arr))
>>> 100 loops, best of 3: 9.14 ms per loop
>>> >> timeit nanmax(arr)
>>> 1000 loops, best of 3: 1.17 ms per loop
>>>
>>> >> arr[arr > 0.5] = np.nan
>>>
>>> >> timeit np.nanmax(arr)
>>> 10 loops, best of 3: 45.5 ms per loop
>>> >> timeit np.fmax.reduce(np.fmax.reduce(arr))
>>> 100 loops, best of 3: 12.7 ms per loop
>>> >> timeit nanmax(arr)
>>> 1000 loops, best of 3: 1.17 ms per loop
>>
>> There seem to be some odd hardware/compiler dependencies. I get quite a
>> different pattern of times:
>>
>> In [1]: arr = np.random.rand(1000, 1000)
>>
>> In [2]: timeit np.nanmax(arr)
>> 100 loops, best of 3: 10.4 ms per loop
>>
>> In [3]: timeit np.fmax.reduce(arr.flat)
>> 100 loops, best of 3: 2.09 ms per loop
>>
>> In [4]: arr[arr > 0.5] = np.nan
>>
>> In [5]: timeit np.nanmax(arr)
>> 100 loops, best of 3: 12.9 ms per loop
>>
>> In [6]: timeit np.fmax.reduce(arr.flat)
>> 100 loops, best of 3: 7.09 ms per loop
>>
>>
>> I've tweaked fmax with the reduce loop option but the nanmax times don't
>> look like yours at all. I'm also a bit surprised that
>> you don't see any difference in times when the array contains a lot of nans.
>> I'm running on AMD Phenom, gcc 4.4.5.
>
> Ubuntu 10.04 64 bit, numpy 1.4.1.
>
> Difference in which times? nanny.nanmax with and wintout NaNs? The
> code doesn't explictily check for NaNs (it does check for all NaNs).
> It basically loops through the data and does:
>
> allnan = 1
> ai = ai[i,k]
> if ai > amax:
>    amax = ai
>    allnan = 0

does this give you the correct answer?

>>> 1>np.nan
False

What's the starting value for amax? -inf?

Josef


>
> I should make a benchmark suite.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>



More information about the NumPy-Discussion mailing list