[Numpy-discussion] What should be the result in some statistics corner cases?
Benjamin Root
ben.root at ou.edu
Mon Jul 15 10:25:08 EDT 2013
This is going to need to be heavily documented with doctests. Also, just to
clarify, are we talking about a ValueError for doing a nansum on an empty
array as well, or will that now return a zero?
Ben Root
On Mon, Jul 15, 2013 at 9:52 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:
>
>
> On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris <
> charlesr.harris at gmail.com> wrote:
>
>>
>>
>> On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser <
>> warren.weckesser at gmail.com> wrote:
>>
>>> On 7/14/13, Charles R Harris <charlesr.harris at gmail.com> wrote:
>>> > Some corner cases in mean, var, and std.
>>> >
>>> > *Empty arrays*
>>> >
>>> > I think these cases should either raise an error or just return nan.
>>> > Warnings seem ineffective to me, as they are only issued once by
>>> > default.
>>> >
>>> > In [3]: ones(0).mean()
>>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61:
>>> > RuntimeWarning: invalid value encountered in double_scalars
>>> > ret = ret / float(rcount)
>>> > Out[3]: nan
>>> >
>>> > In [4]: ones(0).var()
>>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
>>> > RuntimeWarning: invalid value encountered in true_divide
>>> > out=arrmean, casting='unsafe', subok=False)
>>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
>>> > RuntimeWarning: invalid value encountered in double_scalars
>>> > ret = ret / float(rcount)
>>> > Out[4]: nan
>>> >
>>> > In [5]: ones(0).std()
>>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76:
>>> > RuntimeWarning: invalid value encountered in true_divide
>>> > out=arrmean, casting='unsafe', subok=False)
>>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
>>> > RuntimeWarning: invalid value encountered in double_scalars
>>> > ret = ret / float(rcount)
>>> > Out[5]: nan
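A minimal sketch of where these nans come from, assuming (as the `ret = ret / float(rcount)` lines in the tracebacks suggest) that the mean is essentially a sum divided by an element count. For an empty array that is 0.0 / 0, which IEEE 754 defines as nan and flags as an "invalid" operation. Since the objection is that warnings are only shown once by default, the sketch also uses `np.errstate(invalid="raise")` to turn the same flag into a `FloatingPointError` every time; `loud_divide` is a hypothetical helper, not part of NumPy.

```python
import numpy as np

# mean() boils down to sum / count; for an empty array both are zero.
total = np.float64(0.0)  # sum over zero elements
count = np.float64(0.0)  # number of elements

# Quiet version: 0/0 is nan under IEEE 754 (the warning is suppressed here).
with np.errstate(invalid="ignore"):
    quiet = total / count


def loud_divide(num, den):
    """Hypothetical helper: divide, but raise instead of warning on 0/0."""
    with np.errstate(invalid="raise"):
        return num / den


print(quiet)  # nan
try:
    loud_divide(total, count)
except FloatingPointError as exc:
    print("raised FloatingPointError:", exc)
```

Raising on the floating-point flag fires on every occurrence, unlike a default warning filter, at the cost of also raising for legitimate nan-producing operations inside the same block.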
>>> >
>>> > *ddof >= number of elements*
>>> >
>>> > I think these should just raise errors. The results for ddof >= #elements
>>> > are happenstance, and negative numbers should certainly never be
>>> > returned.
>>> >
>>> > In [6]: ones(2).var(ddof=2)
>>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100:
>>> > RuntimeWarning: invalid value encountered in double_scalars
>>> > ret = ret / float(rcount)
>>> > Out[6]: nan
>>> >
>>> > In [7]: ones(2).var(ddof=3)
>>> > Out[7]: -0.0
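Both outputs follow from the textbook formula var = sum((x - mean)**2) / (n - ddof). For `ones(2)` the sum of squared deviations is 0.0, so ddof=2 gives 0/0 = nan and ddof=3 gives 0.0 / -1.0 = -0.0, a negative zero. The sketch below is a hand-rolled version of that formula with no guards (`var_by_hand` is hypothetical, not NumPy's own code, so other NumPy versions may behave differently); it reproduces both values.

```python
import numpy as np


def var_by_hand(x, ddof=0):
    """Variance as sum of squared deviations over (n - ddof), no guards."""
    x = np.asarray(x, dtype=np.float64)
    n = x.size
    deviations = x - x.mean()
    ss = np.sum(deviations * deviations)  # 0.0 for a constant array
    # Silence the 0/0 warning so the raw IEEE result is visible.
    with np.errstate(invalid="ignore", divide="ignore"):
        return ss / np.float64(n - ddof)


print(var_by_hand(np.ones(2), ddof=2))  # 0/0 -> nan
print(var_by_hand(np.ones(2), ddof=3))  # 0/(-1) -> -0.0
```

The sign of the zero comes purely from dividing by a negative count, which is why the result for ddof > n is "happenstance" rather than a meaningful variance.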
>>> >
>>> > *nansum*
>>> >
>>> > nansum currently returns nan for empty arrays. I suspect it should return
>>> > nan for slices that are all nan, but 0 for empty slices. That would make it
>>> > consistent with sum in the empty case.
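A minimal 1-D sketch of the rule proposed here, with `nansum_proposed` as a hypothetical name: an empty input sums to 0 like `sum`, an input whose values are all nan gives nan, and otherwise the nans are skipped.

```python
import numpy as np


def nansum_proposed(v):
    """Hypothetical nansum: 0 for empty input, nan if every value is nan."""
    v = np.asarray(v, dtype=np.float64).ravel()
    if v.size == 0:
        return 0.0  # consistent with sum() of an empty array
    kept = v[~np.isnan(v)]
    if kept.size == 0:
        return np.nan  # all-nan slice
    return float(kept.sum())


print(nansum_proposed([]))                  # 0.0
print(nansum_proposed([np.nan, np.nan]))    # nan
print(nansum_proposed([1.0, np.nan, 2.0]))  # 3.0
```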
>>> >
>>>
>>>
>>> For nansum, I would expect 0 even in the case of all nans. The point
>>> of these functions is to simply ignore nans, correct? So I would aim
>>> for this behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)])
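Warren's rule can be sketched as a generic wrapper (`ignore_nans` is a hypothetical name): drop the nans, then call the ordinary function. Under this rule an all-nan input behaves exactly like an empty one, so nansum of all nans is 0.0, and any nan-aware mean or var inherits whatever the plain function does on an empty array, which is the first corner case above. The boolean mask flattens its input, so this sketch only covers whole-array reductions, not reductions along an axis.

```python
import numpy as np


def ignore_nans(func):
    """Hypothetical wrapper: nanfunc(x) == func(x[~isnan(x)])."""
    def nanfunc(x):
        x = np.asarray(x, dtype=np.float64)
        return func(x[~np.isnan(x)])  # boolean mask yields a 1-D array
    return nanfunc


nansum_warren = ignore_nans(np.sum)
print(nansum_warren([np.nan, np.nan]))    # 0.0, same as np.sum of an empty array
print(nansum_warren([1.0, np.nan, 2.0]))  # 3.0
```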
>>>
>>>
>> Agreed, although that changes current behavior. What about the other
>> cases?
>>
>>
> Looks like there isn't much interest in the topic, so I'll just go ahead
> with the following choices:
>
> Non-NaN case
>
> 1) Empty array -> ValueError
>
> The current behavior of the statistics functions is an accident, i.e., the nan
> arises from 0/0. I like to think that in this case the result is any number,
> rather than not a number, so *the* value is simply not defined. So raise a
> ValueError for an empty array.
>
> 2) ddof >= n -> ValueError
>
> If the number of elements, n, is not zero and ddof >= n, raise a
> ValueError reporting the invalid ddof value.
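A minimal sketch of the two non-NaN rules applied to var, using a hypothetical `strict_var` rather than NumPy's own method: an empty input raises `ValueError`, and so does ddof >= n for a non-empty input. `mean` and `std` would share the same empty-array check.

```python
import numpy as np


def strict_var(a, ddof=0):
    """Hypothetical var that raises instead of returning nan or -0.0."""
    a = np.asarray(a, dtype=np.float64)
    n = a.size
    if n == 0:
        # Rule 1: the value is undefined, not "not a number".
        raise ValueError("variance of an empty array is undefined")
    if ddof >= n:
        # Rule 2: n - ddof <= 0 would divide by zero or by a negative count.
        raise ValueError(f"ddof={ddof} must be less than the number of elements ({n})")
    deviations = a - a.mean()
    return float(np.sum(deviations * deviations) / (n - ddof))


print(strict_var([1.0, 3.0]))          # 1.0
print(strict_var([1.0, 3.0], ddof=1))  # 2.0
for bad in ([], [1.0, 1.0]):
    try:
        strict_var(bad, ddof=2)
    except ValueError as exc:
        print("ValueError:", exc)
```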
>
> NaN case
>
> 1) Empty array -> ValueError
> 2) Empty slice -> NaN
> 3) Slice with ddof >= n -> NaN
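One possible reading of the NaN-case rules, sketched for var with a hypothetical `nanvar_proposed`. It assumes "empty slice" means a slice with no non-NaN values left (the all-nan case discussed earlier in the thread), and it reduces along an axis with `np.apply_along_axis`, which is simple but slower than a vectorized implementation.

```python
import numpy as np


def _slice_nanvar(v, ddof):
    """Variance of one 1-D slice after discarding NaNs."""
    kept = v[~np.isnan(v)]
    n = kept.size
    if n == 0 or ddof >= n:
        # Rules 2 and 3: empty (all-NaN) slice or too few values -> NaN.
        return np.nan
    dev = kept - kept.mean()
    return float(np.sum(dev * dev) / (n - ddof))


def nanvar_proposed(a, axis=None, ddof=0):
    """Hypothetical nanvar following the NaN-case rules above."""
    a = np.asarray(a, dtype=np.float64)
    if a.size == 0:
        # Rule 1: a genuinely empty input array is an error.
        raise ValueError("nanvar of an empty array is undefined")
    if axis is None:
        return _slice_nanvar(a.ravel(), ddof)
    return np.apply_along_axis(_slice_nanvar, axis, a, ddof)


a = np.array([[1.0, 3.0], [np.nan, np.nan], [2.0, np.nan]])
print(nanvar_proposed(a, axis=1))          # values 1.0, nan, 0.0
print(nanvar_proposed(a, axis=1, ddof=1))  # values 2.0, nan, nan
```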
>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>