[Numpy-discussion] What should be the result in some statistics corner cases?

Charles R Harris charlesr.harris at gmail.com
Mon Jul 15 09:52:15 EDT 2013


On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris <charlesr.harris at gmail.com
> wrote:

>
>
> On Sun, Jul 14, 2013 at 2:55 PM, Warren Weckesser <
> warren.weckesser at gmail.com> wrote:
>
>> On 7/14/13, Charles R Harris <charlesr.harris at gmail.com> wrote:
>> > Some corner cases in the mean, var, std.
>> >
>> > *Empty arrays*
>> >
>> > I think these cases should either raise an error or just return nan.
>> > Warnings seem ineffective to me as they are only issued once by default.
>> >
>> > In [3]: ones(0).mean()
>> >
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:61: RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[3]: nan
>> >
>> > In [4]: ones(0).var()
>> >
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: RuntimeWarning: invalid value encountered in true_divide
>> >   out=arrmean, casting='unsafe', subok=False)
>> >
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[4]: nan
>> >
>> > In [5]: ones(0).std()
>> >
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:76: RuntimeWarning: invalid value encountered in true_divide
>> >   out=arrmean, casting='unsafe', subok=False)
>> >
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[5]: nan
>> >
>> > *ddof >= number of elements*
>> >
>> > I think these should just raise errors. The results for ddof >= #elements
>> > are happenstance, and certainly negative numbers should never be returned.
>> >
>> > In [6]: ones(2).var(ddof=2)
>> >
>> > /home/charris/.local/lib/python2.7/site-packages/numpy/core/_methods.py:100: RuntimeWarning: invalid value encountered in double_scalars
>> >   ret = ret / float(rcount)
>> > Out[6]: nan
>> >
>> > In [7]: ones(2).var(ddof=3)
>> > Out[7]: -0.0
>> >
>> > *nansum*
>> >
>> > Currently returns nan for empty arrays. I suspect it should return nan for
>> > slices that are all nan, but 0 for empty slices. That would make it
>> > consistent with sum in the empty case.
>> >
>>
>>
>> For nansum, I would expect 0 even in the case of all nans.  The point
>> of these functions is to simply ignore nans, correct?  So I would aim
>> for this behaviour:  nanfunc(x) behaves the same as func(x[~isnan(x)])
>>
>>
> Agreed, although that changes current behavior. What about the other
> cases?
>
>
Looks like there isn't much interest in the topic, so I'll just go ahead
with the following choices (a rough sketch of the intended semantics follows
the lists below):

Non-NaN case

1) Empty array -> ValueError

The current behavior of the stats functions is an accident: the nan arises
from 0/0. I like to think that in this case the result is *any* number,
rather than not a number, so *the* value is simply not defined. So raise a
ValueError for an empty array.

2) ddof >= n -> ValueError

If the number of elements, n, is not zero and ddof >= n, raise a ValueError
for the ddof value.

NaN case

1) Empty array -> ValueError
2) Empty slice -> NaN
3) Slice with ddof >= n -> NaN
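
For concreteness, here is a rough 1-D sketch of the semantics described
above; the function names are made up for illustration and this is not the
actual _methods.py code:

import numpy as np

def var_with_checks(a, ddof=0):
    # Non-NaN case: empty input and ddof >= n both raise instead of
    # silently producing nan or a negative "variance".
    n = a.size
    if n == 0:
        raise ValueError("var is not defined for an empty array")
    if ddof >= n:
        raise ValueError("ddof must be less than the number of elements")
    return a.var(ddof=ddof)

def nanvar_with_checks(a, ddof=0):
    # NaN case: an empty array still raises, but an empty or all-nan
    # slice, or one with ddof >= number of non-nan elements, gives nan.
    if a.size == 0:
        raise ValueError("nanvar is not defined for an empty array")
    kept = a[~np.isnan(a)]   # ignore nans, per nanfunc(x) == func(x[~isnan(x)])
    if kept.size == 0 or ddof >= kept.size:
        return np.nan
    return kept.var(ddof=ddof)

When an axis argument is given, the same checks would apply per slice,
returning nan for the offending slices rather than raising.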

 Chuck