[Numpy-discussion] What should be the result in some statistics corner cases?
Charles R Harris
charlesr.harris at gmail.com
Mon Jul 15 10:58:12 EDT 2013
On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:
> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
> > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> > For nansum, I would expect 0 even in the case of all nans. The point
> > of these functions is to simply ignore nans, correct? So I would aim
> > for this behaviour: nanfunc(x) behaves the same as
> > func(x[~isnan(x)])
> > Agreed, although that changes current behavior. What about the
> > other cases?
> > Looks like there isn't much interest in the topic, so I'll just go
> > ahead with the following choices:
> > Non-NaN case
> > 1) Empty array -> ValueError
> > The current behavior with stats is an accident, i.e., the nan arises
> > from 0/0. I like to think that in this case the result is any number,
> > rather than not a number, so *the* value is simply not defined. So in
> > this case raise a ValueError for empty array.
> To be honest, I don't mind the current behaviour much: sum() = 0 and
> len() = 0, so it is in a way well defined. At least I am not sure I
> would prefer to always get an error. I am a bit worried that just changing
> it might break code out there, such as plotting code where it makes
> perfect sense to plot a NaN (i.e. nothing), but if that is the case it
> would probably become visible fast.
> > 2) ddof >= n -> ValueError
> > If the number of elements, n, is not zero and ddof >= n, raise a
> > ValueError for the ddof value.
> Makes sense to me, especially for ddof > n. Just returning nan in all
> cases for backward compatibility would be fine with me too.
Currently, if ddof > n it returns a negative number for the variance; the NaN
only comes when ddof == 0 and n == 0, leading to 0/0 (NaN for floats, zero
division for integers).
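
To make the negative-variance case and choice 2) concrete, here is a rough
plain-Python sketch. It is not NumPy's implementation: var_naive and
var_checked are made-up names, and the sketch only assumes the textbook
formula, the sum of squared deviations divided by (n - ddof).

```python
def var_naive(values, ddof=0):
    """Textbook variance: sum of squared deviations / (n - ddof)."""
    n = len(values)
    mean = sum(values) / n  # fails with ZeroDivisionError when n == 0
    ss = sum((v - mean) ** 2 for v in values)
    return ss / (n - ddof)  # the divisor is negative when ddof > n


def var_checked(values, ddof=0):
    """Same formula, guarded by the two checks proposed for the non-NaN case."""
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty array is not defined")
    if ddof >= n:
        raise ValueError("ddof must be smaller than the number of elements")
    return var_naive(values, ddof)


print(var_naive([1.0, 2.0, 3.0], ddof=4))  # -2.0
```

With the checks in place, the same call through var_checked raises a
ValueError instead of silently returning a negative variance.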
> > NaN case
> > 1) Empty array -> ValueError
> > 2) Empty slice -> NaN
> > 3) For slice ddof >= n -> NaN
> Personally I would somewhat prefer if 1) and 2) at least defaulted
> to the same thing. But I don't use the nanfuncs anyway. I was wondering
> about adding an option for the user to pick what the fill value is (e.g.
> if it is None, maybe the default, -> ValueError). We could also allow this
> for normal reductions without an identity, but I am not sure if it is
> useful there.
In the NaN case some slices may be empty, others not. My reasoning is that
this is going to be data dependent, not operator error; but if the whole
array is empty, the writer of the code should deal with that.
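
As an illustration of the three NaN-case choices, here is a small sketch in
plain Python that treats a list of rows as the slices of a 2-d array.
nanvar_rows is a hypothetical helper, not an existing NumPy function, and the
row-wise layout is only an assumption made to keep the example short.

```python
import math


def nanvar_rows(rows, ddof=0):
    """Variance of each row, ignoring NaNs (hypothetical helper)."""
    # 1) Empty array -> ValueError: the caller is expected to handle this.
    if not rows or all(len(row) == 0 for row in rows):
        raise ValueError("variance of an empty array is not defined")
    result = []
    for row in rows:
        kept = [v for v in row if not math.isnan(v)]  # drop the NaNs
        n = len(kept)
        # 2) Empty slice -> NaN, 3) slice with ddof >= n -> NaN:
        # both depend on the data, so they do not raise.
        if n == 0 or ddof >= n:
            result.append(math.nan)
            continue
        mean = sum(kept) / n
        result.append(sum((v - mean) ** 2 for v in kept) / (n - ddof))
    return result


nan = math.nan
data = [[1.0, nan, 3.0], [nan, nan, nan], [5.0, nan, nan]]
print(nanvar_rows(data, ddof=1))  # [2.0, nan, nan]
```

Only the first row has enough non-NaN values for ddof=1; the all-NaN row and
the row with a single value both come back as NaN rather than raising.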