[Numpy-discussion] What should be the result in some statistics corner cases?

Charles R Harris charlesr.harris at gmail.com
Mon Jul 15 10:47:07 EDT 2013


On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
<sebastian at sipsolutions.net> wrote:

> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
> >
> >
> > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >
>
> <snip>
>
> >
> >                 For nansum, I would expect 0 even in the case of all
> >                 nans. The point of these functions is to simply ignore
> >                 nans, correct? So I would aim for this behaviour:
> >                 nanfunc(x) behaves the same as func(x[~isnan(x)])
> >
> >
> >         Agreed, although that changes current behavior. What about the
> >         other cases?
> >
> >
> >
> > Looks like there isn't much interest in the topic, so I'll just go
> > ahead with the following choices:
> >
> > Non-NaN case
> >
> > 1) Empty array -> ValueError
> >
> > The current behavior with stats is an accident, i.e., the nan arises
> > from 0/0. I like to think that in this case the result is any number,
> > rather than not a number, so *the* value is simply not defined. So in
> > this case raise a ValueError for an empty array.
> >
> To be honest, I don't mind the current behaviour much: sum([]) = 0 and
> len([]) = 0, so it is in a way well defined. At least I am not sure I
> would always prefer an error. I am a bit worried that just changing it
> might break code out there, such as plotting code where it makes
> perfect sense to plot a NaN (i.e. nothing), but if that is the case it
> would probably become visible quickly.
>

I'm talking about mean, var, and std as statistics; sum isn't part of that.
If there is agreement that nansum of empty arrays/columns should be zero, I
will do that. Note that the sums of empty arrays may or may not be empty.

In [1]: ones((0, 3)).sum(axis=0)
Out[1]: array([ 0.,  0.,  0.])

In [2]: ones((3, 0)).sum(axis=0)
Out[2]: array([], dtype=float64)

Which, sort of, makes sense.
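
As a rough sketch of the two points above (variable names are only for
illustration), the shapes of the two sums can be checked directly, and the
rule quoted earlier, nanfunc(x) behaves the same as func(x[~isnan(x)]),
gives zero for an all-NaN input when func is sum:

```python
import numpy as np

# Reducing over an empty axis leaves one identity value (0.0) per column.
a = np.ones((0, 3)).sum(axis=0)

# Reducing an array that has zero columns leaves nothing to return.
b = np.ones((3, 0)).sum(axis=0)

print(a.shape, b.shape)

# The quoted rule applied by hand: drop the NaNs, then sum what is left.
x = np.array([np.nan, np.nan])
filtered_sum = np.sum(x[~np.isnan(x)])
print(filtered_sum)
```

Here the filtered array is empty, so the sum falls back to the additive
identity, which is the behavior being asked for from nansum.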


>
> > 2) ddof >= n -> ValueError
> >
> > If the number of elements, n, is not zero and ddof >= n, raise a
> > ValueError for the ddof value.
> >
> Makes sense to me, especially for ddof > n. Just returning nan in all
> cases for backward compatibility would be fine with me too.
>
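
For concreteness, the two non-NaN choices above could be sketched as a thin
wrapper. The name checked_var is hypothetical and not part of numpy, and
this only illustrates the proposal, not what numpy does or will do:

```python
import numpy as np

def checked_var(a, ddof=0):
    """Hypothetical wrapper sketching the proposed rules; not a numpy API."""
    a = np.asarray(a, dtype=float)
    n = a.size
    if n == 0:
        # Proposed rule 1: the statistic of an empty array is not defined.
        raise ValueError("variance of an empty array is not defined")
    if ddof >= n:
        # Proposed rule 2: the divisor n - ddof would be zero or negative.
        raise ValueError("ddof must be smaller than the number of elements")
    return a.var(ddof=ddof)

print(checked_var([1.0, 2.0, 3.0], ddof=1))
```

Returning nan instead of raising, as suggested for backward compatibility,
would only change the two raise statements.
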
> > NaN case
> >
> > 1) Empty array -> ValueError
> > 2) Empty slice -> NaN
> > 3) For slice ddof >= n -> NaN
> >
> Personally I would somewhat prefer it if 1) and 2) at least defaulted
> to the same thing. But I don't use the nanfuncs anyway. I was wondering
> about adding an option for the user to pick what the fill is (e.g.
> if it is None, maybe the default, raise a ValueError). We could also
> allow this for normal reductions without an identity, but I am not sure
> if it is useful there.
>
>
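
The fill option suggested above might look roughly like the following. The
function name and the fill parameter are assumptions made up for this
sketch, not an existing numpy interface:

```python
import numpy as np

def nanmean_with_fill(x, fill=None):
    """Hypothetical helper, not a numpy function: mean of the non-NaN values."""
    x = np.asarray(x, dtype=float)
    kept = x[~np.isnan(x)]  # ignore NaNs, as the nanfuncs are meant to
    if kept.size == 0:
        if fill is None:
            # No fill chosen: treat the all-NaN (or empty) case as an error.
            raise ValueError("no non-NaN values to average")
        return fill
    return kept.mean()

print(nanmean_with_fill([1.0, np.nan, 3.0]))
```

With fill=np.nan the all-NaN case returns NaN, and with the default of None
it raises, so 1) and 2) would share one default.
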
Chuck