[Numpy-discussion] What should be the result in some statistics corner cases?
sebastian at sipsolutions.net
Mon Jul 15 11:55:44 EDT 2013
On Mon, 2013-07-15 at 08:47 -0600, Charles R Harris wrote:
> On Mon, Jul 15, 2013 at 8:34 AM, Sebastian Berg
> <sebastian at sipsolutions.net> wrote:
> On Mon, 2013-07-15 at 07:52 -0600, Charles R Harris wrote:
> > On Sun, Jul 14, 2013 at 3:35 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> > For nansum, I would expect 0 even in the case of all nans. The point
> > of these functions is to simply ignore nans, so I would aim for this
> > behaviour: nanfunc(x) behaves the same as func(x[~isnan(x)])
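Here is a minimal sketch of the rule above: a nanfunc should behave like the plain function applied to the non-NaN elements. `nan_reference` is a hypothetical helper written only for this illustration. The comparison assumes a recent NumPy release. The `np.nansum` documentation notes that versions up to 1.9.0 returned NaN for all-NaN or empty slices and that later versions return zero, which matches the rule.

```python
import numpy as np

def nan_reference(func, x):
    """Hypothetical helper: apply func to x with all NaNs removed,
    i.e. func(x[~isnan(x)])."""
    x = np.asarray(x, dtype=float)
    return func(x[~np.isnan(x)])

samples = [
    np.array([1.0, np.nan, 3.0]),
    np.array([np.nan, np.nan]),   # all NaN: the filtered array is empty
    np.array([], dtype=float),
]

for x in samples:
    # The sum of an empty array is 0.0, so an all-NaN input also sums to 0.0.
    print(np.nansum(x), nan_reference(np.sum, x))
```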
> > Agreed, although that changes current behavior. What about the other
> > cases?
> > Looks like there isn't much interest in the topic, so I'll just go
> > ahead with the following choices:
> > Non-NaN case
> > 1) Empty array -> ValueError
> > The current behavior with stats is an accident, i.e., the nan arises
> > from 0/0. I like to think that in this case the result is any number,
> > rather than not a number, so *the* value is simply not defined. So in
> > this case raise a ValueError for an empty array.
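Rule 1 for the non-NaN case could look roughly like the wrapper below. `strict_mean` is hypothetical and is not a NumPy function. It only shows where a length check would go before delegating to `np.mean`.

```python
import numpy as np

def strict_mean(a, axis=None):
    """Hypothetical wrapper: like np.mean, but raise ValueError when the
    reduction would cover zero elements instead of returning 0/0 = NaN."""
    a = np.asarray(a)
    # Number of elements being averaged: all of them, or one axis length.
    n = a.size if axis is None else a.shape[axis]
    if n == 0:
        raise ValueError("mean of an empty array is undefined")
    return np.mean(a, axis=axis)
```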
> To be honest, I don't mind the current behaviour much: sum() = 0 and
> len() = 0, so it is in a way well defined. At least I am not sure I
> would always prefer an error. I am a bit worried that just changing it
> might break code out there, such as plotting code where it makes
> perfect sense to plot a NaN (i.e. nothing), but if that is the case it
> would probably become visible fast.
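The "accidental" NaN is easy to reproduce. The empty sum is 0.0 and the length is 0, so the mean is 0/0. This sketch assumes a current NumPy release. Such releases return `nan` for the mean of an empty array and also emit a `RuntimeWarning`, which is silenced here with the standard `warnings` module.

```python
import warnings
import numpy as np

empty = np.array([], dtype=float)

# sum() of an empty array is the additive identity 0.0 and len() is 0,
# so mean = sum / len is 0/0: NumPy returns NaN (plus a RuntimeWarning).
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    m = np.mean(empty)

print(empty.sum(), len(empty), m)
```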
> I'm talking about mean, var, and std as statistics; sum isn't part of
> that. If there is agreement that nansum of empty arrays/columns should
> be zero I will do that. Note the sums of empty arrays may or may not
> be empty.
> In : ones((0, 3)).sum(axis=0)
> Out: array([ 0., 0., 0.])
> In : ones((3, 0)).sum(axis=0)
> Out: array([], dtype=float64)
> Which, sort of, makes sense.
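The two cases from the session above are repeated here with the shapes made explicit. Reducing over a length-0 axis gives the identity (0.0) at every remaining position. Reducing a nonempty axis of an array whose other axis has length 0 leaves an empty result.

```python
import numpy as np

# Axis 0 has length 0: each of the 3 remaining positions gets the identity.
a = np.ones((0, 3)).sum(axis=0)
# Axis 0 has length 3, but the remaining axis has length 0,
# so the result itself is empty.
b = np.ones((3, 0)).sum(axis=0)

print(a, a.shape)
print(b, b.shape)
```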
I think we can agree that the behaviour for reductions with an identity
should default to returning the identity, including for the nanfuncs,
i.e. sum() is 0, product() is 1...
Since mean = sum/length is a sensible definition, having 0/0 as a result
doesn't seem too bad to me, to be honest; it might be accidental, but it
is not a special case in the code ;). Though I don't mind an error as
long as it doesn't break matplotlib or so.
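The following is a short check of the identities mentioned above. It uses the `identity` attribute and `reduce` method that NumPy ufuncs expose. The last lines compute the mean literally as sum/length for an empty array. `np.errstate` is there only to silence the invalid-value warning that the 0/0 division triggers.

```python
import numpy as np

# Ufuncs with an identity return it when reducing an empty input ...
print(np.add.identity, np.add.reduce([]))            # add: 0
print(np.multiply.identity, np.multiply.reduce([]))  # multiply: 1
# ... while subtract has no identity (None), so np.subtract.reduce([])
# raises a ValueError instead.
print(np.subtract.identity)

# mean defined as sum / length: for an empty float array this is 0.0 / 0.0,
# which NumPy evaluates to NaN rather than raising ZeroDivisionError.
empty = np.array([], dtype=float)
with np.errstate(invalid="ignore"):
    mean_by_definition = empty.sum() / np.float64(empty.size)
print(mean_by_definition)
```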
I agree that for the nanfuncs, raising an error would probably be more
of a problem than for a usual ufunc, but I am still a bit hesitant about
saying that it is ok too. I could imagine adding a very general
"identity" argument (though I would not call it identity, because it is
not the same as `np.add.identity`, just used in a place where that would
be used):
np.add.reduce(, identity=123) -> 
np.add.reduce(, identity=123) -> 
np.nanmean([np.nan], identity=None) -> Error
np.nanmean([np.nan], identity=np.nan) -> np.nan
It doesn't really make sense, but:
np.subtract.reduce([]) -> Error, since np.subtract.identity is None
np.subtract.reduce([], identity=0) -> 0, suppressing the error.
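The proposed argument does not exist in NumPy. The sketch below emulates its semantics with plain functions. `reduce_or_fill`, `nanmean_or_fill` and the `fill` parameter, which stands in for the "identity" argument proposed here, are all hypothetical. They only handle 1-D input, and `fill=None` means "raise instead of filling".

```python
import numpy as np

def reduce_or_fill(ufunc, values, fill=None):
    """Hypothetical stand-in for the proposed argument: reduce a 1-D
    input with ``ufunc``; for empty input return ``fill`` instead of
    ``ufunc.identity``, or raise ValueError when ``fill`` is None."""
    arr = np.asarray(values, dtype=float)
    if arr.ndim != 1:
        raise ValueError("this sketch only handles 1-D input")
    if arr.size == 0:
        if fill is None:
            raise ValueError("empty reduction and no fill value given")
        return fill
    # Nonempty input: the fill value is not used at all.
    return ufunc.reduce(arr)


def nanmean_or_fill(values, fill=None):
    """Hypothetical nanmean with the same fill semantics: NaNs are
    dropped first, so an all-NaN input counts as empty."""
    arr = np.asarray(values, dtype=float)
    kept = arr[~np.isnan(arr)]
    total = reduce_or_fill(np.add, kept, fill=fill)
    # For empty input, total is already the fill value (or we raised).
    return total if kept.size == 0 else total / kept.size
```

Later NumPy releases (1.15 and up) added an `initial` keyword to `ufunc.reduce`, but it is a different thing. It is folded into every reduction, so `np.add.reduce([1.0], initial=123.0)` gives 124.0. The fill value above is only used when the input is empty.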
I am not sure if I am convinced myself, but especially for the nanfuncs
it could maybe provide a way to circumvent the problem somewhat.
Including functions such as np.nanargmin, whose result type does not
even support NaN. Plus it gives an argument allowing for warnings about
> > 2) ddof >= n -> ValueError
> > If the number of elements, n, is not zero and ddof >= n, raise a
> > ValueError for the ddof value.
> Makes sense to me, especially for ddof > n. Just returning nan in all
> cases for backward compatibility would be fine with me too.
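Rule 2 could be added as one more check before delegating to `np.var`. `strict_var` is a hypothetical wrapper that flattens its input. The ddof check mirrors the proposal: raise when ddof >= n, since the divisor n - ddof is then zero or negative. The empty-array check mirrors rule 1.

```python
import numpy as np

def strict_var(a, ddof=0):
    """Hypothetical wrapper around np.var (input is flattened) that
    applies both proposed rules: empty input and ddof >= n raise."""
    a = np.asarray(a, dtype=float).ravel()
    n = a.size
    if n == 0:
        raise ValueError("variance of an empty array is undefined")
    if ddof >= n:
        # The divisor n - ddof would be zero or negative.
        raise ValueError(f"ddof={ddof} must be smaller than n={n}")
    return np.var(a, ddof=ddof)
```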
> > NaN case
> > 1) Empty array -> ValueError
> > 2) Empty slice -> NaN
> > 3) For slice ddof >= n -> NaN
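The three NaN-case rules can be read this way: "empty array" means the whole input has no elements, and "empty slice" means a column whose values are all NaN. That reading is an assumption about the intended semantics. Under it, a hypothetical column-wise `nanvar_columns` might look like the sketch below. It loops over columns for clarity rather than speed.

```python
import numpy as np

def nanvar_columns(a, ddof=0):
    """Hypothetical column-wise nanvar following the proposed rules:
    1) an empty array raises ValueError,
    2) a column with no non-NaN values gives NaN,
    3) a column with ddof >= n non-NaN values gives NaN."""
    a = np.asarray(a, dtype=float)
    if a.ndim != 2:
        raise ValueError("this sketch expects a 2-D array")
    if a.size == 0:
        raise ValueError("empty array")                  # rule 1
    out = np.empty(a.shape[1])
    for j in range(a.shape[1]):
        col = a[:, j]
        kept = col[~np.isnan(col)]                       # ignore NaNs
        n = kept.size
        if n == 0 or ddof >= n:
            out[j] = np.nan                              # rules 2 and 3
        else:
            out[j] = np.var(kept, ddof=ddof)
    return out
```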
> Personally I would somewhat prefer if 1) and 2) would at least do the
> same thing. But I don't use the nanfuncs anyway. I was thinking about
> adding the option for the user to pick what the fill is (and, if it is
> None (maybe the default), raise a ValueError). We could also allow
> this for normal reductions without an identity, but I am not sure if
> it is useful there.
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org