[Numpy-discussion] the mean, var, std of empty arrays

josef.pktd at gmail.com
Wed Nov 21 23:20:14 EST 2012


On Wed, Nov 21, 2012 at 10:58 PM,  <josef.pktd at gmail.com> wrote:
> On Wed, Nov 21, 2012 at 10:35 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>> On Wed, Nov 21, 2012 at 7:45 PM, <josef.pktd at gmail.com> wrote:
>>>
>>> On Wed, Nov 21, 2012 at 9:22 PM, Olivier Delalleau <shish at keba.be> wrote:
>>> > Current behavior looks sensible to me. I personally would prefer no
>>> > warning, but I think it makes sense to have one, as it can be helpful
>>> > to detect issues faster.
>>>
>>> I agree that nan should be the correct answer.
>>> (I gave up trying to define a default for 0/0 in scipy.stats ttests.)
>>>
>>> Some funnier cases:
>>>
>>> >>> np.var([1], ddof=1)
>>> 0.0
>>
>>
>> This one is a nan in development.
>>
>>>
>>> >>> np.var([1], ddof=5)
>>> -0
>>> >>> np.var([1,2], ddof=5)
>>> -0.16666666666666666
>>> >>> np.std([1,2], ddof=5)
>>> nan
>>>
>>
>> These still do this. Also
>>
>> In [10]: var([], ddof=1)
>> Out[10]: -0
>>
>> Which suggests that the nan is pretty much an accidental byproduct of
>> division by zero. I think it might make sense to have a definite policy for
>> these corner cases.
>
> It would also be consistent with the usual pattern to raise a
> ValueError in these cases: ddof too large, size too small.
> As long as we don't allow for missing values, it can't happen that
> some columns or rows give valid answers while others don't.

I think I prefer NaNs to an exception; they propagate more nicely to
downstream functions.

Either way, I'm in favor of a definite policy instead of nans or wrong
numbers by accident.
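
For what it's worth, the -0 and nan values fall straight out of the
arithmetic if the variance is computed as the sum of squared deviations
divided by (n - ddof), with no check on the denominator. A minimal
sketch reproducing the cases above (naive_var is only my illustration,
not numpy's actual code path):

import numpy as np

def naive_var(a, ddof=0):
    # sum of squared deviations divided by (n - ddof),
    # with no check that the denominator is positive
    a = np.asarray(a, dtype=float)
    ssd = np.sum((a - a.mean()) ** 2)  # 0.0 for empty and size-1 input
    return ssd / (a.size - ddof)

naive_var([1], ddof=5)              # 0.0 / -4 -> -0.0
naive_var([1, 2], ddof=5)           # 0.5 / -3 -> -0.1666...
np.sqrt(naive_var([1, 2], ddof=5))  # sqrt of a negative -> nan
naive_var([], ddof=1)               # 0.0 / -1 -> -0.0 (empty mean warns)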

>
>
> A quick check with np.ma:
>
> It looks correct, except when delegating to numpy?
>
>>>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=5, axis=0)
>>>> s
> masked_array(data = [-- --],
>              mask = [ True  True],
>        fill_value = 1e+20)
>
>>>> s = np.ma.var(np.ma.masked_invalid([[1.,2],[1,np.nan]]), ddof=1, axis=0)
>>>> s
> masked_array(data = [0.0 --],
>              mask = [False  True],
>        fill_value = 1e+20)
>
>>>> s = np.ma.std([1,2], ddof=5)
>>>> s
> masked
>>>> type(s)
> <class 'numpy.ma.core.MaskedConstant'>
>
>>>> np.ma.var([1,2], ddof=5)
> -0.16666666666666666
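
If we go the explicit-policy route, a per-axis guard would also cover
the masked case, where the number of valid entries can differ per
column. A sketch under that assumption (checked_var is a hypothetical
helper, not an existing function; raising a ValueError instead of
masking would work the same way):

import numpy as np
import numpy.ma as ma

def checked_var(a, ddof=0, axis=None):
    # mask the result wherever the number of valid entries does not
    # exceed ddof, instead of returning -0 or an accidental nan
    a = ma.asanyarray(a)
    n = ma.count(a, axis=axis)
    result = ma.var(a, ddof=ddof, axis=axis)
    return ma.masked_where(n <= ddof, result)

x = ma.masked_invalid([[1., 2], [1, np.nan]])
checked_var(x, ddof=1, axis=0)  # [0.0, --]: second column has 1 valid entry
checked_var([1, 2], ddof=5)     # masked instead of -0.1666...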

and cov:

>>> np.cov([1.],[3],bias=True, rowvar=False)   # looks fine
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> np.cov([1.],[3],bias=False, rowvar=False)
array([[ nan,  nan],
       [ nan,  nan]])

>>> np.cov([[1.],[3]],bias=False, rowvar=True)
array([[ nan,  nan],
       [ nan,  nan]])

>>> np.cov([],[],bias=False, rowvar=False)     # should be nan
array([[-0., -0.],
       [-0., -0.]])
>>> np.cov([],[],bias=True, rowvar=False)
array([[ nan,  nan],
       [ nan,  nan]])
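
The empty-array cov results follow the same division pattern: the sum
of cross products over zero observations is 0.0, so the bias=False
normalizer m - 1 = -1 gives -0., while the bias=True normalizer m = 0
gives 0/0 = nan (assuming cov normalizes by m - 1 resp. m):

import numpy as np

m = 0                  # number of observations in the empty case
ssd = np.float64(0.0)  # empty sum of cross products
ssd / (m - 1)          # bias=False: 0.0 / -1 -> -0.0
ssd / m                # bias=True:  0.0 /  0 -> nan (RuntimeWarning)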


np.corrcoef seems to have nans in the right places in the examples I tried.
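
For instance, a zero-variance variable puts a 0/0 into the off-diagonal
normalization, and the nans land exactly in the affected entries:

import numpy as np

np.corrcoef([1., 2.], [1., 1.])
# the second variable is constant, so its variance is 0; expect nan in
# every entry involving it: [[1., nan], [nan, nan]]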

Josef

>
>
> Josef
>
>>
>> <snip>
>>
>> Chuck
>>
>>


