[SciPy-User] scipy.stats.nanmedian

Fri Jan 22 16:08:23 EST 2010

On Fri, Jan 22, 2010 at 12:03 PM,  <josef.pktd at gmail.com> wrote:
> On Fri, Jan 22, 2010 at 11:52 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Fri, Jan 22, 2010 at 8:46 AM,  <josef.pktd at gmail.com> wrote:
>>> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 8:18 PM,  <josef.pktd at gmail.com> wrote:
>>>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote:
>>>>>>>> That's the only was I was able to figure out how to pull 1.0 out of
>>>>>>>> np.array(1.0). Is there a better way?
>>>>>>>
>>>>>>>
>>>>>>> .item()
>>>>>>
>>>>>> Thanks. item() looks better than tolist().
>>>>>>
>>>>>> I simplified the function:
>>>>>>
>>>>>> def nanmedian(x, axis=0):
>>>>>>    x, axis = _chk_asarray(x,axis)
>>>>>>    if x.ndim == 0:
>>>>>>        return float(x.item())
>>>>>>    x = x.copy()
>>>>>>    x = np.apply_along_axis(_nanmedian,axis,x)
>>>>>>    if x.ndim == 0:
>>>>>>        x = float(x.item())
>>>>>>    return x
>>>>>>
>>>>>> and opened a ticket:
>>>>>>
>>>>>> http://projects.scipy.org/scipy/ticket/1098
>>>>>
>>>>>
>>>>> How about getting rid of apply_along_axis?    see attachment
>>>>>
>>>>> I don't know whether or how much faster it is, but there is a ticket
>>>>> that the current version is slow.
>>>>> No hidden bug or corner case guarantee yet.
>>>>
>>>> It is faster. But here is one case it does not handle:
>>>>
>>>>>> nanmedian([1, 2])
>>>>   array([ 1.5])
>>>>>> np.median([1, 2])
>>>>   1.5
>>>>
>>>> I'm sure it could be fixed. But having to fix it, and the fact that it
>>>> is a larger change, decreases the likelihood that it will make it into
>>>> the next version of scipy. One option is to make the small bug fix I
>>>> suggested (ticket #1098) and add the corresponding unit tests. Then we
>>>> can take our time to design a better version of nanmedian.
>>>
>>> I didn't see the difference to np.median for this case, I think I was
>>> taking the shape answer from the other thread on the return of splines
>>> and interpolation.
>>>
>>> If I change the last 3 lines to
>>>    if nanmed.size == 1:
>>>       return nanmed.item()
>>>    return nanmed
>>>
>>> then I get agreement with numpy for the following test cases
>>>
>>> print nanmedian(1), np.median(1)
>>> print nanmedian(np.array(1)), np.median(1)
>>> print nanmedian(np.array([1])), np.median(np.array([1]))
>>> print nanmedian(np.array([[1]])), np.median(np.array([[1]]))
>>> print nanmedian(np.array([1,2])), np.median(np.array([1,2]))
>>> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0)
>>> print nanmedian([1]), np.median([1])
>>> print nanmedian([[1]]), np.median([[1]])
>>> print nanmedian([1,2]), np.median([1,2])
>>> print nanmedian([[1,2]]), np.median([[1,2]],axis=0)
>>> print nanmedian([1j,2]), np.median([1j,2])
>>>
>>> Am I still missing any cases?
>>>
>>> The vectorized version should be faster for this case
>>> http://projects.scipy.org/scipy/ticket/740
>>> but maybe not for long and narrow arrays.
>>
>> Here is an odd one:
>>
>>>> nanmedian(True)
>>   1.0
>>>> nanmedian([True])
>>   0.5  # <--- strange
>>
>>>> np.median(True)
>>   1.0
>>>> np.median([True])
>>   1.0
>
> definitely weird
>
>>>> (np.array(True)+np.array(True))/2.
> 0.5
>>>> np.array([True, True]).sum()
> 2
>>>> np.array([True, True]).mean()
> 1.0
>
> I assumed mean (is used by np.ma.median) is the same as adding and dividing by 2
>
> Josef
>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>

It got a bit ugly, too many shapes for a single number, and tricky
axis handling.
Your idea of making just the smallish changes looks more attractive now.

and almost correct

>>> np.median(np.array([[[1]]]),axis=0).shape
(1, 1)
>>> nanmedian(np.array([[[1]]]),axis=0).shape
(1, 1)
>>> np.median(np.array([[[1]]]),axis=-1).shape
(1, 1)
>>> nanmedian(np.array([[[1]]]),axis=-1).shape
(1, 1)

but there slipped a python in

>>> nanmedian(np.array([[[1]]]),axis=None).__class__
<type 'float'>
>>> np.median(np.array([[[1]]]),axis=None).__class__
<type 'numpy.float64'>

.item() returns a python number not a numpy number

>>> np.array([[[1]]]).item().__class__
<type 'int'>
>>> np.array([[[1]]]).flat[0].__class__
<type 'numpy.int32'>

I also didn't know this:

>>> None < 0
True

Josef
-------------- next part --------------
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 20 10:18:32 2010
Author: josef-pktd
"""

import numpy as np
from scipy import stats

def nanmedian(x, axis = 0):
    keepshape = list(np.shape(x))
    x, axis2 = stats.stats._chk_asarray(x, axis)
    if (not axis is None) and axis<0 : # and x.ndim>2:
        axis = x.ndim + axis
    #print 'axis', axis

    #print x, keepshape
    if x.ndim == 0 or (x.size==1 and axis is None):
        return 1.0*x.item() 
    if keepshape and not axis is None :
        keepshape.pop(axis)
    if x.size == 1: #x.ndim == 0:
        return 1.0*x.reshape(keepshape)
    if x.dtype == np.bool:
        x = x.astype(int)
    if axis is None:
        axis = 0

    x = np.sort(x, axis=axis)
    nall = x.shape[axis]
    notnancount = nall - np.isnan(x).sum(axis=axis)
    #print 'notnancount', notnancount

    (idx, rmd) = divmod(notnancount, 2)
    #idx = np.atleast_1d(idx)
    #print 'idx', idx
    #print idx.shape
    indx = np.ogrid[[slice(i)for i in x.shape]]
    indxlo = indx[:]
    idxslice = map(slice, idx.shape)
    idxslice.insert(axis, None)
    #print idxslice
    idx = np.atleast_1d(idx)
    indxlo[axis] = idx[idxslice]
    #print indxlo
    indxhi = indx[:]
    indxhi[axis] = (idx - (1-rmd))[idxslice]#[idx.shape[:axis]+(None,)+idx.shape[axis:]]
    #print map(np.shape, indxhi)
    #print indxhi
    nanmed = (x[indxlo] + x[indxhi])/2.

    #print 'keepshape, nanmed.shape',keepshape, nanmed.shape
    if np.ndim == 0:
        return nanmed.item()
    #return nanmed.reshape(keepshape)
    return np.squeeze(nanmed) #.reshape(keepshape)
    if nanmed.size == 1:
        return nanmed.reshape(keepshape)
        return nanmed.item()
    return nanmed

from numpy.testing import assert_equal, assert_almost_equal

for axis in [0,1, None, -1]:
    for i in range(5):
        # for complex
        #x = 1j+np.arange(20).reshape(4,5)
        x = np.arange(20).reshape(4,5).astype(float)
        x[zip(np.random.randint(4, size=(2,5)))] = np.nan
        assert_equal(nanmedian(x, axis=0), stats.nanmedian(x, axis=0))

for axis in [0,1,2, None, -1]:
    for i in range(5):
        x = np.arange(3*4*5).reshape(3,4,5).astype(float)
        x[np.random.randint(3, size=(3,4,5))] = np.nan
        assert_equal(nanmedian(x, axis=0), stats.nanmedian(x, axis=0))

xli = [ [1], [[1]], [1,2], [1j], [1j,2], [True], [False],
       [True,False], [True, False, True], np.round(np.random.randn(2,4,5),4)]
xxli = xli + map(np.array,xli)

for axis in [0, -1, None]:
    print '\n next case',axis
    for x in xxli:
        try:
            assert_equal(nanmedian(x, axis=axis), np.median(x, axis=axis))
            assert_equal(np.shape(nanmedian(x, axis=axis)), np.shape(np.median(x, axis=axis)))
        except:
            print 'failure with', x
            print nanmedian(x, axis=axis),  np.median(x, axis=axis)
            raise

y=np.round(np.random.randn(2,3,5),4)
axis = -1 #None
print np.median(y,axis=axis)
nm = nanmedian(y,axis=axis)
print nm
print np.median(y,axis=axis) - nm

#np.broadcast(array([[[0]],[[1]]]), array([[[1, 1, 1, 1, 1]], [[1, 1, 1, 1, 1]]]), array([[[0, 1, 2, 3, 4]]]))