[SciPy-User] scipy.stats.nanmedian
josef.pktd at gmail.com
josef.pktd at gmail.com
Fri Jan 22 16:08:23 EST 2010
On Fri, Jan 22, 2010 at 12:03 PM, <josef.pktd at gmail.com> wrote:
> On Fri, Jan 22, 2010 at 11:52 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Fri, Jan 22, 2010 at 8:46 AM, <josef.pktd at gmail.com> wrote:
>>> On Fri, Jan 22, 2010 at 11:09 AM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>> On Thu, Jan 21, 2010 at 8:18 PM, <josef.pktd at gmail.com> wrote:
>>>>> On Thu, Jan 21, 2010 at 10:01 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>>>>>> On Thu, Jan 21, 2010 at 6:41 PM, Pierre GM <pgmdevlist at gmail.com> wrote:
>>>>>>> On Jan 21, 2010, at 9:28 PM, Keith Goodman wrote:
>>>>>>>> That's the only was I was able to figure out how to pull 1.0 out of
>>>>>>>> np.array(1.0). Is there a better way?
>>>>>>>
>>>>>>>
>>>>>>> .item()
>>>>>>
>>>>>> Thanks. item() looks better than tolist().
>>>>>>
>>>>>> I simplified the function:
>>>>>>
>>>>>> def nanmedian(x, axis=0):
>>>>>> x, axis = _chk_asarray(x,axis)
>>>>>> if x.ndim == 0:
>>>>>> return float(x.item())
>>>>>> x = x.copy()
>>>>>> x = np.apply_along_axis(_nanmedian,axis,x)
>>>>>> if x.ndim == 0:
>>>>>> x = float(x.item())
>>>>>> return x
>>>>>>
>>>>>> and opened a ticket:
>>>>>>
>>>>>> http://projects.scipy.org/scipy/ticket/1098
>>>>>
>>>>>
>>>>> How about getting rid of apply_along_axis? see attachment
>>>>>
>>>>> I don't know whether or how much faster it is, but there is a ticket
>>>>> that the current version is slow.
>>>>> No hidden bug or corner case guarantee yet.
>>>>
>>>> It is faster. But here is one case it does not handle:
>>>>
>>>>>> nanmedian([1, 2])
>>>> array([ 1.5])
>>>>>> np.median([1, 2])
>>>> 1.5
>>>>
>>>> I'm sure it could be fixed. But having to fix it, and the fact that it
>>>> is a larger change, decreases the likelihood that it will make it into
>>>> the next version of scipy. One option is to make the small bug fix I
>>>> suggested (ticket #1098) and add the corresponding unit tests. Then we
>>>> can take our time to design a better version of nanmedian.
>>>
>>> I didn't see the difference to np.median for this case, I think I was
>>> taking the shape answer from the other thread on the return of splines
>>> and interpolation.
>>>
>>> If I change the last 3 lines to
>>> if nanmed.size == 1:
>>> return nanmed.item()
>>> return nanmed
>>>
>>> then I get agreement with numpy for the following test cases
>>>
>>> print nanmedian(1), np.median(1)
>>> print nanmedian(np.array(1)), np.median(1)
>>> print nanmedian(np.array([1])), np.median(np.array([1]))
>>> print nanmedian(np.array([[1]])), np.median(np.array([[1]]))
>>> print nanmedian(np.array([1,2])), np.median(np.array([1,2]))
>>> print nanmedian(np.array([[1,2]])), np.median(np.array([[1,2]]),axis=0)
>>> print nanmedian([1]), np.median([1])
>>> print nanmedian([[1]]), np.median([[1]])
>>> print nanmedian([1,2]), np.median([1,2])
>>> print nanmedian([[1,2]]), np.median([[1,2]],axis=0)
>>> print nanmedian([1j,2]), np.median([1j,2])
>>>
>>> Am I still missing any cases?
>>>
>>> The vectorized version should be faster for this case
>>> http://projects.scipy.org/scipy/ticket/740
>>> but maybe not for long and narrow arrays.
>>
>> Here is an odd one:
>>
>>>> nanmedian(True)
>> 1.0
>>>> nanmedian([True])
>> 0.5 # <--- strange
>>
>>>> np.median(True)
>> 1.0
>>>> np.median([True])
>> 1.0
>
> definitely weird
>
>>>> (np.array(True)+np.array(True))/2.
> 0.5
>>>> np.array([True, True]).sum()
> 2
>>>> np.array([True, True]).mean()
> 1.0
>
> I assumed mean (is used by np.ma.median) is the same as adding and dividing by 2
>
> Josef
>
>> _______________________________________________
>> SciPy-User mailing list
>> SciPy-User at scipy.org
>> http://mail.scipy.org/mailman/listinfo/scipy-user
>>
>
It got a bit ugly, too many shapes for a single number, and tricky
axis handling.
Your idea of making just the smallish changes looks more attractive now.
and almost correct
>>> np.median(np.array([[[1]]]),axis=0).shape
(1, 1)
>>> nanmedian(np.array([[[1]]]),axis=0).shape
(1, 1)
>>> np.median(np.array([[[1]]]),axis=-1).shape
(1, 1)
>>> nanmedian(np.array([[[1]]]),axis=-1).shape
(1, 1)
but there slipped a python in
>>> nanmedian(np.array([[[1]]]),axis=None).__class__
<type 'float'>
>>> np.median(np.array([[[1]]]),axis=None).__class__
<type 'numpy.float64'>
.item() returns a python number not a numpy number
>>> np.array([[[1]]]).item().__class__
<type 'int'>
>>> np.array([[[1]]]).flat[0].__class__
<type 'numpy.int32'>
I also didn't know this:
>>> None < 0
True
Josef
-------------- next part --------------
# -*- coding: utf-8 -*-
"""
Created on Wed Jan 20 10:18:32 2010
Author: josef-pktd
"""
import numpy as np
from scipy import stats
def nanmedian(x, axis = 0):
keepshape = list(np.shape(x))
x, axis2 = stats.stats._chk_asarray(x, axis)
if (not axis is None) and axis<0 : # and x.ndim>2:
axis = x.ndim + axis
#print 'axis', axis
#print x, keepshape
if x.ndim == 0 or (x.size==1 and axis is None):
return 1.0*x.item()
if keepshape and not axis is None :
keepshape.pop(axis)
if x.size == 1: #x.ndim == 0:
return 1.0*x.reshape(keepshape)
if x.dtype == np.bool:
x = x.astype(int)
if axis is None:
axis = 0
x = np.sort(x, axis=axis)
nall = x.shape[axis]
notnancount = nall - np.isnan(x).sum(axis=axis)
#print 'notnancount', notnancount
(idx, rmd) = divmod(notnancount, 2)
#idx = np.atleast_1d(idx)
#print 'idx', idx
#print idx.shape
indx = np.ogrid[[slice(i)for i in x.shape]]
indxlo = indx[:]
idxslice = map(slice, idx.shape)
idxslice.insert(axis, None)
#print idxslice
idx = np.atleast_1d(idx)
indxlo[axis] = idx[idxslice]
#print indxlo
indxhi = indx[:]
indxhi[axis] = (idx - (1-rmd))[idxslice]#[idx.shape[:axis]+(None,)+idx.shape[axis:]]
#print map(np.shape, indxhi)
#print indxhi
nanmed = (x[indxlo] + x[indxhi])/2.
#print 'keepshape, nanmed.shape',keepshape, nanmed.shape
if np.ndim == 0:
return nanmed.item()
#return nanmed.reshape(keepshape)
return np.squeeze(nanmed) #.reshape(keepshape)
if nanmed.size == 1:
return nanmed.reshape(keepshape)
return nanmed.item()
return nanmed
from numpy.testing import assert_equal, assert_almost_equal
for axis in [0,1, None, -1]:
for i in range(5):
# for complex
#x = 1j+np.arange(20).reshape(4,5)
x = np.arange(20).reshape(4,5).astype(float)
x[zip(np.random.randint(4, size=(2,5)))] = np.nan
assert_equal(nanmedian(x, axis=0), stats.nanmedian(x, axis=0))
for axis in [0,1,2, None, -1]:
for i in range(5):
x = np.arange(3*4*5).reshape(3,4,5).astype(float)
x[np.random.randint(3, size=(3,4,5))] = np.nan
assert_equal(nanmedian(x, axis=0), stats.nanmedian(x, axis=0))
xli = [ [1], [[1]], [1,2], [1j], [1j,2], [True], [False],
[True,False], [True, False, True], np.round(np.random.randn(2,4,5),4)]
xxli = xli + map(np.array,xli)
for axis in [0, -1, None]:
print '\n next case',axis
for x in xxli:
try:
assert_equal(nanmedian(x, axis=axis), np.median(x, axis=axis))
assert_equal(np.shape(nanmedian(x, axis=axis)), np.shape(np.median(x, axis=axis)))
except:
print 'failure with', x
print nanmedian(x, axis=axis), np.median(x, axis=axis)
raise
y=np.round(np.random.randn(2,3,5),4)
axis = -1 #None
print np.median(y,axis=axis)
nm = nanmedian(y,axis=axis)
print nm
print np.median(y,axis=axis) - nm
#np.broadcast(array([[[0]],[[1]]]), array([[[1, 1, 1, 1, 1]], [[1, 1, 1, 1, 1]]]), array([[[0, 1, 2, 3, 4]]]))
More information about the SciPy-User
mailing list