[Numpy-discussion] Medians that ignore values

Peter Saffrey pzs at dcs.gla.ac.uk
Fri Sep 19 07:21:12 EDT 2008


David Cournapeau <david <at> ar.media.kyoto-u.ac.jp> writes:

> It may be that nanmedian is slow. But I would sincerely be surprised if
> it were slower than a Python list, except for some pathological cases, or
> maybe a bug in nanmedian. What do your data look like? (size, number of
> NaNs, etc.)
> 

I've posted my test code below, which gives me the results:

$ ./arrayspeed3.py
list build time: 0.01
list median time: 0.01
array nanmedian time: 0.36

I must have done something wrong to hobble nanmedian in this way... I'm quite
new to numpy, so feel free to point out any obviously egregious errors.

Peter

===

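from time import perf_counter as clock  # time.clock() was removed in Python 3.8

from numpy import nan, inf
from numpy.random import rand  # the same function pylab re-exports

try:
    # The nanmedian benchmarked here (only in old SciPy releases)
    from scipy.stats.stats import nanmedian
except ImportError:
    # Modern equivalent, in NumPy since 1.9
    from numpy import nanmedian


def my_median(vallist):
    """Median of a plain Python list; sorts the list in place."""
    num_vals = len(vallist)
    if num_vals == 0:  # every value in this point was masked out
        return nan
    vallist.sort()
    if num_vals % 2 == 1:  # odd: the middle element
        index = (num_vals - 1) // 2
        return vallist[index]
    else:  # even: mean of the two middle elements
        index = num_vals // 2
        return (vallist[index] + vallist[index - 1]) / 2


numtests = 100
testsize = 100
pointlen = 3

t0 = clock()
natests = rand(numtests, testsize, pointlen)
# start with inf: NaN != NaN, so list.remove(nan) never finds the NaNs from tolist()
natests[natests > 0.9] = inf
tests = natests.tolist()
natests[natests == inf] = nan
for test in tests:
    for point in test:
        # list.remove() drops only the first match, so loop until none are left
        while inf in point:
            point.remove(inf)
t1 = clock()
print("list build time:", t1 - t0)


t0 = clock()
allmedians = []
for test in tests:
    medians = [my_median(x) for x in test]
    allmedians.append(medians)
t1 = clock()
print("list median time:", t1 - t0)

t0 = clock()
namedians = []
for natest in natests:
    # one call per (testsize, pointlen) slice, median taken across each point
    thismed = nanmedian(natest, axis=1)
    namedians.append(thismed)
t1 = clock()
print("array nanmedian time:", t1 - t0)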
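
The inf round trip in the script works around NaN's comparison rules. NaN compares unequal to everything, itself included, so list.remove(nan) can only match a NaN that is the very same object, and the floats produced by tolist() never are. Below is a minimal sketch of filtering with math.isnan instead. strip_nans is a hypothetical helper name, not part of NumPy or SciPy:

```python
import math


def strip_nans(point):
    """Return a copy of `point` with every NaN removed (hypothetical helper)."""
    # NaN != NaN, so equality-based list methods cannot locate a NaN that
    # is a different object; math.isnan tests the value itself instead.
    return [v for v in point if not math.isnan(v)]


nan = float("nan")
point = [0.25, nan, 0.75, nan]
print(float("nan") in point)  # False: a fresh NaN object never compares equal
print(strip_nans(point))      # [0.25, 0.75]
```

Run after the NaNs have been assigned, this would replace the inf substitution in the list-building step, e.g. tests = [[strip_nans(p) for p in test] for test in natests.tolist()]. It also removes every masked value in a point, not only the first.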

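The script calls nanmedian once per 2-D slice inside a Python loop. A sketch of the same computation done in a single call over the last axis follows. It assumes a NumPy recent enough to provide numpy.nanmedian (added in 1.9) and uses the same synthetic data layout as the script:

```python
import warnings
from time import perf_counter

import numpy as np

numtests, testsize, pointlen = 100, 100, 3

# Same kind of synthetic data as the script: uniform values in [0, 1),
# with everything above 0.9 masked out as NaN
natests = np.random.rand(numtests, testsize, pointlen)
natests[natests > 0.9] = np.nan

t0 = perf_counter()
with warnings.catch_warnings():
    # A point whose values are all NaN yields nan plus a RuntimeWarning;
    # silence the warning so it does not clutter the timing output
    warnings.simplefilter("ignore", RuntimeWarning)
    # One call over the last axis replaces the loop over 2-D slices;
    # the result has shape (numtests, testsize)
    namedians = np.nanmedian(natests, axis=2)
t1 = perf_counter()
print("vectorized nanmedian time:", t1 - t0)
```

This gives the same medians as stacking the per-slice results, but any fixed per-call cost is paid once instead of numtests times. Whether that accounts for the gap in the timings above would need to be measured on the same machine.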





More information about the NumPy-Discussion mailing list