# [Numpy-discussion] Need help with np.ma.median and np.apply_along_axis

Siegfried Gonzi siegfried.gonzi at ed.ac.uk
Fri Dec 20 03:26:22 EST 2013

```Please have a look at version1 and version2. What are my other options
here? Do I need to go the cython route here? Thanks, Siegfried

==
My array is as follows (shown here for dummy values; and yes this kind
of arrays do exist: 150 observations x 8 years x 366 days x 24 hours x
7 model levels):

data = np.random.random((150,8,366,24,7))

My function "np.apply_along_axis(my_moment,4,data)" takes 15 minutes.

I thought making use of masked arrays "my_moment_fast(data,axis=4)"
will speed up things but

1. It blows up the memory consumption to 6 GB at times

and

2. It also takes... I do not know as I killed it after 20 minutes (it
hangs at the median print statement).

The calculation of the median is the bottleneck here.

==
import numpy as np

def my_moment(data_in,nan_val=-999.0):

tmp = data_in[np.where(data_in<>nan_val)]

erg = np.array([np.min(tmp),np.mean(tmp),np.median(tmp),\
np.max(tmp),np.std(tmp),np.size(tmp)])

return erg

def my_moment_fast(data_in,nan_val=-999.0,axis=4):

import numpy as np

print 'min max',np.min(data_in),np.max(data_in)

print 'shape mx',np.shape(mx),np.min(mx),np.max(mx)

print 'min'
tmp_min = np.ma.min(mx,axis=axis)
print 'max'
tmp_max = np.ma.max(mx,axis=axis)
print 'mean'
tmp_mean = np.ma.mean(mx,axis=axis)
print 'median'
#tmp_median = np.ma.sort(mx,axis=axis)
tmp_median = np.ma.median(mx,axis=axis)
print 'std'
tmp_std = np.ma.std(mx,axis=axis)
print 'N'
tmp_N = np.ones(np.shape(mx))
tmp_N = np.ma.sum(tmp_N,axis=axis)

print 'shape min',np.shape(tmp_min),np.min(tmp_min),np.max(tmp_min)
print 'shape max',np.shape(tmp_max),np.min(tmp_max),np.max(tmp_min)
print 'shape mean',np.shape(tmp_mean),np.min(tmp_mean),np.max(tmp_mean)
print 'shape median', np.shape(tmp_median), np.min(tmp_median),
np.max(tmp_median)
print 'shape std',np.shape(tmp_std),np.min(tmp_std),np.max(tmp_std)
print 'shape N', np.shape(tmp_N), np.min(tmp_N), np.max(tmp_N),

return np.array([tmp_min,tmp_mean,tmp_median,tmp_max,tmp_std,tmp_N])

data = np.random.random((150,8,366,24,7))
data[134,5,300,:,2] = -999.0
data[14,3,300,:,0] = -999.0

version1 = my_moment_fast(data,axis=4)
exit()
version2 = np.apply_along_axis(my_moment,4,data)
==

What  am I doing wrong here? I haven't tested it againts Fortran and
have got no idea if sorting for fetching the median would be faster.

Thanks,
Siegfried
==

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

```