[Numpy-discussion] Are masked arrays slower for processing than ndarrays?

Sat May 9 17:22:18 EDT 2009

Hi,

I'm using masked arrays to compute standard deviations, products,
Gaussian weights, and weighted averages over large arrays. At first I
thought masked arrays would be a great way to sidestep looping (which
they are), but the computation is still slower than expected. Here's a
snippet of the code that I'm using them for.

# Computing nearest neighbor distances.
# Output will be about 270,000 rows long for the index
# and 270,000x50 for the dist array.
tree = ann.kd_tree(np.column_stack([l,b]))
index, dist = tree.search(np.column_stack([l,b]),k=nth)

# Clipping bad values by replacing them with acceptable values
av[av <= -10] = -10
av[av >= 50] = 50

# Distance clipping and creating mask
dist_arcsec = np.sqrt(dist)*3600
mask = dist_arcsec <= d_thresh

# Creating masked array
av_good = ma.array(av[index],mask=mask)
dist_good = ma.array(dist_arcsec,mask=mask)

# This is why I'm using masked arrays: if these were
# ndarrays with NaNs, the output would be NaN.
Std = np.array(np.std(av_good,axis=1))
Var = Std*Std

Rho = np.zeros( (len(av), nth) )
Rho2  = np.zeros( (len(av), nth) )

dist_std = np.std(dist_good,axis=1)

for j in range(nth):
    Rho[:,j] = dist_std
    Rho2[:,j] = Var

# This part takes about 20 seconds to compute for a 270,000x50 masked array.
# Using ndarrays of the same size takes about 2 seconds.
spatial_weight = (1.0 / (Rho*np.sqrt(2*np.pi))
                  * np.exp(-dist_good / (2*Rho**2)))

# Like the spatial_weight section, this takes about 20 seconds
W = spatial_weight / Rho2

# Takes less than one second.
Ave = np.average(av_good,axis=1,weights=W)

Any ideas why this takes so long to process, especially the
spatial_weight and W computations? Is there a faster way to do this?
Or is there a way to make numpy.std ignore NaNs?
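
A minimal sketch of the NaN-based alternative asked about above,
assuming NumPy 1.8 or later, where np.nanstd is available. The small
arrays and the name av_nn (standing in for av[index]) are made up for
illustration; the mask follows the same convention as the snippet
(True marks an excluded entry). It also uses keepdims=True so the
per-row values broadcast across the nth columns without a loop. The
sketch only illustrates the approach; it does not show whether this is
faster at 270,000x50.

```python
import numpy as np

# Hypothetical stand-ins for the arrays in the snippet (tiny, made-up values).
n_rows, nth = 6, 5
av_nn = np.linspace(-10.0, 50.0, n_rows * nth).reshape(n_rows, nth)  # av[index]
dist_arcsec = (np.arange(n_rows * nth) % 7 + 1).reshape(n_rows, nth).astype(float)
d_thresh = 2.0

# Same mask convention as the snippet: True marks an excluded entry.
mask = dist_arcsec <= d_thresh

# Plain float ndarrays with NaN written into the excluded entries.
av_nan = np.where(mask, np.nan, av_nn)
dist_nan = np.where(mask, np.nan, dist_arcsec)

# NaN-aware reductions; keepdims=True gives shape (n_rows, 1), which
# broadcasts against (n_rows, nth) and replaces the fill-by-column loop.
std_av = np.nanstd(av_nan, axis=1, keepdims=True)
rho = np.nanstd(dist_nan, axis=1, keepdims=True)
var = std_av ** 2

# Same expressions as in the snippet, now on ordinary ndarrays.
# NaN entries simply propagate through the element-wise arithmetic.
spatial_weight = 1.0 / (rho * np.sqrt(2 * np.pi)) * np.exp(-dist_nan / (2 * rho ** 2))
w = spatial_weight / var

# Weighted average per row that skips the NaN (excluded) entries.
ave = np.nansum(w * av_nan, axis=1) / np.nansum(w, axis=1)
```

A row whose entries are all excluded would give NaN here (with a
warning), so such rows would need separate handling.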

Thanks,

Eli Bressert