[Numpy-discussion] Are masked arrays slower for processing than ndarrays?

Sat May 9 20:37:40 EDT 2009

On May 9, 2009, at 8:17 PM, Eric Firing wrote:

> Eric Firing wrote:

>
> A part of the slowdown is what looks to me like unnecessary copying  
> in _MaskedBinaryOperation.__call__.  It is using getdata, which  
> applies numpy.array to its input, forcing a copy.  I think the copy  
> is actually unintentional, in at least one sense, and possibly two:  
> first, because the default argument of getattr is always evaluated,  
> even if it is not needed; and second, because the call to np.array  
> is used where np.asarray or equivalent would suffice.

Yep, good call. A try/except should be better, and yes, I forgot to  
force copy=False (I thought it was on by default...). I didn't know that  
getattr always evaluated the default; the docs are scarce on that  
subject...
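
For reference, a small sketch of the two effects mentioned above: the default passed to getattr is an ordinary argument, so it is computed before getattr runs, and np.array copies its input by default where np.asarray does not. The names `HasData`, `noisy_default` and `getdata_lazy` are hypothetical and only for illustration; this is not the actual numpy.ma code.

```python
import numpy as np

calls = []  # records every time the "default" expression is evaluated

def noisy_default(x):
    # Hypothetical stand-in for the fallback conversion.
    calls.append("default evaluated")
    return np.array(x)  # np.array copies its input by default

class HasData:
    # Hypothetical container exposing a _data attribute.
    def __init__(self, data):
        self._data = data

a = np.arange(5.0)
obj = HasData(a)

# The attribute exists, yet the default argument is still evaluated
# (and the copy still made) before getattr is even called.
d = getattr(obj, "_data", noisy_default(a))
eager_result_is_attribute = d is a
default_was_evaluated = len(calls) == 1

def getdata_lazy(x):
    # try/except variant: the fallback runs only when _data is missing,
    # and np.asarray avoids a copy when x is already an ndarray.
    try:
        return x._data
    except AttributeError:
        return np.asarray(x)

copied = np.array(a)        # new buffer
not_copied = np.asarray(a)  # same object, no copy
```

With the try/except form, nothing is converted or copied unless the attribute lookup actually fails.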

> Pierre,
>
> ... I pressed "send" too soon.  There are test failures with the  
> patch I attached to my last message.  I think the basic ideas are  
> correct, but evidently there are wrinkles to be worked out.  Maybe  
> putmask() has to be used instead of where() (putmask is much faster)  
> to maintain the ability to do *= and similar, and maybe there are  
> other adjustments. Somehow, though, it should be possible to get  
> decent speed for simple multiplication and division; a 10x penalty  
> relative to ndarray operations is just too much.

Quite agreed. It was a shock to realize that we were that slow. I'm  
going to have to start testing with large arrays...

I'm confident we can significantly speed up the _MaskedOperations  
without losing any of the features. Yes, putmask may be a better  
option. We could probably use the following MO:
* result = a.data/b.data
* putmask(result, m, a)
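
A minimal sketch of that two-step idea, under some assumptions: both operands have the same shape, `m` is the union of the two input masks plus any positions where the division produced a non-finite value, and the original data of `a` is what gets written back under the mask. The function name `masked_divide` is hypothetical, and this is not the numpy.ma implementation.

```python
import numpy as np

def masked_divide(a, b):
    # Hypothetical helper; assumes a and b have the same shape.
    da, db = np.ma.getdata(a), np.ma.getdata(b)

    # Combined mask as a full boolean array (a fresh array from "|").
    m = np.ma.getmaskarray(a) | np.ma.getmaskarray(b)

    # Step 1: operate on the raw data, silencing divide-by-zero warnings.
    with np.errstate(divide="ignore", invalid="ignore"):
        result = da / db

    # Assumption: also mask positions where the result is inf or nan.
    m |= ~np.isfinite(result)

    # Step 2: write the original values of a back under the mask, in place.
    np.putmask(result, m, da)

    return np.ma.MaskedArray(result, mask=m)

a = np.ma.array([1.0, 2.0, 3.0, 4.0], mask=[0, 1, 0, 0])
b = np.ma.array([2.0, 2.0, 0.0, 4.0], mask=[0, 0, 0, 1])
r = masked_divide(a, b)
```

np.putmask modifies `result` in place, which is the property that would matter for supporting *= and similar in-place operations.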

However, I'm going to need a good couple of weeks before being able to  
really look into it...