[Numpy-discussion] Are masked arrays slower for processing than ndarrays?

Fri May 15 14:05:18 EDT 2009

Pierre GM wrote:
> On May 13, 2009, at 7:36 PM, Matt Knox wrote:
>>> Here's the catch: it's basically cheating. I got rid of the
>>> pre-processing (where a mask was calculated depending on the domain
>>> and the input set to a filling value depending on this mask, before
>>> the actual computation). Instead, I force
>>> np.seterr(divide='ignore',invalid='ignore') before calling the ufunc
>> This isn't a thread-safe approach and could cause weird side effects
>> in a multi-threaded application. I think modifying global
>> options/variables inside any function where it generally wouldn't be
>> expected by the user is a bad idea.
> 
> Whine. I was afraid of something like that...
> 2 options, then:
> * We revert to computing a mask beforehand. That looks like the part
>   that takes the most time with domained operations (according to
>   Robert K's profiler. Robert, you deserve a statue for this tool).
>   And that doesn't solve the problem of power, anyway: how do you
>   compute the domain of power?
> * We reimplement masked versions of the ufuncs in C. Won't happen from
>   me anytime soon (this fall or winter, maybe...)
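
For reference, the global-state concern raised in the quoted exchange can be illustrated with a minimal sketch. It assumes a NumPy that provides the np.errstate context manager, which limits the changed error settings to one block and restores the previous ones on exit. The arrays are made up for illustration, and the sketch makes no claim about how numpy.ma handles this internally.

```python
import numpy as np

# Illustrative inputs: 1/0 triggers "divide", 0/0 triggers "invalid".
x = np.array([1.0, 0.0, -1.0])
y = np.array([0.0, 0.0, 2.0])

before = np.geterr()  # error settings in effect before the block

# The settings are changed only inside the with-block.
with np.errstate(divide='ignore', invalid='ignore'):
    q = np.divide(x, y)

after = np.geterr()  # same as 'before': the settings were restored

print(q)
print(before == after)
```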

Pierre,

I have implemented masked versions of all binary ufuncs in C, using 
slight modifications of the numpy code generation machinery.  I suspect 
that the way I have done it will not be the final method, and as of this 
moment I have just gotten it compiled and minimally checked (numpy 
imports, multiply_m(x, y, mask, out) puts x*y in out only where mask is 
False), but it is enough to make me think that we should be able to make 
it work in numpy.ma.
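
The behaviour described for multiply_m can be approximated in pure Python as a rough sketch. It assumes a NumPy whose ufuncs accept the where= keyword; the helper name multiply_masked is hypothetical and is not the C implementation described here.

```python
import numpy as np

def multiply_masked(x, y, mask, out):
    """Hypothetical stand-in for multiply_m(x, y, mask, out).

    Writes x*y into out only where mask is False; elements of out
    at masked positions are left untouched.
    """
    # where= selects the elements to compute, so the mask is inverted.
    unmasked = ~np.asarray(mask, dtype=bool)
    return np.multiply(x, y, out=out, where=unmasked)

x = np.arange(3)
y = np.arange(3) + 2
mask = np.array([False, True, False])

result = multiply_masked(x, y, mask, x)
print(result)  # [0 1 8]
```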

In the present implementation, the masked versions of the ufuncs take a 
single mask, and they live in the same namespace as the unmasked 
versions.  Masked versions of the unary ufuncs need to be added.  Binary 
versions taking two masks and returning the resulting mask can also be 
added, but with considerably more effort, so I view that as something to 
be done only after all the wrinkles are worked out with the single-mask 
implementation.
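
One way to picture the two-mask variant is the sketch below. The function name and the return convention (output array plus combined mask) are assumptions made for illustration, since no such interface is specified in this message.

```python
import numpy as np

def multiply_two_masks(x, xmask, y, ymask, out):
    """Hypothetical two-mask binary operation.

    An element of the result is masked if it is masked in either input.
    Returns the output array and the combined mask.
    """
    combined = np.logical_or(xmask, ymask)
    # Compute only where neither input is masked.
    np.multiply(x, y, out=out, where=~combined)
    return out, combined

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])
xmask = np.array([False, True, False, False])
ymask = np.array([False, False, True, False])

out = np.zeros_like(x)
out, combined = multiply_two_masks(x, xmask, y, ymask, out)
print(out)       # [ 10.   0.   0. 160.]
print(combined)  # [False  True  True False]
```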

I view these masked versions of ufuncs as perfectly good standalone 
entities, which will enable a huge speedup in numpy.ma, but which may 
also be useful independently of masked arrays.

I have made no attempt at this point to address domain checking, but 
this certainly needs to move into the C stage as well: in separate 
ufuncs while we have only the single-mask binary ufuncs, and directly 
in the double-mask binary ufuncs whenever those are implemented.
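
As a rough illustration of what domain checking means for a single-mask operation, the sketch below handles division, where the domain excludes a zero divisor. The helper divide_domained is hypothetical and written in Python rather than C. It only shows the intended result: positions outside the domain are added to the mask and skipped in the computation.

```python
import numpy as np

def divide_domained(x, y, mask, out):
    """Hypothetical domain-checked division.

    Positions where y == 0 are outside the domain of division, so they
    are added to the mask and left untouched in out.
    """
    new_mask = np.logical_or(mask, y == 0)
    np.divide(x, y, out=out, where=~new_mask)
    return out, new_mask

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 0.0, 4.0, 8.0])
mask = np.array([False, False, True, False])

out = np.full_like(x, -1.0)  # -1.0 marks positions that are never written
out, new_mask = divide_domained(x, y, mask, out)
print(out)       # [ 0.5 -1.  -1.   0.5]
print(new_mask)  # [False  True  True False]
```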

Example:

In [1]: import numpy as np

In [2]: x = np.arange(3)

In [3]: y = np.arange(3) + 2

In [4]: x
Out[4]: array([0, 1, 2])

In [5]: y
Out[5]: array([2, 3, 4])

In [6]: mask = np.array([False, True, False])

In [7]: np.multiply_m(x, y, mask, x)
Out[7]: array([0, 1, 8])

In [8]: x = np.arange(1000000, dtype=float)

In [9]: y = np.sin(x)

In [10]: mask = y > 0

In [11]: z = np.zeros_like(x)

In [12]: timeit np.multiply(x,y,z)
100 loops, best of 3: 10.5 ms per loop

In [13]: timeit np.multiply_m(x,y,mask,z)
100 loops, best of 3: 12 ms per loop
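
A comparable timing can be attempted without the experimental build. The sketch below uses the standard timeit module and substitutes the where= keyword of np.multiply for multiply_m, which is an assumption and not the same code path as the C implementation timed above, so the numbers are not directly comparable.

```python
import timeit
import numpy as np

x = np.arange(1000000, dtype=float)
y = np.sin(x)
mask = y > 0
z = np.zeros_like(x)

# where= expects True at positions to compute, so invert the mask once.
unmasked = ~mask

# Total time in seconds for 20 calls of each variant.
t_plain = timeit.timeit(lambda: np.multiply(x, y, z), number=20)
t_masked = timeit.timeit(
    lambda: np.multiply(x, y, out=z, where=unmasked), number=20)

print("plain:  %.2f ms per call" % (1000 * t_plain / 20))
print("masked: %.2f ms per call" % (1000 * t_masked / 20))
```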

Eric