[Numpy-discussion] masked ufuncs in C: on github

Eric Firing efiring at hawaii.edu
Fri May 15 21:48:50 EDT 2009


http://www.mail-archive.com/numpy-discussion@scipy.org/msg17595.html

Prompted by the thread above, I decided to see what it would take to 
implement ufuncs with masking in C.  I described the result here:

http://www.mail-archive.com/numpy-discussion@scipy.org/msg17698.html

I am starting a new thread for it here.  The present state of the work is 
on github:  http://github.com/efiring/numpy-work/tree/cfastma

I don't want to do any more until I have gotten some feedback from core 
developers.  (And I would be delighted if someone wants to help with 
this, or take it over.)

1) The strategy I have started with is to make a full set of masked 
ufuncs alongside the existing ones, appending "_m" to their names.  Only 
the binary ufuncs are implemented now, but the unary ufuncs can be 
handled similarly.  Example:

multiply(x, y, out)            # present ufunc: no change
multiply_m(x, y, mask, out)    # new

Where mask is True, the operation is skipped.
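
For readers who want to try out the semantics without building the branch, 
here is a rough pure-Python stand-in.  It is only a sketch, not the C code 
in the branch: it uses the "where" argument that current NumPy ufuncs 
accept, and it assumes that "skipped" means the corresponding element of 
out is left exactly as it was.  The function body is hypothetical; only the 
name and argument order come from the example above.

```python
import numpy as np

def multiply_m(x, y, mask, out):
    """Hypothetical emulation of the proposed multiply_m(x, y, mask, out)."""
    # Compute x*y only where mask is False.  Where mask is True the
    # operation is skipped and out keeps its previous value (assumption).
    compute = ~np.asarray(mask, dtype=bool)
    np.multiply(x, y, out=out, where=compute)
    return out

x = np.array([1.0, 2.0, 3.0])
y = np.array([10.0, 20.0, 30.0])
mask = np.array([False, True, False])
out = np.full(3, -1.0)   # sentinel so the skipped element is visible

multiply_m(x, y, mask, out)
# out now holds 10, -1 (untouched), 90
```

The sentinel value in out makes it easy to see which elements were skipped.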

2) I have in mind the possibility of supporting two input masks and one 
output mask for binary operations.  This would look like:

multiply_mm(x, y, maskx, masky, out, outmask)

outmask would be the logical_or of maskx and masky, and in the case of 
domained operations it would also be True where the arguments are 
outside the domain.

This form would provide the fastest support for masked arrays, but would 
also take quite a bit more work, and would expand the namespace even 
more.  I'm not sure it's worth it.
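
The two-mask form in item 2 can be sketched in Python in the same spirit. 
Again this is an illustration under stated assumptions, not the proposed 
C implementation: divide_mm is a hypothetical name used here to show a 
domained operation, its domain is assumed to be "divisor is nonzero", and 
masked elements of out are assumed to be left untouched.

```python
import numpy as np

def multiply_mm(x, y, maskx, masky, out, outmask):
    """Hypothetical emulation of multiply_mm from item 2."""
    # outmask is the logical_or of the two input masks.
    np.logical_or(maskx, masky, out=outmask)
    # Skip the operation wherever the result is masked.
    np.multiply(x, y, out=out, where=~outmask)
    return out, outmask

def divide_mm(x, y, maskx, masky, out, outmask):
    """Hypothetical domained counterpart: also mask where y == 0."""
    np.logical_or(maskx, masky, out=outmask)
    # Out-of-domain arguments (zero divisor, by assumption) are masked too.
    np.logical_or(outmask, y == 0, out=outmask)
    np.divide(x, y, out=out, where=~outmask)
    return out, outmask

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 0.0, 4.0, 5.0])
maskx = np.array([False, False, True, False])
masky = np.zeros(4, dtype=bool)
out = np.zeros(4)
outmask = np.zeros(4, dtype=bool)

divide_mm(x, y, maskx, masky, out, outmask)
# outmask is True at index 1 (zero divisor) and index 2 (masked input)
```

Because the division is never attempted where outmask is True, the zero 
divisor does not raise a floating-point warning.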

3) I have not yet taken any steps to modify numpy.ma to take advantage 
of the new ufuncs, but I think that will be quite simple.

4) Likewise, to save time, I am now just borrowing the regular ufunc 
docstrings.

5) No tests yet, Stefan.  They can be added as soon as there is 
agreement on API and general strategy.

6) The present implementation is based on conceptually small 
modifications of the existing numpy code generation system.  It required 
a lot of cut and paste, and yields a lot of nearly duplicated code. 
There may be better ways to do it--especially if it turns out it needs 
to be redone in some modified form.

Eric


