[Numpy-discussion] masked ufuncs in C: on github

Eric Firing efiring at hawaii.edu
Sat May 16 04:02:21 EDT 2009


Charles R Harris wrote:
> 
> 
> On Fri, May 15, 2009 at 7:48 PM, Eric Firing <efiring at hawaii.edu 
> <mailto:efiring at hawaii.edu>> wrote:
> 
> 
>     http://www.mail-archive.com/numpy-discussion@scipy.org/msg17595.html
> 
>     Prompted by the thread above, I decided to see what it would take to
>     implement ufuncs with masking in C.  I described the result here:
> 
>     http://www.mail-archive.com/numpy-discussion@scipy.org/msg17698.html
> 
>     Now I am starting a new thread. The present state of the work is on
>     github:  http://github.com/efiring/numpy-work/tree/cfastma
> 
>     I don't want to do any more until I have gotten some feedback from core
>     developers.  (And I would be delighted if someone wants to help with
>     this, or take it over.)

Chuck,

Thanks very much for the quick response.

> 
> 
> Here the if ... continue needs to follow the declaration:
> 
>         if (*mp1) continue;
>         float in1 = *(float *)ip1;
>         float in2 = *(float *)ip2;
>         *(float *)op1 = f(in1, in2);
>  

I was surprised to see the declarations inside the loop in the first 
place (this certainly is not ANSI-C), and I was also pleasantly 
surprised that letting them be after the conditional didn't seem to 
bother the compiler at all.  Maybe that is a gcc extension.

> I think this would be better as
> 
>         if (!(*mp1)) {
>             float in1 = *(float *)ip1;
>             float in2 = *(float *)ip2;
>             *(float *)op1 = f(in1, in2);
>         }
> 

I agree, and I thought of that originally--I think I did it with 
continue because it was easier to type it in, and it reduced the 
difference relative to the non-masked form.
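
For readers following along, here is a minimal sketch of the block form discussed above. It borrows the general shape of a NumPy inner loop (pointer array, dimensions, steps) but uses only standard C types. The function name masked_add_float, the argument order (in1, in2, mask, out), and the one-byte-per-element mask are assumptions made for illustration, not the code in the cfastma branch. Placing the declarations at the top of the if block should also keep it acceptable to compilers that reject declarations after statements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical masked inner loop for float addition.  Assumed layout:
 * args = {in1, in2, mask, out}; the mask holds one byte per element
 * and a nonzero byte means "masked". */
void masked_add_float(char **args, const ptrdiff_t *dimensions,
                      const ptrdiff_t *steps)
{
    char *ip1 = args[0];
    char *ip2 = args[1];
    char *mp1 = args[2];
    char *op1 = args[3];
    const ptrdiff_t n = dimensions[0];
    ptrdiff_t i;

    assert(n >= 0);
    for (i = 0; i < n; i++) {
        if (!*(unsigned char *)mp1) {
            /* Declarations open the block, before any statement. */
            const float in1 = *(float *)ip1;
            const float in2 = *(float *)ip2;
            *(float *)op1 = in1 + in2;
        }
        /* Advance every pointer whether or not the element was masked. */
        ip1 += steps[0];
        ip2 += steps[1];
        mp1 += steps[2];
        op1 += steps[3];
    }
}
```

Output elements under the mask are never written, so whatever the caller stored there beforehand is preserved.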

> 
> But since this is actually a ternary function, you could define new 
> functions, something like
> 
> double npy_add_m(double a, double b, double mask)
> {
>     if (!mask) {
>         return a + b;
>     } else {
>         return a;
>     }
> }
> 
> And use the currently existing loops. Well, you would have to add one 
> for ternary functions.
> 
That would incur the overhead of an extra function call for each 
element; I suspect it would slow it down a lot. My motivation is to make 
masked array overhead negligible, at least for medium to large arrays.

Also your suggestion above does not handle the case where an output 
argument is supplied; it would modify the output under the mask.
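
To make the output-argument point concrete, here is a small sketch contrasting the two approaches on contiguous double arrays. npy_add_m is the function suggested above (with its braces balanced); the two loop helpers, add_ternary and add_skip_masked, are hypothetical names introduced only for this comparison and do not exist in NumPy.

```c
#include <assert.h>
#include <stddef.h>

/* The suggested ternary function: returns a when the element is masked. */
double npy_add_m(double a, double b, double mask)
{
    if (!mask) {
        return a + b;
    } else {
        return a;
    }
}

/* Generic ternary loop: stores a result at every position, so a
 * caller-supplied output is overwritten even under the mask. */
void add_ternary(const double *a, const double *b, const double *mask,
                 double *out, size_t n)
{
    size_t i;
    assert(a && b && mask && out);
    for (i = 0; i < n; i++) {
        out[i] = npy_add_m(a[i], b[i], mask[i]);
    }
}

/* Masked loop: leaves out[i] untouched wherever mask[i] is nonzero. */
void add_skip_masked(const double *a, const double *b, const double *mask,
                     double *out, size_t n)
{
    size_t i;
    assert(a && b && mask && out);
    for (i = 0; i < n; i++) {
        if (!mask[i]) {
            out[i] = a[i] + b[i];
        }
    }
}
```

With out pre-filled by the caller, add_ternary replaces each masked slot with the corresponding element of a, while add_skip_masked keeps the caller's value there.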

> Question, what about reduce? I don't think it is defined for 
> ternary functions. Apart from reduce, why not just add? You already have 
> the mask to tell you which results are invalid.
> 

You mean just do the operation and ignore the results under the mask? 
This is the way Pierre originally did it, if I remember correctly, but 
fairly recently people started objecting that they didn't want to 
disturb values in an output argument under a mask.  So now ma jumps 
through hoops to satisfy this requirement, and it is consequently slow.

ufunc methods like reduce are supported only for the binary ops with one 
output, so they are automatically unavailable for the masked versions. 
To get around this would require subclassing the ufunc to make a masked 
version.  This is probably the best way to go, but I suspect it is much 
more complicated than I can handle in the amount of time I can spend.

So maybe my proposed masked ufuncs are a slight abuse of the ufunc 
concept, or at least its present implementation.  Unary functions with a 
mask, which I have not yet tried to implement, would actually be binary, 
so they would have reduce etc. methods that would not make any sense. 
Is there a way to disable (remove) the methods in this case?

Eric

> Chuck