[Numpy-discussion] ufunc improvements [Was: Warnings in numpy.ma.test()]

Sun Mar 28 09:23:06 EDT 2010

I'd like to use this thread to discuss possible improvements to
generalize numpy's functions. Sorry for double posting, but we will
have a hard time keeping track of the discussion about how to improve
functions to deal with subclasses if it is spread across threads
about warnings in masked arrays or masked arrays not dealing well
with trapz. There is an additional bit at the end that was not
discussed elsewhere.

On Thu, Mar 18, 2010 at 8:14 AM, Darren Dale <dsdale24 at gmail.com> wrote:
> On Wed, Mar 17, 2010 at 10:16 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>> Just *one* function to rule them all and on the subtype dump it. No
>> __array_wrap__, __input_prepare__, or __array_prepare__, just something like
>> __handle_ufunc__. So it is similar but perhaps more radical. I'm proposing
>> having the ufunc upper layer do nothing but decide which argument type will
>> do all the rest of the work, casting, calling the low level ufunc base,
>> providing buffers, wrapping, etc. Instead of pasting bits and pieces into
>> the existing framework I would like to lay out a line of attack that ends up
>> separating ufuncs into smaller pieces that provide low level routines that
>> work on strided memory while leaving policy implementation to the subtype.
>> There would need to be some default type (ndarray) when the functions are
>> called on nested lists and scalars and I'm not sure of the best way to
>> handle that.
>>
>> I'm just sort of thinking out loud, don't take it too seriously.
>
> Thanks for the clarification. I think I see how this could work: if
> ufuncs were callable instances of classes, __call__ would find the
> input with highest priority and pass itself and the input to that
> object's __handle_ufunc__. Now it is up to __handle_ufunc__ to
> determine whether and how to modify the input, call some method on the
> ufunc (like execute) to perform the buffer operation, then
> __handle_ufunc__ performs the cast, deals with metadata and returns
> the result.
>
> I skipped a step: initializing the output buffer. Would that be rolled
> into the ufunc execution, or should it be possible for
> __handle_ufunc__ to access the initialized buffer before execution
> occurs (__array_prepare__)? I think it is important to be able to
> perform the cast and calculate metadata before ufunc execution. If an
> error occurs, an exception can be raised before the ufunc operates on
> the arrays, which can modify the data in place.

We discussed the possibility of simplifying the wrapping scheme with a
method like __handle_gfunc__. (I don't think this necessarily has to
be limited to ufuncs.) I think a second method like __prepare_input__
is also necessary. Imagine something like:

class GenericFunction:
   @property
   def executable(self):
       return self._executable
   def __init__(self, executable):
       self._executable = executable
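   def __call__(self, *args, **kwargs):
       # find the input with highest priority (here assumed to be the
       # largest __array_priority__), and then:
       handler = max(args, key=lambda a: getattr(a, '__array_priority__', 0))
       args, kwargs = handler.__prepare_input__(self, *args, **kwargs)
       return handler.__handle_gfunc__(self, *args, **kwargs)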

# this is the core function to be passed to the generic class:
def _add(a, b, out=None):
   # the generic, ndarray implementation.
   ...

# here is the publicly exposed interface:
add = GenericFunction(_add)

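# now my subclasses (ndarray is numpy.ndarray; mod_input and mod_output
# are per-gfunc lookup tables of callables, not shown here)
class MyArray(ndarray):
   # My class adjusts the inputs in __prepare_input__ and tweaks the
   # execution of the function in __handle_gfunc__
   def __prepare_input__(self, gfunc, *args, **kwargs):
       return mod_input[gfunc](*args, **kwargs)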
   def __handle_gfunc__(self, gfunc, *args, **kwargs):
       res = gfunc.executable(*args, **kwargs)
       # you could have called a different core func there
       return mod_output[gfunc](res, *args, **kwargs)

class MyNextArray(MyArray):
   def __prepare_input__(self, gfunc, *args, **kwargs):
       # let the superclass do its thing:
       args, kwargs = MyArray.__prepare_input__(self, gfunc, *args, **kwargs)
       # now I can tweak it further:
       return mod_input_further[gfunc](*args, **kwargs)
   def __handle_gfunc__(self, gfunc, *args, **kwargs):
       # let's defer to the superclass to handle calling the core function:
       res = MyArray.__handle_gfunc__(self, gfunc, *args, **kwargs)
       # and now we have one more crack at the result before passing it back:
       return mod_output_further[gfunc](res, *args, **kwargs)
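
To make the sketch above concrete, here is a minimal version that runs
in plain Python without numpy. Everything in it beyond the method names
__prepare_input__ and __handle_gfunc__ is an assumption made for
illustration: Base stands in for ndarray, Tagged is a hypothetical
subclass carrying a unit string, and __gfunc_priority__ is a made-up
attribute playing the role that __array_priority__ might play. It also
shows the point about raising before the core function runs.

```python
class GenericFunction:
    """Wraps a core function; policy is delegated to the top-priority input."""

    def __init__(self, executable):
        self._executable = executable

    @property
    def executable(self):
        return self._executable

    def __call__(self, *args, **kwargs):
        # Pick the argument with the highest (hypothetical) priority attribute.
        handler = max(args, key=lambda a: getattr(a, "__gfunc_priority__", -1))
        args, kwargs = handler.__prepare_input__(self, *args, **kwargs)
        return handler.__handle_gfunc__(self, *args, **kwargs)


class Base:
    """Stand-in for ndarray: holds a list of numbers."""
    __gfunc_priority__ = 0

    def __init__(self, data):
        self.data = list(data)

    def __prepare_input__(self, gfunc, *args, **kwargs):
        # Default policy: pass the inputs through unchanged.
        return args, kwargs

    def __handle_gfunc__(self, gfunc, *args, **kwargs):
        # Default policy: just run the core function.
        return gfunc.executable(*args, **kwargs)


class Tagged(Base):
    """Hypothetical subclass carrying metadata (a unit string)."""
    __gfunc_priority__ = 10

    def __init__(self, data, unit):
        super().__init__(data)
        self.unit = unit

    def __prepare_input__(self, gfunc, *args, **kwargs):
        # Check metadata and raise before any computation takes place.
        units = {a.unit for a in args if isinstance(a, Tagged)}
        if len(units) > 1:
            raise ValueError("incompatible units: %s" % sorted(units))
        return args, kwargs

    def __handle_gfunc__(self, gfunc, *args, **kwargs):
        # Run the core function, then re-attach the metadata to the result.
        res = gfunc.executable(*args, **kwargs)
        return Tagged(res.data, self.unit)


def _add(a, b):
    # Core implementation: knows nothing about subclasses.
    return Base(x + y for x, y in zip(a.data, b.data))


# The publicly exposed interface.
add = GenericFunction(_add)
```

Calling add(Base([1, 2]), Tagged([10, 20], "m")) dispatches to Tagged
because it has the higher priority, while mixing Tagged inputs with
different units fails in __prepare_input__ before _add is ever called.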

If a gfunc is not recognized, the subclass might raise a
NotImplementedError or it might just pass the original args, kwargs on
through. I didn't write that part out because the example was already
running long. But the point is that a single entry point could be used
for any subclass, without having to worry about how to support every
subclass. It may still be necessary to take care to use asanyarray in
the core functions, but if a subclass alters the behavior of some
operation such that an operation needs to happen on an ndarray view of
the data, __prepare_input__ provides an opportunity to prepare such
views. For example, in our current situation, matrices would not be
compatible with trapz if trapz did not cast the input to ndarrays, but
as a result trapz is not compatible with masked arrays or quantities.
With the proposed scheme, matrices would in some cases pass ndarray
views to the core function, but in other cases pass the arguments
through unmodified, since the function might build on other functions
that are already generalized to support those types of data.
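
The two options for unrecognized gfuncs, and the idea of passing base
views only where needed, could look roughly like the following. This is
again a plain-Python sketch under stated assumptions, not numpy code:
Base stands in for ndarray, view_as_base() stands in for taking an
ndarray view, and MatrixLike, StrictArray, trapz_like and generalized
are hypothetical names. For brevity the first argument is used as the
handler instead of a priority lookup.

```python
class GenericFunction:
    def __init__(self, executable):
        self.executable = executable

    def __call__(self, *args, **kwargs):
        # Simplification: the first argument acts as the handler.
        handler = args[0]
        args, kwargs = handler.__prepare_input__(self, *args, **kwargs)
        return handler.__handle_gfunc__(self, *args, **kwargs)


class Base:
    """Stand-in for ndarray."""

    def __init__(self, data):
        self.data = list(data)

    def view_as_base(self):
        # Plays the role of an ndarray view (this sketch copies instead).
        return Base(self.data)

    def __prepare_input__(self, gfunc, *args, **kwargs):
        return args, kwargs

    def __handle_gfunc__(self, gfunc, *args, **kwargs):
        return gfunc.executable(*args, **kwargs)


class MatrixLike(Base):
    """Hypothetical subclass whose semantics break some core functions."""
    needs_base_view = set()  # gfuncs that must see plain Base inputs

    def __prepare_input__(self, gfunc, *args, **kwargs):
        if gfunc in self.needs_base_view:
            args = tuple(a.view_as_base() if isinstance(a, Base) else a
                         for a in args)
        # Any gfunc that is not registered is passed through unmodified.
        return args, kwargs


class StrictArray(Base):
    """Hypothetical subclass that rejects gfuncs it does not recognize."""
    supported = set()

    def __prepare_input__(self, gfunc, *args, **kwargs):
        if gfunc not in self.supported:
            raise NotImplementedError("gfunc not supported by StrictArray")
        return args, kwargs


def _kind(a):
    # Core function that reports which type it actually received.
    return type(a).__name__


trapz_like = GenericFunction(_kind)    # core function needs base views
generalized = GenericFunction(_kind)   # built on already-general pieces
MatrixLike.needs_base_view.add(trapz_like)
StrictArray.supported.add(generalized)
```

With this setup trapz_like sees a Base when given a MatrixLike, while
generalized sees the MatrixLike itself, and StrictArray raises
NotImplementedError for trapz_like because it was never registered.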

Darren