Now that numpy-1.3 has been released, I was hoping I could engage the numpy developers and community concerning my suggestion to improve the ufunc wrapping mechanism. Currently, ufuncs call, on the way out, the __array_wrap__ method of the input array with the highest __array_priority__.
There are use cases, like masked arrays or arrays with units, where it is imperative to run some code on the way in to the ufunc as well. MaskedArrays do this by reimplementing or wrapping ufuncs, but this approach puts some pretty severe constraints on subclassing. For example, in my Quantities package I have a Quantity object that derives from ndarray. It has been suggested that in order to make ufuncs work with Quantity, I should wrap numpy's built-in ufuncs. But I intend to make a MaskedQuantity object as well, deriving from MaskedArray, and would therefore have to wrap the MaskedArray ufuncs as well.
If ufuncs would simply call a method both on the way in and on the way out, I think this would go a long way to improving this situation. I whipped up a simple proof of concept and posted it in this thread a while back. For example, a MaskedQuantity would implement a method like __gfunc_pre__ to check the validity of the units operation etc, and would then call MaskedArray.__gfunc_pre__ (if defined) to determine the domain etc. __gfunc_pre__ would return a dict containing any metadata the subclasses wish to provide based on the inputs, and that dict would be passed along with the inputs, output and context to __gfunc_post__, so postprocessing can be done (__gfunc_post__ replacing __array_wrap__).
Of course, packages like MaskedArray may still wish to reimplement ufuncs, like Eric Firing is investigating right now. The point is that classes that dont care about the implementation of ufuncs, that only need to provide metadata based on the inputs and the output, can do so using this mechanism and can build upon other specialized arrays.
I would really appreciate input from numpy developers and other interested parties. I would like to continue developing the Quantities package this summer, and have been approached by numerous people interested in using Quantities with sage, sympy, matplotlib. But I would prefer to improve the ufunc mechanism (or establish that there is no interest among the community to do so) so I can improve the package (or limit its scope) before making an official announcement.
Thank you,
Darren
On Mon, Mar 9, 2009 at 9:50 AM, Darren Dale <dsdale24@gmail.com> wrote:I spent some time over the weekend fixing a few bugs in numpy that were exposed when attempting to use ufuncs with ndarray subclasses. It got me thinking that, with relatively little work, numpy's functions could be made to be more general. For example, the numpy.ma module redefines many of the standard ufuncs in order to do some preprocessing before the builtin ufunc is called. Likewise, in the units/quantities package I have been working on, I would like to perform a dimensional analysis to make sure an operation is allowed before I call a ufunc that might change data in place.
Imagine an ndarray subclass with methods like __gfunc_pre__ and __gfunc_post__. __gfunc_pre__ could accept the context that is currently provided to __array_wrap__ (the inputs and the function called), perform whatever preprocessing is desired, and maybe return a dictionary containing metadata. Numpy functions could then be wrapped with a decorator that 1) calls __gfunc_pre__ and obtain any metadata that is returned 2) calls the wrapped functions, and then 3) calls __gfunc_post__, which might be very similar to __array_wrap__ except that it would also accept the metadata created by __gfunc_pre__.
In cases where the routines to be called by __gfunc_pre__ and _post__ depend on what function is called, the the subclass could implement routines and store them in a dictionary-like object that is keyed using the function called. I have been exploring this approach with Quantities and it seems to work well. For example:
def __gfunc_pre__(self, gfunc, *args):
try:
return gfunc_pre_registry[gfunc](*args)
except KeyError:
return {}
I think such an approach for generalizing numpy's functions could be implemented without being disruptive to the existing __array_wrap__ framework. The decorator would attempt to identify an input or output array to use to call __gfunc_pre__ and _post__. If it finds them, it uses them. If it doesnt find them, no harm done, the existing __array_wrap__ mechanisms are still in place if the wrapped function is a ufunc.
One other nice feature: the metadata that is returned by __gfunc_pre__ could contain an optional flag that the decorator attempts to pass to the wrapped function so that __gfunc_pre__ and _post are not called for any decorated internal functions. That way the subclass could specify that __gfunc_pre__ and _post should be called only for the outer-most function.
Comments?
I'm attaching a proof of concept script, maybe it will better illustrate what I am talking about.