On Wed, May 27, 2009 at 11:30 AM, Darren Dale <dsdale24@gmail.com> wrote:
Now that numpy-1.3 has been released, I was hoping I could engage the numpy developers and community concerning my suggestion to improve the ufunc wrapping mechanism. Currently, ufuncs call, on the way out, the __array_wrap__ method of the input array with the highest __array_priority__.
There are use cases, like masked arrays or arrays with units, where it is imperative to run some code on the way in to the ufunc as well. MaskedArrays do this by reimplementing or wrapping ufuncs, but this approach puts some pretty severe constraints on subclassing. For example, in my Quantities package I have a Quantity object that derives from ndarray. It has been suggested that in order to make ufuncs work with Quantity, I should wrap numpy's built-in ufuncs. But I intend to make a MaskedQuantity object as well, deriving from MaskedArray, and would therefore have to wrap the MaskedArray ufuncs as well.
If ufuncs would simply call a method both on the way in and on the way out, I think this would go a long way toward improving this situation. I whipped up a simple proof of concept and posted it in this thread a while back. For example, a MaskedQuantity would implement a method like __gfunc_pre__ to check the validity of the units operation, and would then call MaskedArray.__gfunc_pre__ (if defined) to determine the domain, and so on. __gfunc_pre__ would return a dict containing any metadata the subclasses wish to provide based on the inputs, and that dict would be passed along with the inputs, output, and context to __gfunc_post__ so that postprocessing can be done (__gfunc_post__ would replace __array_wrap__).
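To make the proposal a bit more concrete, here is a rough pure-Python sketch of how such hooks might chain. It does not use numpy at all: Array, Quantity, Masked, MaskedQuantity and the apply_gfunc driver are hypothetical toy stand-ins, and the hook signatures __gfunc_pre__(inputs, context) and __gfunc_post__(inputs, output, context, metadata) are assumptions for illustration, not an existing numpy API. Each class does its own work and defers to its base via super(), so a MaskedQuantity gets unit validation and mask propagation without wrapping anyone's ufuncs.

```python
import copy
import operator


class Array:
    """Hypothetical toy stand-in for ndarray: a flat list of floats."""
    __array_priority__ = 0.0

    def __init__(self, values):
        self.values = [float(v) for v in values]

    def __gfunc_pre__(self, inputs, context):
        # The base class contributes no metadata.
        return {}

    def __gfunc_post__(self, inputs, output, context, metadata):
        # Shallow-copy self so subclass attributes carry over, then swap in the data.
        result = copy.copy(self)
        result.values = output
        return result


class Quantity(Array):
    __array_priority__ = 10.0

    def __init__(self, values, units=""):
        super().__init__(values)
        self.units = units

    def __gfunc_pre__(self, inputs, context):
        func = context[0]
        units = {x.units for x in inputs if isinstance(x, Quantity)}
        # Validate the units operation before anything is computed.
        if func is not operator.add:
            raise NotImplementedError("this sketch only handles addition")
        if len(units) != 1:
            raise ValueError(f"incompatible units: {sorted(units)}")
        # Then defer to the next class in the MRO (e.g. Masked) for its metadata.
        metadata = super().__gfunc_pre__(inputs, context)
        metadata["units"] = units.pop()
        return metadata

    def __gfunc_post__(self, inputs, output, context, metadata):
        result = super().__gfunc_post__(inputs, output, context, metadata)
        result.units = metadata["units"]
        return result


class Masked(Array):
    __array_priority__ = 15.0

    def __init__(self, values, mask=None):
        super().__init__(values)
        self.mask = list(mask) if mask is not None else [False] * len(self.values)

    def __gfunc_pre__(self, inputs, context):
        metadata = super().__gfunc_pre__(inputs, context)
        # An element is masked in the result if it is masked in any input.
        masks = [x.mask for x in inputs if isinstance(x, Masked)]
        metadata["mask"] = [any(flags) for flags in zip(*masks)]
        return metadata

    def __gfunc_post__(self, inputs, output, context, metadata):
        result = super().__gfunc_post__(inputs, output, context, metadata)
        result.mask = metadata["mask"]
        return result


class MaskedQuantity(Quantity, Masked):
    __array_priority__ = 20.0

    def __init__(self, values, units="", mask=None):
        super().__init__(values, units)  # runs Quantity, then Masked, then Array
        if mask is not None:
            self.mask = list(mask)


def apply_gfunc(func, *inputs):
    """Hypothetical ufunc driver that calls one hook on the way in and one on the way out."""
    # Pick the input with the highest __array_priority__, as is done for __array_wrap__.
    owner = max(inputs, key=lambda x: x.__array_priority__)
    context = (func, inputs)
    # Way in: validate the inputs and gather metadata before computing anything.
    metadata = owner.__gfunc_pre__(inputs, context)
    # The "ufunc" itself: apply func elementwise.
    output = [func(*vals) for vals in zip(*(x.values for x in inputs))]
    # Way out: build the result from the raw output plus the gathered metadata.
    return owner.__gfunc_post__(inputs, output, context, metadata)


a = MaskedQuantity([1, 2, 3], units="m", mask=[False, True, False])
b = MaskedQuantity([10, 20, 30], units="m", mask=[False, False, True])
c = apply_gfunc(operator.add, a, b)
print(type(c).__name__, c.values, c.units, c.mask)
# MaskedQuantity [11.0, 22.0, 33.0] m [False, True, True]
```

In this sketch MaskedQuantity does not even need hooks of its own: the MRO (MaskedQuantity, Quantity, Masked, Array) chains the unit check and the mask propagation, and mismatched units raise ValueError in the pre hook before any elementwise work is done.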
Of course, packages like MaskedArray may still wish to reimplement ufuncs, as Eric Firing is investigating right now. The point is that classes that don't care about the implementation of ufuncs, and only need to provide metadata based on the inputs and the output, can do so using this mechanism and can build upon other specialized arrays.
I would really appreciate input from numpy developers and other interested parties. I would like to continue developing the Quantities package this summer, and have been approached by numerous people interested in using Quantities with Sage, SymPy, and matplotlib. But I would prefer to improve the ufunc mechanism (or establish that there is no interest among the community in doing so) so that I can improve the package (or limit its scope) before making an official announcement.
There was some discussion of this proposal to allow better interaction of ufuncs with ndarray subclasses in another thread ("Plans for numpy-1.4.0 and scipy-0.8.0"), and the comments were encouraging. I have been trying to gather feedback as to whether the numpy devs were receptive to the idea, and it seems the answer is tentatively yes, although there were questions about who would actually write the code. I may not have made it clear that I intend to write the implementation and tests myself. I gained some familiarity with the relevant code while squashing a few bugs for numpy-1.3, but it would be helpful if someone else who is familiar with the existing __array_wrap__ machinery would be willing to discuss this proposal in more detail and offer constructive criticism along the way. Is anyone willing?
What is the timeframe being considered for the numpy-1.4 release?
Thanks,
Darren