[Numpy-discussion] Another suggestion for making numpy's functions generic

Sebastian Walter sebastian.walter at gmail.com
Tue Oct 20 05:24:51 EDT 2009


I'm not very familiar with the underlying C-API of numpy, so this has
to be taken with a grain of salt.

The reason I'm curious about genericity is that it would be
awesome to have:
1) ufuncs like sin, cos, exp, ... work on arrays of arbitrary objects (this
works already)
2) functions like dot, eig, etc. work on arrays of objects (this works for
dot already, but not for eig)
3) ufuncs and functions work on arbitrary objects

Examples that would be nice to have working include:
* arrays of polynomials, i.e. arrays of objects
* polynomials with tensor coefficients, object with underlying array structure

I thought that the most elegant way to implement that would be to have
all numpy functions try to call:
1) the class function with the same name as the numpy function,
2) or, if the class function is not implemented, the member function
with the same name as the numpy function;
3) if neither exists, raise an exception.

E.g.

1)
if isinstance(x, Foo),
then numpy.sin(x)
would call Foo.sin(x) if numpy doesn't know how to handle Foo

2)
similarly, for arrays of objects of type Foo:
 x = np.array([Foo(1), Foo(2)])

Then numpy.sin(x)
should try to return np.array([Foo.sin(xi) for xi in x])
or, in case Foo.sin is not implemented as a class function,
return np.array([xi.sin() for xi in x])
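
A minimal sketch of the lookup order described above, written as a
standalone helper rather than as a change to numpy itself. The names
generic_sin and Foo are hypothetical and exist only for illustration;
the fallback to np.sin for plain numbers and numeric arrays is an
assumption about what "numpy knows how to handle" means here.

```python
import math

import numpy as np


class Foo:
    """Hypothetical scalar-like type that provides its own sin."""

    def __init__(self, value):
        self.value = value

    def sin(self):
        return Foo(math.sin(self.value))


def generic_sin(x):
    """Hypothetical dispatcher following the lookup order in the text."""
    if isinstance(x, np.ndarray):
        if x.dtype != object:
            # Numeric arrays: numpy already knows what to do.
            return np.sin(x)
        # Arrays of objects: dispatch element by element.
        out = np.empty(x.shape, dtype=object)
        for idx in np.ndindex(x.shape):
            out[idx] = generic_sin(x[idx])
        return out
    if isinstance(x, (int, float, complex, np.generic)):
        return np.sin(x)
    # 1) function of the same name defined on the class
    class_func = getattr(type(x), "sin", None)
    if callable(class_func):
        return class_func(x)
    # 2) member of the same name found only on the instance
    member = getattr(x, "sin", None)
    if callable(member):
        return member()
    # 3) nothing suitable found
    raise TypeError("no sin implementation for %s" % type(x).__name__)


x = np.array([Foo(1.0), Foo(2.0)], dtype=object)
y = generic_sin(x)  # object array holding Foo instances
```

The same pattern would apply to any other function name; only the
attribute being looked up changes.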

Therefore, I somehow expected something like this:
Quantity would derive from numpy.ndarray.
Calling Quantity.__new__(cls) would create the member functions
__add__, __imul__, sin, exp, ...
where each function has a preprocessing part and a postprocessing part.
After the preprocessing, each function would call the original ufunc on the
base class object, e.g. __add__.



Sebastian



On Mon, Oct 19, 2009 at 1:55 PM, Darren Dale <dsdale24 at gmail.com> wrote:
> On Mon, Oct 19, 2009 at 3:10 AM, Sebastian Walter
> <sebastian.walter at gmail.com> wrote:
>> On Sat, Oct 17, 2009 at 2:49 PM, Darren Dale <dsdale24 at gmail.com> wrote:
>>> numpy's functions, especially ufuncs, have had some ability to support
>>> subclasses through the ndarray.__array_wrap__ method, which provides
>>> masked arrays or quantities (for example) with an opportunity to set
>>> the class and metadata of the output array at the end of an operation.
>>> An example is
>>>
>>> q1 = Quantity(1, 'meter')
>>> q2 = Quantity(2, 'meters')
>>> numpy.add(q1, q2) # yields Quantity(3, 'meters')
>>>
>>> At SciPy2009 we committed a change to the numpy trunk that provides a
>>> chance to determine the class and some metadata of the output *before*
>>> the ufunc performs its calculation, but after the output array has been
>>> established (and its data is still uninitialized). Consider:
>>>
>>> q1 = Quantity(1, 'meter')
>>> q2 = Quantity(2, 'J')
>>> numpy.add(q1, q2, q1)
>>> # or equivalently:
>>> # q1 += q2
>>>
>>> With only __array_wrap__, the attempt to propagate the units happens
>>> after q1's data has been updated in place, which is too late to raise an
>>> error: the data is already corrupted. __array_prepare__ solves that
>>> problem, since an exception can be raised in time.
>>>
>>> Now I'd like to suggest one more improvement to numpy to make its
>>> functions more generic. Consider one more example:
>>>
>>> q1 = Quantity(1, 'meter')
>>> q2 = Quantity(2, 'feet')
>>> numpy.add(q1, q2)
>>>
>>> In this case, I'd like an opportunity to operate on the input arrays
>>> on the way in to the ufunc, to rescale the second input to meters. I
>>> think it would be a hack to try to stuff this capability into
>>> __array_prepare__. One form of this particular example is already
>>> supported in quantities, "q1 + q2", by overriding the __add__ method
>>> to rescale the second input, but there are ufuncs that do not have an
>>> associated special method. So I'd like to look into adding another
>>> check for a special method, perhaps called __input_prepare__. My time
>>> is really tight for the next month, so I'd rather not start if there
>>> are strong objections, but otherwise, I'd like to try to get it
>>> in in time for numpy-1.4. (Has a timeline been established?)
>>>
>>> I think it will not be too difficult to document this overall scheme:
>>>
>>> When calling numpy functions:
>>>
>>> 1) __input_prepare__ provides an opportunity to operate on the inputs
>>> to yield versions that are compatible with the operation (they should
>>> obviously not be modified in place)
>>>
>>> 2) the output array is established
>>>
>>> 3) __array_prepare__ is used to determine the class of the output
>>> array, as well as any metadata that needs to be established before the
>>> operation proceeds
>>>
>>> 4) the ufunc performs its operations
>>>
>>> 5) __array_wrap__ provides an opportunity to update the output array
>>> based on the results of the computation
>>>
>>> Comments, criticisms? If PEP 3124 were already a part of the standard
>>> library, that could serve as the basis for generalizing numpy's
>>> functions. But I think the PEP will not be approved in its current
>>> form, and it is unclear when and if the author will revisit the
>>> proposal. The scheme I'm imagining might be sufficient for our
>>> purposes.
>>
>> I'm all for generic (u)funcs since they might come in handy for me:
>> I'm doing lots of operations on arrays of polynomials.
>> I don't quite get the reasoning though.
>> Could you correct me where I get it wrong?
>> * the class Quantity derives from numpy.ndarray
>> * Quantity overrides __add__, __mul__ etc. and you get the correct behaviour for
>> q1 = Quantity(1, 'meter')
>> q2 = Quantity(2, 'J')
>> by raising an exception when performing q1+=q2
>
> No, Quantity does not override __iadd__ to catch this. Quantity
> implements __array_prepare__ to perform the dimensional analysis based
> on the identity of the ufunc and the inputs, and set the class and
> dimensionality of the output array, or raise an error when dimensional
> analysis fails. This approach lets quantities support all ufuncs (in
> principle), not just built in numerical operations. It should also
> make it easier to subclass from MaskedArray, so we could have a
> MaskedQuantity without having to establish yet another suite of ufuncs
> specific to quantities or masked quantities.
>
>> * The problem is that numpy.add(q1, q2, q1) would corrupt q1 before
>> raising an exception
>
> That was solved by the addition of __array_prepare__ to numpy back in
> August. What I am proposing now is supporting operations on arrays
> that would be compatible if we had a chance to transform them on the
> way into the ufunc, like "meter + foot".
>
> Darren
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


