Re: [Numpy-discussion] Generalized ufuncs?

I am sorry that our submission http://projects.scipy.org/scipy/numpy/ticket/887 has created some annoyance; presumably we have taken the "Make contributions (e.g. code patches), (...) by submitting a 'ticket' on the Trac pages linked below" on http://scipy.org/Developer_Zone somewhat too literally. It was great to receive very positive responses to our patch and to receive a very timely review; this is encouraging for submitting code to the numpy repository. I would like to add that the proposed change is not that arbitrary; it is a well-established concept -- it is outlined on the GeneralLoopingFunctions wiki page, and it is a prominent concept of perl's PDL vector library. Of course, there is still room for arguing about details. The fact that no explicit "generalized" ufuncs are provided, should in my opinion not be an argument why not to include the change in the 1.2.0 release. Writing extension libraries that implement such generalized ufuncs, while being able to use the standard numpy distribution, would certainly be very valuable. Furthermore, the risk for including the proposed patch in 1.2.0 is very low: existing functionality is not touched. (Except the glitch we had by declaring variables in a gcc-way.) For standard ufuncs, it should be straightforward to see that the patch does not modify the behavior at all. I wish everyone a great SciPy'08 conference and very much hope that you can include the proposed functionality in the forthcoming NumPy release. Best regards, Hansres

On Sun, Aug 17, 2008 at 19:13, Engel, Hans-Andreas <Hans-Andreas.Engel@deshaw.com> wrote:
I am sorry that our submission http://projects.scipy.org/scipy/numpy/ticket/887 has created some annoyance; presumably we have taken the "Make contributions (e.g. code patches), (...) by submitting a 'ticket' on the Trac pages linked below" on http://scipy.org/Developer_Zone somewhat too literally.
Not at all. You did everything right. Unfortunately, it comes at a slightly awkward time. We are just about to make a release, and this is a substantial feature that some of us developers think needs a little more consideration than we can give it if we were to put it into this release. It's only the timing that triggered the conflict, not anything you really had control over.
It was great to receive very positive responses to our patch and to receive a very timely review; this is encouraging for submitting code to the numpy repository.
I would like to add that the proposed change is not that arbitrary; it is a well-established concept -- it is outlined on the GeneralLoopingFunctions wiki page, and it is a prominent concept of perl's PDL vector library. Of course, there is still room for arguing about details.
The fact that no explicit "generalized" ufuncs are provided, should in my opinion not be an argument why not to include the change in the 1.2.0 release. Writing extension libraries that implement such generalized ufuncs, while being able to use the standard numpy distribution, would certainly be very valuable.
Furthermore, the risk for including the proposed patch in 1.2.0 is very low: existing functionality is not touched. (Except the glitch we had by declaring variables in a gcc-way.) For standard ufuncs, it should be straightforward to see that the patch does not modify the behavior at all.
My concern is not that the patch might be buggy or anything like that. Rather, this is a substantial feature, and one that affects the C API. It will be very difficult to remove it or modify it later if we find that it needs even a slight tweak to its design. Despite its use in other array languages, it's new to us, and I think we need to explore its use cases a little. numpy isn't PDL, and the way this feature integrates with the rest of numpy's semantics and idioms is important to figure out. It's possible that there are problems that it could solve with just a slight modification to the design. I suggested that we move it to a branch for the time being so we can play with it and come up with examples of its use. If you have examples that you have already written, I would love to see them. I, for one, am amenable to seeing this in 1.2.0, but only if we push back the release by at least a few weeks. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

2008/8/17 Robert Kern <robert.kern@gmail.com>:
I suggested that we move it to a branch for the time being so we can play with it and come up with examples of its use.
That branch is here: http://stefan@svn.scipy.org/svn/numpy/branches/gen_ufuncs Stéfan

On Sun, Aug 17, 2008 at 7:56 PM, Stéfan van der Walt <stefan@sun.ac.za>wrote:
2008/8/17 Robert Kern <robert.kern@gmail.com>:
I suggested that we move it to a branch for the time being so we can play with it and come up with examples of its use.
That branch is here:
For an earlier thread about using vector valued ufuncs for sorts and such -- and negative reactions to the suggestion -- go here<http://thread.gmane.org/gmane.comp.python.numeric.general/20552/focus=20560>. One of the major objections was how to call such functions with the ufunc machinery and the needed machinery for type promotions, sub classes, and all that nonsense. Are these dealt with in the patch? The current numpy code for all that is a bit of a mess anyway, and it would be nice to figure out some unified interface to call through and clean up the current code in the process. In fact, I've been making some preliminary efforts in that direction by formatting the current code and working through it. Also, do we also want reduce methods and such? I suspect there is still a lot of work to do to get this whole thing up and running. Chuck

2008/8/17 Robert Kern <robert.kern@gmail.com>:
I suggested that we move it to a branch for the time being so we can play with it and come up with examples of its use. If you have examples that you have already written, I would love to see them. I, for one, am amenable to seeing this in 1.2.0, but only if we push back the release by at least a few weeks.
This is not a worked example, but this is exactly what is needed to make possible the "arrays of matrices" functions that were discussed some time ago. For example, users frequently want to do something like multiply an array of matrices by an array of matrices; that is not currently feasible without a very large temporary array (or a loop). But I think you were looking for examples of code using the interface, to see whether it worked well. Anne

On Sun, Aug 17, 2008 at 21:55, Anne Archibald <peridot.faceted@gmail.com> wrote:
2008/8/17 Robert Kern <robert.kern@gmail.com>:
I suggested that we move it to a branch for the time being so we can play with it and come up with examples of its use. If you have examples that you have already written, I would love to see them. I, for one, am amenable to seeing this in 1.2.0, but only if we push back the release by at least a few weeks.
This is not a worked example, but this is exactly what is needed to make possible the "arrays of matrices" functions that were discussed some time ago. For example, users frequently want to do something like multiply an array of matrices by an array of matrices; that is not currently feasible without a very large temporary array (or a loop).
But I think you were looking for examples of code using the interface, to see whether it worked well.
I'll take what I can get. In order of increasing utility: 1. Descriptions of use cases (as above). 2. Mockups of the Python code demonstrating the use case (e.g. nonfunctional Python code showing a potential generalized ufunc with inputs and outputs). 3. The C code implementing the actual functionality for the use case. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

On Sun, Aug 17, 2008 at 9:15 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Sun, Aug 17, 2008 at 21:55, Anne Archibald <peridot.faceted@gmail.com> wrote:
2008/8/17 Robert Kern <robert.kern@gmail.com>:
I suggested that we move it to a branch for the time being so we can play with it and come up with examples of its use. If you have examples that you have already written, I would love to see them. I, for one, am amenable to seeing this in 1.2.0, but only if we push back the release by at least a few weeks.
This is not a worked example, but this is exactly what is needed to make possible the "arrays of matrices" functions that were discussed some time ago. For example, users frequently want to do something like multiply an array of matrices by an array of matrices; that is not currently feasible without a very large temporary array (or a loop).
But I think you were looking for examples of code using the interface, to see whether it worked well.
I'll take what I can get. In order of increasing utility:
1. Descriptions of use cases (as above).
It is also possible to do sums, cumulative sums, and other such things. A generalization of the proposed ufuncs useful for these sorts of things would be mixed type arguments, which would also help in implementing such things as argsort and casting. This requires a different kind of infrastructure for defining and looking up the ufuncs, but I don't think of this as a complication, but rather a simplification as it would allow us to make good use of the code generator and introduce a certain uniformity to the implementation of numpy. Uniformity is good. Chuck

On Sun, Aug 17, 2008 at 6:13 PM, Engel, Hans-Andreas < Hans-Andreas.Engel@deshaw.com> wrote:
I am sorry that our submission http://projects.scipy.org/scipy/numpy/ticket/887 has created some annoyance; presumably we have taken the "Make contributions (e.g. code patches), (...) by submitting a 'ticket' on the Trac pages linked below" on http://scipy.org/Developer_Zone somewhat too literally.
It was great to receive very positive responses to our patch and to receive a very timely review; this is encouraging for submitting code to the numpy repository.
I would like to add that the proposed change is not that arbitrary; it is a well-established concept -- it is outlined on the GeneralLoopingFunctions wiki page, and it is a prominent concept of perl's PDL vector library. Of course, there is still room for arguing about details.
The fact that no explicit "generalized" ufuncs are provided, should in my opinion not be an argument why not to include the change in the 1.2.0 release. Writing extension libraries that implement such generalized ufuncs, while being able to use the standard numpy distribution, would certainly be very valuable.
Furthermore, the risk for including the proposed patch in 1.2.0 is very low: existing functionality is not touched. (Except the glitch we had by declaring variables in a gcc-way.) For standard ufuncs, it should be straightforward to see that the patch does not modify the behavior at all.
I wish everyone a great SciPy'08 conference and very much hope that you can include the proposed functionality in the forthcoming NumPy release.
For those interested in the description posted with ticket #887<http://scipy.org/scipy/numpy/ticket/887>, here it is. The patch can be found with the ticket. I note that PyUFunc_FromFuncAndDataAndSignature is not put at the end of the ufunc_api_order.txt list and might break the current API, which has entries like #define PyUFunc_Type (*(PyTypeObject *)PyUFunc_API[0]) So we need to keep the PyUFunc_API entries in order. Chuck ============================================================================================ Generalized Universal Functions ¶<http://scipy.org/scipy/numpy/ticket/887#GeneralizedUniversalFunctions> There is a general need for looping over not only functions on scalars but also over functions on vectors (or arrays), as explained on http://scipy.org/scipy/numpy/wiki/GeneralLoopingFunctions. We propose to realize this concept by generalizing the universal functions (ufuncs), and provide a C implementation that adds ~500 lines to the numpy code base. In current (specialized) ufuncs, the elementary function is limited to element-by-element operations, whereas the generalized version supports "sub-array" by "sub-array" operations. The Perl vector library PDL provides a similar functionality and its terms are re-used in the following. Each generalized ufunc has information associated with it that states what the "core" dimensionality of the inputs is, as well as the corresponding dimensionality of the outputs (the element-wise ufuncs have zero core dimensions). The list of the core dimensions for all arguments is called the "signature" of a ufunc. For example, the ufunc numpy.add has signature "(),()->()" defining two scalar inputs and one scalar output. Another example is (see the GeneralLoopingFunctions<http://scipy.org/scipy/numpy/wiki/GeneralLoopingFunctions>page) the function inner1d(a,b) with a signature of "(i),(i)->()". This applies the inner product along the last axis of each input, but keeps the remaining indices intact. For example, where a is of shape (3,5,N) and b is of shape (5,N), this will return an output of shape (3,5). The underlying elementary function is called 3*5 times. In the signature, we specify one core dimension "(i)" for each input and zero core dimensions "()" for the output, since it takes two 1-d arrays and returns a scalar. By using the same name "i", we specify that the two corresponding dimensions should be of the same size (or one of them is of size 1 and will be broadcasted). The dimensions beyond the core dimensions are called "loop" dimensions. In the above example, this corresponds to (3,5). The usual numpy "broadcasting" rules apply, where the signature determines how the dimensions of each input/output object are split into core and loop dimensions: 1. While an input array has a smaller dimensionality than the corresponding number of core dimensions, 1's are pre-pended to its shape. 2. The core dimensions are removed from all inputs and the remaining dimensions are broadcasted; defining the loop dimensions. 3. The output is given by the loop dimensions plus the output core dimensions. Definitions ¶ <http://scipy.org/scipy/numpy/ticket/887#Definitions> *Elementary Function:* Each ufunc consists of an elementary function that performs the most basic operation on the smallest portion of array arguments (e.g. adding two numbers is the most basic operation in adding two arrays). The ufunc applies the elementary function multiple times on different parts of the arrays. The input/output of elementary functions can be vectors; e.g., the elementary function of inner1d takes two vectors as input. *Signature:* A signature is a string describing the input/output dimensions of the elementary function of a ufunc. See section below for more details. *Core Dimension:* The dimensionality of each input/output of an elementary function is defined by its core dimensions (zero core dimensions correspond to a scalar input/output). The core dimensions are mapped to the last dimensions of the input/output arrays. *Dimension Name:* A dimension name represents a core dimension in the signature. Different dimensions may share a name, indicating that they are of the same size (or are broadcastable). *Dimension Index:* A dimension index is an integer representing a dimension name. It enumerates the dimension names according to the order of the first occurrence of each name in the signature. Details of Signature ¶<http://scipy.org/scipy/numpy/ticket/887#DetailsofSignature> The signature defines "core" dimensionality of input and output variables, and thereby also defines the contraction of the dimensions. The signature is represented by a string of the following format: - Core dimensions of each input or output array are represented by a list of dimension names in parentheses, "(i_1,...,i_N)"; a scalar input/output is denoted by "()". Instead of "i_1", "i_2", etc, one can use any valid Python variable name. - Dimension lists for different arguments are separated by ",". Input/output arguments are separated by "->". - If one uses the same dimension name in multiple locations, this enforces the same size (or broadcastable size) of the corresponding dimensions. The formal syntax of signatures is as follows. <Signature> ::= <Input arguments> "->" <Output arguments> <Input arguments> ::= <Argument list> <Output arguments> ::= <Argument list> <Argument list> ::= nil | <Argument> | <Argument> "," <Argument list> <Argument> ::= "(" <Core dimension list> ")" <Core dimension list> ::= nil | <Dimension name> | <Dimension name> "," <Core dimension list> <Dimension name> ::= valid Python variable name Notes: 1. All quotes are for clarity. 2. Core dimensions that share the same name must be broadcastable, as the two "i"s in our example above. Each dimension name typically corresponding to one level of looping in the elementary function's implementation. 3. White spaces are ignored. Here are some examples of signatures. add "(),()->()" inner1d "(i),(i)->()" sum1d "(i)->()" dot2d "(m,n),(n,p)->(m,p)" (matrix multiplication) outer_inner "(i,t),(j,t)->(i,j)" (inner over the last dimension, outer over the second to last, and loop/broadcast over the rest.) C-API for implementing Elementary Functions ¶<http://scipy.org/scipy/numpy/ticket/887#C-APIforimplementingElementaryFuncti...> The current interface remains unchanged, and PyUFunc_FromFuncAndData can still be used to implement (specialized) ufuncs, consisting of scalar elementary functions. One can use PyUFunc_FromFuncAndDataAndSignature to declare a more general ufunc. The argument list is the same as PyUFunc_FromFuncAndData, with an additional argument specifying the signature as C string. Furthermore, the callback function is of the same type as before, void (*foo)(char **args, intp *dimensions, intp *steps, void *func). When invoked, args is a list of length nargs containing the data of all input/output arguments. For a scalar elementary function, steps is also of length nargs, denoting the strides used for the arguments. dimensions is a pointer to a single integer defining the size of the axis to be looped over. For a non-trivial signature, dimensions will also contain the sizes of the core dimensions as well, starting at the second entry. Only one size is provided for each unique dimension name and the sizes are given according to the first occurrence of a dimension name in the signature. The first nargs elements of steps remain the same as for scalar ufuncs. The following elements contain the strides of all core dimensions for all arguments in order. For example, consider a ufunc with signature "(i,j),(i)->()". In this case, args will contain three pointers to the data of the input/output arrays a, b, c. Furthermore, dimensions will be [N, I, J] to define the size of N of the loop and the sizes I and J for the core dimensions i and j. Finally, stepswill be [a_N, b_N, c_N, a_i, a_j, b_i], containing all necessary strides. License ¶ <http://scipy.org/scipy/numpy/ticket/887#License> The attached code and the above documentation was written by Wenjie Fu < fuw@deshaw.com> and Hans-Andreas Engel <engelh@deshaw.com>, and is provided under the numpy license:
participants (5)
-
Anne Archibald
-
Charles R Harris
-
Engel, Hans-Andreas
-
Robert Kern
-
Stéfan van der Walt