Some thoughts on ufunc reduction methods

The following write-up was triggered by issue #7265 <https://github.com/numpy/numpy/issues/7265>; you may want to review that discussion for context.

In the docs, ufunc reduction operations are explained as "cumulatively applying the ufunc along an axis." This effectively limits reduction to ufuncs with two inputs and one output. Another take on the same reduction idea is that, if the inputs are a and b and the output is c, then c is the result of updating a with the value in b. With this idea in mind, in e.g. add(a, b) -> c = a + b, c would be thought of as the updated value of a after applying b to it. One would expect that, for any registered loop suitable for this interpretation, the types of a and c would be identical, but b could be of some other type. As an example, consider count(a, b) -> c = a + 1, where a and c would typically be intp, but b could have just about any type.

The power of this description of ufuncs suited for reduction is that it generalizes easily beyond the current "two inputs, one output" restriction. E.g. a "sum of squared differences" ufunc defined as ssqd(sd2, x, y) -> sd2 += (x - y)**2 could be used to compute squared Euclidean distances between vectors by doing ssqd.reduce((x, y)), without the memory overhead of the currently available approaches. In general, a reduction of a ufunc with m inputs and n outputs, m > n, would produce n results out of m - n inputs. Such a ufunc would have to define a suitable identity to initialize each of those n outputs at the beginning of any reduction, rather than the single identity ufuncs now hold.

It can be argued that this generalization of reduction is just a redefinition of a subset of what gufuncs can do; e.g. a "sum of squared differences" gufunc with signature (n),(n)->() would do the same as the above reduction, probably faster. And it is not clear that getting accumulate and reduceat for free, or reduction over multiple axes, justifies the extra complexity.
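For concreteness, here is a minimal pure-Python sketch of the proposed reduction semantics. Neither ssqd nor the reduce_m_inputs driver exists in NumPy; both are hypothetical illustrations of the "first n arguments are the running outputs, the remaining m - n come from the inputs" idea:

```python
import numpy as np

def ssqd(sd2, x, y):
    # Hypothetical 3-input, 1-output inner loop: update sd2 with (x - y)**2.
    return sd2 + (x - y) ** 2

def reduce_m_inputs(core, identity, *arrays):
    # Emulate the proposed generalized reduction: the accumulator starts at
    # the ufunc's identity, and the remaining m - n arguments are drawn
    # element-wise from the input arrays.
    acc = identity
    for elems in zip(*arrays):
        acc = core(acc, *elems)
    return acc

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 0.0, 5.0])
# Equivalent to the proposed ssqd.reduce((x, y)), with identity 0:
print(reduce_m_inputs(ssqd, 0.0, x, y))  # 8.0, same as np.sum((x - y)**2)
```

Note that no (x - y)**2 temporary the size of the inputs is ever materialized, which is the memory advantage claimed above.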
This is especially true since it is likely that something similarly powerful could be built on top of gufuncs. Where an approach such as this would shine is if we had a reduction method that did not require a single strided memory segment to act on. Setting aside the generalization of reduction described above, a groupby or reduceby method would take an array of values and an array of groups, and compute a reduction value for each of the unique groups. With the above generalizations, one could compute, e.g., the variance over each group by doing something like:

    # how exactly keepdims works here is not clear at all!
    counts = count.reduceby(values, groups, keepdims=True)
    mean = np.add.reduceby(values, groups, keepdims=True) / counts
    var = np.ssqd.reduceby((values, mean), groups) / counts

I am not fully sure whether this is only useful for this particular little example of computing variance with a ufunc method that doesn't exist, or whether it is valid in other settings too. Implementing it would add complexity to an already complex system, so it had better be good for something!

One of the weakest points I see is the need to rely on a set of predefined identities to start the reduction. Think of trying to generalize reduction to gufuncs, another worthy effort, and take matrix multiplication with signature (i,j),(j,k)->(i,k). In the current paradigm you would impose that both inputs and the output be interchangeable, which leads rather naturally to the condition i = j = k for this reduction to be doable. But with the new paradigm you only need the first input and the output to be interchangeable, which only gives you j = k as a condition. And while this is more general behavior, the questions of what value to set i to in the output, and what to initialize the output to, when trying to do a reduction over a stack of square arrays, make it fairly unusable.
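Lacking a reduceby method, the per-group variance above can already be emulated with np.bincount; this sketch mirrors the hypothetical reduceby calls line by line (reduceby itself does not exist):

```python
import numpy as np

values = np.array([1.0, 2.0, 3.0, 10.0, 20.0])
groups = np.array([0, 0, 0, 1, 1])

# count.reduceby(values, groups):
counts = np.bincount(groups)
# np.add.reduceby(values, groups) / counts:
mean = np.bincount(groups, weights=values) / counts
# np.ssqd.reduceby((values, mean), groups) / counts:
sqdev = (values - mean[groups]) ** 2
var = np.bincount(groups, weights=sqdev) / counts
# group variances: 2/3 for group 0 and 25.0 for group 1
```

The bincount version materializes the full sqdev temporary, which is exactly the overhead a fused reduceby loop would avoid.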
So a backup to the "two inputs, one output, initialize to some item in the reduction array" behavior would have to be kept in place.

It is also important to note that expanding reduction to gufuncs would probably require that we impose some iteration order guarantees on ourselves, as the poster child for this (matrix multiplication) is not in general a commutative operation. Would we want to do this to ourselves? Would the features today justify the burden for ever after?

Any thoughts on where, if anywhere, to go with this, are very welcome!

Jaime

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.
participants (1)
-
Jaime Fernández del Río