[Numpy-discussion] Proposal for new ufunc functionality

Mon Apr 12 18:31:16 EDT 2010

On Mon, Apr 12, 2010 at 17:26, Travis Oliphant <oliphant at enthought.com> wrote:
>
> On Apr 11, 2010, at 2:56 PM, Anne Archibald wrote:
>
> 2010/4/10 Stéfan van der Walt <stefan at sun.ac.za>:
>
> On 10 April 2010 19:45, Pauli Virtanen <pav at iki.fi> wrote:
>
> Another addition to ufuncs that should be thought about is specifying the
> Python-side interface to generalized ufuncs.
>
> This is an interesting idea; what do you have in mind?
>
> I can see two different kinds of answer to this question: one is a
> tool like vectorize/frompyfunc that allows construction of generalized
> ufuncs from python functions, and the other is thinking out what
> methods and support functions generalized ufuncs need.
>
> The former would be very handy for prototyping gufunc-based libraries
> before delving into the templated C required to make them actually
> efficient.
>
> The latter is more essential in the long run: it'd be nice to have a
> reduce-like function, but obviously only when the arity and dimensions
> work out right (which I think means (shape1,shape2)->(shape2) ). This
> could be applied along an axis or over a whole array. reduceat and the
> other, more sophisticated, schemes might also be worth supporting. At
> a more elementary level, gufunc objects should have good introspection
> - docstrings, shape specification accessible from python, named formal
> arguments, et cetera. (So should ufuncs, for that matter.)
>
> We should collect all of these proposals into a NEP.
>
> To clarify what I mean by "group-by" behavior:
> Suppose I have an array of floats and an array of integers. Each element
> in the array of integers marks the corresponding element of the float array
> as being of a certain "kind". The reduction should take place over like-kind
> values:
> Example:
> add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,2,0,0,2,2])
> results in the calculations:
> 1 + 3 + 6 + 7
> 2 + 4
> 5 + 8 + 9
> and therefore the output (notice the two arrays --- perhaps a structured
> array should be returned instead...)
> [0,1,2],
> [17, 6, 22]
>
> The real value is when you have tabular data and you want to do reductions
> in one field based on values in another field. This happens all the time
> in relational algebra and would be a relatively straightforward thing to
> support in ufuncs.
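
The group-by behavior described above can be approximated with existing
NumPy calls. Below is a minimal sketch, assuming a hypothetical helper named
reduceby_keys (not part of NumPy, and not the proposed add.reduceby itself).
It sorts the values so that each group is contiguous and then applies the
ufunc's reduceat method:

```python
import numpy as np

def reduceby_keys(ufunc, values, by):
    """Hypothetical helper: reduce `values` with a binary ufunc, grouped by `by`.

    Returns (keys, results), where results[i] is the reduction over all
    values whose label equals keys[i].
    """
    values = np.asarray(values)
    by = np.asarray(by)
    # keys: sorted distinct labels; inverse: group slot of each element.
    keys, inverse = np.unique(by, return_inverse=True)
    # A stable sort makes each group contiguous and keeps the original order
    # of the elements inside a group.
    order = np.argsort(inverse, kind="stable")
    sorted_values = values[order]
    # Offset at which each group starts in the sorted array.
    starts = np.searchsorted(inverse[order], np.arange(len(keys)))
    # Every group is non-empty, so reduceat reduces exactly one group per slot.
    return keys, ufunc.reduceat(sorted_values, starts)

keys, sums = reduceby_keys(np.add,
                           [1, 2, 3, 4, 5, 6, 7, 8, 9],
                           [0, 1, 0, 1, 2, 0, 0, 2, 2])
print(keys, sums)
```

Returning the keys alongside the results mirrors the "two arrays" output in
the example; a structured array would be an alternative packaging of the same
data.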

I might suggest a simplification where the by array must be an array
of non-negative ints such that they are indices into the output. For
example (note that I replace 2 with 3 and have no 2s in the by array):

add.reduceby(array=[1,2,3,4,5,6,7,8,9], by=[0,1,0,1,3,0,0,3,3]) ==
[17, 6, 0, 22]

This basically generalizes bincount() to other binary ufuncs.
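
A rough sketch of these index-based semantics, assuming a hypothetical helper
named reduceby_index (nothing by that name exists in NumPy). It relies on
ufunc.at, which was added in NumPy 1.8 and so postdates this thread, and it
assumes the ufunc has an identity with which to fill bins that receive no
values:

```python
import numpy as np

def reduceby_index(ufunc, values, by, size=None):
    """Hypothetical helper: out[by[i]] = ufunc(out[by[i]], values[i]) for all i.

    `by` must hold non-negative integers that index directly into the output.
    """
    values = np.asarray(values)
    by = np.asarray(by)
    if ufunc.identity is None:
        # Assumption of this sketch: empty bins are filled with the identity.
        raise ValueError("this sketch requires a ufunc with an identity")
    if size is None:
        size = int(by.max()) + 1 if by.size else 0
    out = np.full(size, ufunc.identity, dtype=values.dtype)
    # Unbuffered in-place operation: repeated indices accumulate correctly.
    ufunc.at(out, by, values)
    return out

array = [1, 2, 3, 4, 5, 6, 7, 8, 9]
by = [0, 1, 0, 1, 3, 0, 0, 3, 3]
result = reduceby_index(np.add, array, by)
print(result)
```

For np.add this agrees with np.bincount(by, weights=array), apart from
bincount returning floats; the same helper also accepts other binary ufuncs
that define an identity, such as np.multiply.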

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco