[Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

Thu May 31 10:34:14 EDT 2018

On 05/31/2018 09:53 AM, Sebastian Berg wrote:
> <snip>
>
>> Also, I do not imagine these as free-floating ufuncs, I think we can
>> arrange them in a logical way in a gufunc ecosystem. There would be
>> some "core ufuncs", with "associated gufuncs" accessible as attributes.
>> For instance, any_less_than will be accessible as less.any
>>
> 
> So then, why is it a gufunc and not an attribute using a ufunc with
> binary output? I have asked this before, and even got arguments as to
> why it fits gufuncs better, but frankly I still do not really
> understand.
> 
> If it is an associated gufunc, why gufunc at all? We need any() and
> all() here, so that is not that many methods, right? And when it comes
> to buffering you have much more flexibility.
> 
> Say I have the operation:
> 
> (float_arr > int_arr).all(axis=(1, 2))
> 
> With int_arr being shaped (2, 1000, 1000) (i.e. large along the
> interesting axes). A normal gufunc IIRC will get the whole inner
> dimension as a float buffer. In other words, you gain practically
> nothing, because the whole int_arr will be cast to float anyway.
> 
> If, however, you actually implement np.greater_than.all(float_arr,
> int_arr, axis=(1, 2)) as a separate ufunc method, you would have the
> freedom to work in typical cache-friendly, buffer-sized chunks for
> each of the outer dimensions one at a time. A gufunc would have to
> either say "please do not buffer for me" or implement all possible
> type combinations to do this.
> (of course there are memory layout subtleties, since you would have to
> optimize always for the "fast exit" case, potentially making the worst
> case scenario much worse -- unless you do seriously fancy stuff
> anyway).
> 
> A more general question is actually whether we should rather focus on
> solving the same problem more generally.
> For example if `numexpr` would implement all/any reductions, it may be
> able to pretty simply get the identical tradeoffs with even more
> flexibility! (I have to admit, it may get tricky with multiple
> reduction dimensions, etc.)
> 
> - Sebastian

Hmm, I hadn't known/considered the limitations of gufunc buffer sizes. I 
was just thinking of them as a standardized interface which handles the 
where/out/broadcasting for you.
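
To make the interface point concrete, here is a rough pure-Python emulation of
the any_less_than example quoted above, using np.vectorize with a gufunc-style
signature. This is only a sketch: np.vectorize is not a real gufunc (it has no
out= or where= support and loops in Python), and the helper name
_any_less_than_core is made up for illustration.

```python
import numpy as np


def _any_less_than_core(x, y):
    # Core operation on 1-D inputs: is any element of x less than y?
    return np.any(x < y)


# Signature "(n),(n)->()": one core dimension is consumed, a scalar comes out.
# The remaining (loop) dimensions are broadcast by np.vectorize.
any_less_than = np.vectorize(_any_less_than_core, signature="(n),(n)->()")

x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([2, 2, 2])
result = any_less_than(x, y)  # one boolean per row of x
```

Here y is broadcast against both rows of x, which is the part of the gufunc
machinery I had in mind.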

I'll have to read about it.
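
As I understand the chunking argument, the idea is roughly the following
sketch, written in plain NumPy rather than as an actual ufunc method. The
function name all_greater_chunked and the default chunk size are assumptions
for illustration only, and the real buffering inside NumPy works differently.

```python
import numpy as np


def all_greater_chunked(a, b, chunk=8192):
    """Compute (a > b).all(axis=(1, 2)) for 3-D inputs in small chunks."""
    a, b = np.broadcast_arrays(a, b)
    out = np.ones(a.shape[0], dtype=bool)
    for i in range(a.shape[0]):
        # Flatten the two reduced axes; this may copy for non-contiguous
        # (for example, broadcast) inputs.
        fa = a[i].reshape(-1)
        fb = b[i].reshape(-1)
        for start in range(0, fa.size, chunk):
            stop = start + chunk
            # Only one chunk at a time is compared (and cast, if the
            # dtypes differ), and we stop at the first failing chunk.
            if not np.greater(fa[start:stop], fb[start:stop]).all():
                out[i] = False
                break
    return out


float_arr = np.ones((2, 10, 10))
int_arr = np.zeros((2, 10, 10), dtype=int)
int_arr[1, 5, 5] = 3
result = all_greater_chunked(float_arr, int_arr, chunk=16)
```

The early exit only pays off when a failing element shows up early; in the
worst case every chunk is still visited.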

One thing I don't like about the ufunc-method strategy is that it easily
pollutes the namespaces and implementations of all the ufuncs: many
ufuncs would have to account for a new "all" method even where it is
inappropriate, for example.

Cheers,
Allan