[Numpy-discussion] Changes to generalized ufunc core dimension checking

Thu Mar 17 18:28:11 EDT 2016

On Thu, Mar 17, 2016 at 10:41 PM, Stephan Hoyer <shoyer at gmail.com> wrote:

> On Thu, Mar 17, 2016 at 1:04 AM, Travis Oliphant <travis at continuum.io>
> wrote:
>
>> I think that is a good idea. Let the user decide if scalar
>> broadcasting is acceptable for their function.
>>
>> Here is a simple concrete example where scalar broadcasting makes sense:
>>
>>
>> A 1-d dot product (the core of np.inner): (k),(k)->()
>>
>> A user would assume they could call this function with a scalar in either
>> argument and have it broadcast to a 1-d array. Of course, if both
>> arguments are scalars, then it doesn't make sense.
>>
>> Having a way for the user to allow scalar broadcasting seems sensible and
>> a nice compromise.
>>
>> -Travis
>>
>
> To generalize a little bit, consider the entire family of weighted
> statistical functions (mean, std, median, etc.). For example, the gufunc
> version of np.average is basically equivalent to np.inner with a bit of
> preprocessing.
>
> Arguably, it *could* make sense to broadcast weights when given a scalar:
> np.average(values, weights=1.0 / len(values)) is pretty unambiguous.
>
> That said, adding an explicit "scalar broadcasting OK flag" seems like a
> hack that will need even more special logic (e.g., so we can error if both
> arguments to np.inner are scalars).
>
> Multiple dispatch for gufunc core signatures seems like the cleaner
> solution. If you want np.inner to handle scalars, you need to supply core
> signatures (k),()->() and (),(k)->() along with (k),(k)->(). This is
> similar to the vision of three core signatures for np.matmul: (i),(i,j)->(j),
> (i,j),(j)->(i) and (i,j),(j,k)->(i,k).
>

Would the logic for such a thing be consistent? E.g. how do you decide if
the user is requesting (k),(k)->(), or (k),()->() with broadcasting over a
non-core dimension of size k in the second argument? What if your
signatures are (m,k),(k)->(m) and (k),(n,k)->(n), and your two inputs have
shapes (m,k) and (n,k): how do you decide which one to call? Or
alternatively, how do you detect and forbid such ambiguous designs?
Figuring out the dispatch rules for the general case seems like a
non-trivial problem to me.
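
To make the two ambiguities above concrete, here is a minimal sketch in
plain Python. It is not how NumPy works (a gufunc has a single core
signature); it only models one naive rule for a hypothetical
multi-signature dispatcher: a signature is a candidate if its core
dimensions bind consistently to the trailing axes of each input and the
leftover leading (loop) axes broadcast together. The helper names
parse_inputs, loops_broadcast and candidates are made up for illustration.

```python
from itertools import zip_longest


def parse_inputs(signature):
    """Turn '(m,k),(k)->(m)' into [('m', 'k'), ('k',)] (input side only)."""
    inputs = signature.replace(" ", "").split("->")[0]
    parts = inputs[1:-1].split("),(")
    return [tuple(name for name in part.split(",") if name) for part in parts]


def loops_broadcast(loop_shapes):
    """True if the loop shapes broadcast: per axis, sizes are equal or 1."""
    aligned = zip_longest(*(reversed(s) for s in loop_shapes), fillvalue=1)
    return all(len({d for d in dims if d != 1}) <= 1 for dims in aligned)


def candidates(signatures, shapes):
    """Return every signature that could accept the given input shapes."""
    matched = []
    for signature in signatures:
        cores = parse_inputs(signature)
        if len(cores) != len(shapes):
            continue
        bound, loops, ok = {}, [], True
        for core, shape in zip(cores, shapes):
            split = len(shape) - len(core)
            if split < 0:  # not enough axes for the core dimensions
                ok = False
                break
            loops.append(shape[:split])
            for name, size in zip(core, shape[split:]):
                # a core dimension name must have one size across all inputs
                if bound.setdefault(name, size) != size:
                    ok = False
        if ok and loops_broadcast(loops):
            matched.append(signature)
    return matched


INNER = ["(k),(k)->()", "(k),()->()", "(),(k)->()"]
MIXED = ["(m,k),(k)->(m)", "(k),(n,k)->(n)"]

print(candidates(INNER, [(3,), (3,)]))      # two 1-d inputs of length 3
print(candidates(MIXED, [(4, 3), (5, 3)]))  # inputs of shape (m,k) and (n,k)
print(candidates(INNER, [(3,), ()]))        # 1-d input and a scalar
```

Under this rule, two length-3 vectors match all three inner signatures,
and the (4,3) and (5,3) inputs match both mixed signatures, with
different output shapes depending on which one is chosen. Only the
vector-and-scalar call resolves to a single signature, (k),()->(). A
real design would need some priority or rejection rule on top of this.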

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.