[Numpy-discussion] Allowing broadcasting of code dimensions in generalized ufuncs

Eric Wieser wieser.eric+numpy at gmail.com
Tue Jun 12 02:35:56 EDT 2018


Frozen dimensions:

> I started with just making every 3-vector and 3x3-matrix structured arrays
> with the relevant single sub-array entry

I was actually suggesting omitting the structured dtype (ie, field names)
altogether, and just using the subarray dtypes (which exist alone, but not
in arrays).
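
For concreteness, a small illustration of the two variants; nothing here is
new API, just current numpy behaviour (and the field name is of course made
up):

    import numpy as np

    # (a) A structured dtype with a single sub-array field (as I understand
    #     the SOFA wrapping experiment): it works, but you end up converting
    #     between this and plain (N, 3) float arrays all the time.
    vec_struct = np.dtype([('xyz', np.float64, (3,))])
    v = np.zeros(2, dtype=vec_struct)
    print(v.shape, v['xyz'].shape)    # (2,) (2, 3)

    # (b) A bare subarray dtype: it exists as a standalone dtype object...
    vec3 = np.dtype((np.float64, (3,)))
    print(vec3, vec3.shape)           # ('<f8', (3,)) (3,)

    # ...but it cannot survive as the dtype of an array -- the subarray
    # dimensions are absorbed into the array's shape on construction,
    # which is the "exist alone, but not in arrays" problem.
    a = np.zeros(2, dtype=vec3)
    print(a.shape, a.dtype)           # (2, 3) float64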

> Another (small?) advantage is that I can use `axis` to tell where my three
> coordinates are, rather than be stuck with having them as the last dimension.

This is a fair argument against my proposal - at any rate, I think we’d
need a better story for subarray dtypes before trying to add ufunc support
for them.
------------------------------

Broadcasting dimensions:

> But perhaps a putative weighted_mean … is a decent example

That’s fairly convincing as a non-chained ufunc case. Can you add an
example like that to the NEP?

> Also, it has the benefit of being clear what the function can handle by
> inspection of the signature

Is broadcasting (n),(n)->(),() less clear than (n|1),(n|1)->(),()? Can you
come up with an example where only some dimensions make sense to broadcast?
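
To make sure I'm reading the weighted_mean example right, here is a rough
pure-python sketch; the inverse-variance weighting is my guess at what is
intended, and the broadcast_arrays call stands in for what the gufunc
machinery itself would do under (n|1),(n|1)->(),():

    import numpy as np

    def weighted_mean(y, sigma):
        # Core (n),(n)->(),() operation: inverse-variance weighted mean
        # and its uncertainty (the weighting scheme is my assumption).
        y, sigma = np.broadcast_arrays(y, sigma)  # emulates the |1 behaviour
        w = 1.0 / sigma**2
        wsum = w.sum(axis=-1)
        mean = (w * y).sum(axis=-1) / wsum
        sigma_mean = np.sqrt(1.0 / wsum)
        return mean, sigma_mean

    y = np.array([1.0, 2.0, 3.0, 4.0])

    # With plain (n),(n)->(),() you must spell out a sigma per point,
    # even when they are all the same:
    print(weighted_mean(y, np.full(4, 0.5)))   # mean 2.5, sigma_mean 0.25

    # With (n|1),(n|1)->(),() the machinery could instead broadcast a
    # length-1 sigma against the n points:
    print(weighted_mean(y, np.array([0.5])))   # mean 2.5, sigma_mean 0.25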
------------------------------

Eric

On Mon, 11 Jun 2018 at 08:04 Marten van Kerkwijk <m.h.vankerkwijk at gmail.com>
wrote:

> Nathaniel:
>>
>> Output shape feels very similar to
>> output dtype to me, so maybe the general way to handle this would be
>> to make the first callback take the input shapes+dtypes and return the
>> desired output shapes+dtypes?
>>
>> This hits on an interesting alternative to frozen dimensions - np.cross
>> could just become a regular ufunc with signature np.dtype((float64, 3)),
>> np.dtype((float64, 3)) → np.dtype((float64, 3))
>>
> As you note further down, the present proposal of just using numbers has
> the advantage of being clear and easy. Another (small?) advantage is that I
> can use `axis` to tell where my three coordinates are, rather than be stuck
> with having them as the last dimension.
>
> Indeed, in my trials for wrapping the Standards Of Fundamental Astronomy
> routines, I started with just making every 3-vector and 3x3-matrix
> structured arrays with the relevant single sub-array entry. That worked,
> but I ended up disliking the casting to and fro.
>
>
>> Furthermore, the expansion quickly becomes cumbersome. For instance, for
>> the all_equal signature of (n|1),(n|1)->() …
>>
>> I think this is only a good argument when used in conjunction with the
>> broadcasting syntax. I don’t think it’s a reason for matmul not to have
>> multiple signatures. Having multiple signatures is a disincentive to
>> introducing too many overloads of the same function, which seems like a good
>> thing to me
>>
> But the implementation for matmul is actually considerably trickier, since the
> internal loop now has to check the number of distinct dimensions.
>
>
>> Summarizing my overall opinions:
>>
>>    - I’m +0.5 on frozen dimensions. The use-cases seem reasonable, and
>>    it seems like an easy-ish way to get them. Allowing ufuncs to natively
>>    support subarray types might be a tidier solution, but that could come down
>>    the road
>>
> Indeed, they are not mutually exclusive. My guess would be that the use
> cases would be somewhat different.
>
>
>>
>>    - I’m -1 on optional dimensions: they seem to legitimize creating
>>    many overloads of gufuncs. I’m already not a fan of how matmul has special
>>    cases for lower dimensions that don’t generalize well. To me, the best way
>>    to handle matmul would be to use the proposed __array_function__ to
>>    handle the shape-based special-case dispatching, either by:
>>       - Inserting dimensions, and calling the true gufunc
>>       np.linalg.matmul_2d (which is a function I’d like direct access to
>>       anyway).
>>       - Dispatching to one of four ufuncs
>>
> I must admit I wish that `@` was just pure matrix multiplication... But I
> otherwise agree with Stephan that optional dimensions are the least-bad
> solution.
>
> Aside: do agree we should think about how to expose the `linalg` gufuncs.
>
>>
>>    - Broadcasting dimensions:
>>       - I know you’re not suggesting this but: enabling broadcasting
>>       unconditionally for all gufuncs would be a bad idea, masking linalg bugs.
>>       (although einsum does support broadcasting…)
>>
> Indeed, definitely *not* suggesting that!
>
>
>>
>>       - Does it really need a per-dimension flag, rather than a global
>>       one? Can you give a case where that’s useful?
>>
> Mostly simply that the implementation is easier given the optional
> dimensions... Also, it has the benefit of being clear what the function can
> handle by inspection of the signature, i.e., it self-documents better (one
> of my main arguments in favour of frozen dimensions...).
>
>
>>
>>       - If we’d already made all_equal a gufunc, I’d be +1 on adding
>>       broadcasting support to it
>>       - I’m -0.5 on the all_equal path in the first place. I think we
>>       either should have a more generic approach to combined ufuncs, or just
>>       declare them numba's job.
>>
> I am working on and off on a way to generically chain ufuncs (goal would
> be to auto-create an inner loop that calls all the chained ufuncs loops in
> turn). Not sure that short-circuiting will be all that easy.
>
> I actually quite like the all_equal ufunc, but it is in part because I
> remember discovering how painfully slow (a==b).all() was (and still have a
> place where I would use it if it existed). And it does fit in the
> (admittedly vague) plans to try to make `.reduce` a gufunc.
>
>>
>>       - Can you come up with a broadcasting use-case that isn’t just
>>       chaining a reduction with a broadcasting ufunc?
>>
> Perhaps the use is that it allows people to write gufuncs that are like
> such functions... Absent a mechanism to chain ufuncs, more complicated
> gufuncs are currently the easiest way to get fast, more complicated algebra.
>
> But perhaps a putative
>
> weighted_mean(y, sigma) -> mean, sigma_mean
>
> is a decent example? Its signature would be
>
> (n),(n)->(),()
>
> but then you're forced to give individual sigmas for each point. With
>
> (n|1),(n|1)->(),()
>
> you are no longer forced to do that (though the case of all y being the
> same is less than useful here... I did at some point have an implementation
> that worked by core dimension of each argument, but ended up feeling it was
> not worth the extra complication)
>
> -- Marten
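
P.S. For anyone following along, the "chain a reduction with a broadcasting
ufunc" alternative to an all_equal gufunc that we keep referring to is
roughly the sketch below; the attraction of a fused gufunc would be avoiding
the full boolean temporary and, ideally, short-circuiting at the first
mismatch:

    import numpy as np

    a = np.random.rand(1000, 100)
    b = a.copy()
    b[500, 3] += 1.0    # a single mismatch somewhere in the middle

    # Chained form: `a == b` allocates a full (1000, 100) boolean
    # temporary, and the reduction cannot stop early at the mismatch.
    row_equal = np.logical_and.reduce(a == b, axis=-1)

    # Same result spelled with .all(); same temporary, same work.
    assert np.array_equal(row_equal, (a == b).all(axis=-1))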

