[Numpy-discussion] A little about XND

Stefan Krah skrah at bytereef.org
Mon Jun 18 15:09:50 EDT 2018


Hi Marten,

On Mon, Jun 18, 2018 at 12:34:03PM -0400, Marten van Kerkwijk wrote:
> That looks quite nice and expressive. In the context of a discussion we
> have been having about describing `matmul/@` and possibly broadcastable
> dimensions, I think from your description it sounds like one would describe
> `@` with multiple functions (the multiple dispatch we have been (are?)
> considering as well):
> 
> 
> "... * N * M * T, ... * M * P * T -> ... * N * P * T"
> "M * T, ... * M * P * T -> ... * P * T"
> "... * N * M * T, M * T -> ... * N * T"
> "M * T, M * T -> T"

Yes, that's the way, and the outer dimensions (the part matched by the
ellipsis) are always broadcast like in NumPy.
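
To make the dispatch concrete, here is a minimal pure-Python sketch that
works on shape tuples only. The names `broadcast_shapes` and `matmul_shape`
are hypothetical helpers for illustration, not part of xnd, ndtypes or
NumPy. It picks one of the four signatures by the ndim of each argument and
broadcasts whatever the ellipsis matched, assuming NumPy's broadcasting
rules.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Broadcast two shape tuples, aligning from the right (NumPy rules)."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(reversed(out))


def matmul_shape(a, b):
    """Result shape of a @ b, selecting one of the four signatures by ndim."""
    if len(a) == 0 or len(b) == 0:
        raise ValueError("no signature accepts an ndim == 0 argument")
    if len(a) >= 2 and len(b) >= 2:
        # "... * N * M * T, ... * M * P * T -> ... * N * P * T"
        *outer_a, n, m = a
        *outer_b, m2, p = b
        core = (n, p)
    elif len(a) == 1 and len(b) >= 2:
        # "M * T, ... * M * P * T -> ... * P * T"
        (m,) = a
        *outer_b, m2, p = b
        outer_a, core = [], (p,)
    elif len(a) >= 2 and len(b) == 1:
        # "... * N * M * T, M * T -> ... * N * T"
        *outer_a, n, m = a
        (m2,) = b
        outer_b, core = [], (n,)
    else:
        # "M * T, M * T -> T"
        (m,) = a
        (m2,) = b
        outer_a, outer_b, core = [], [], ()
    if m != m2:
        raise ValueError(f"M does not match: {m} != {m2}")
    return broadcast_shapes(tuple(outer_a), tuple(outer_b)) + core


print(matmul_shape((1, 3, 4), (6, 4, 5)))  # (6, 3, 5)
print(matmul_shape((4,), (7, 4, 5)))       # (7, 5)
print(matmul_shape((4,), (4,)))            # ()
```

Only the outer part is broadcast; the symbolic dimension M has to match
exactly in every signature.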


> Is there a way to describe broadcasting?  The sample case we've come up
> with is a function that calculates a weighted mean. This might take
> (values, sigmas) and return (mean, sigma_mean), which would imply a
> signature like:
> 
> "... * N * T, ... * N * T -> ... * T, ... * T"
> 
> But would your signature allow indicating that one could pass in a single
> sigma? I.e., broadcast the second 1 to N if needed?

Actually I came across this today when implementing optimized matching
for binary functions.

I wanted the faster kernel

  "... * N * int64, ... * N * int64 -> ... * N * int64"

to also match e.g. the input

  "int64, 10 * int64".


The generic datashape spec would forbid this, but perhaps the '?' that
you propose in nep-0020 would offer a way out of this for ndtypes.


It's a bit confusing for datashape, since there is already a question mark
for missing variable dimensions (that have shape==0 in the data).

  >>> ndt("var * ?var * int64")
  ndt("var * ?var * int64")

This would be the type for e.g. [[0], None, [1,2,3]].
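
As a rough illustration of what that type admits, here is a plain-Python
check on nested lists. It does not use ndtypes; the function name
`matches_var_optvar_int64` is made up for this sketch, and the int64 value
range is ignored.

```python
def matches_var_optvar_int64(value):
    """Check a nested list against the shape of "var * ?var * int64".

    The outer dimension must be a list. Each inner dimension is either
    None (a missing dimension) or a list of ints.
    """
    if not isinstance(value, list):
        return False
    for row in value:
        if row is None:
            # '?var': the whole inner dimension is missing.
            continue
        if not isinstance(row, list):
            return False
        for x in row:
            # The element type is plain int64, so None is not allowed here.
            if isinstance(x, bool) or not isinstance(x, int):
                return False
    return True


print(matches_var_optvar_int64([[0], None, [1, 2, 3]]))  # True
print(matches_var_optvar_int64([[0], [None]]))           # False
```

The second call fails because the '?' applies to the inner dimension, not
to the int64 elements.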


But for symbolic dimensions (which only match fixed dimensions) perhaps this

   "... * ?N * int64, ... * ?N * int64 -> ... * ?N * int64"

or, as in the NEP,

   "... * N? * int64, ... * N? * int64 -> ... * N? * int64"

should mean "At least one input has ndim >= 1, broadcast as necessary".


This still means that for the "all ndim==0" case one would need an
additional kernel "int64, int64 -> int64".
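
A small sketch of how such a rule could behave, again on shape tuples only.
This is an assumption about the proposed semantics, not how ndtypes matches
today. `match_optional_n` and `dispatch` are hypothetical names. Size-1
broadcasting of N itself is not modeled; only a missing N is filled in.

```python
from itertools import zip_longest


def broadcast_outer(a, b):
    """Broadcast the dimensions matched by the ellipsis (NumPy rules)."""
    out = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            out.append(y)
        else:
            raise ValueError(f"cannot broadcast {a} with {b}")
    return tuple(reversed(out))


def match_optional_n(x, y):
    """Toy matcher for
    "... * N? * int64, ... * N? * int64 -> ... * N? * int64".

    Returns the output shape, or None if both inputs have ndim == 0,
    in which case the kernel does not apply.
    """
    if not x and not y:
        return None
    nx = x[-1] if x else None  # None means N is missing in this input
    ny = y[-1] if y else None
    if nx is not None and ny is not None and nx != ny:
        raise ValueError(f"N does not match: {nx} != {ny}")
    n = nx if nx is not None else ny
    return broadcast_outer(x[:-1], y[:-1]) + (n,)


def dispatch(x, y):
    """Try the '?' kernel first, then fall back to the scalar kernel."""
    out = match_optional_n(x, y)
    if out is None:
        return "int64, int64 -> int64", ()
    return "... * N? * int64, ... * N? * int64 -> ... * N? * int64", out


print(dispatch((), (10,))[1])      # (10,)
print(dispatch((3, 10), (10,))[1])  # (3, 10)
print(dispatch((), ())[0])          # int64, int64 -> int64
```

The first call is the "int64, 10 * int64" input from above, which the
kernel with a mandatory N would reject.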


> I realize that this is no longer about describing precisely what the
> function doing the calculation expects, but rather what an upper level is
> allowed to do before calling the function (i.e., take a dimension of 1 and
> broadcast it).

Yes, for datashape the problem is that it also allows non-broadcastable
signatures like "N * float64", which is really the same as "double x[]" in C.

But the '?', with the occasional additional kernel for ndim==0, could
solve this.


Stefan Krah




