NEP 43 – How should ufunc (inner loops) be designed? - NumPy-Discussion

Jan. 7, 2021

Hi all,

I would like to nudge a discussion on NEP 43. Right now, mainly the first
part: that is, everything up to "Promotion and dispatching". (This is
because promotion and dispatching should be the last bigger step in
refactoring NumPy.)

The full text is at: https://numpy.org/neps/nep-0043-extensible-ufuncs.html
which includes more information and gives some background to help
understand what is going on.

I personally believe that the `ArrayMethod` concept is pretty much
necessary to group dtype resolution with the loop functionality (and
other things currently handled by the `ufunc` which may be specific to
the dtypes involved, e.g. the `identity` could go here).

But what exactly it should look like, and especially how we want the
inner-loop signature to be defined, is flexible.

So comments on all parts of the NEP are very welcome! My main focus
would be on the `strided_inner_loop`. That is largely a continuation of
this old, but pretty insightful discussion:
    https://github.com/numpy/numpy/issues/12518

My current proposal (C-API) is this:

    def strided_inner_loop(
            context, data, dimensions, strides, innerloop_data):
        return 0  # or -1 on error

Where `context` is a structure holding some useful information:
 * The `ArrayMethod` itself (pretty much the `self`)
 * the dtypes/descriptors of all arrays
 * the original caller/ufunc (if applicable, useful for backcompat and
   error messages)
 * anything else that may come up in the future and should be passed
   here (e.g. the Python interpreter state)
 * (possibly the gufunc signature, unless we duplicate it on
   the ArrayMethod)

`context` may seem a bit weird, but passing dtypes is very convenient,
e.g. for strings. Passing `self`, the ArrayMethod (so to speak), may
also be necessary (e.g. for Python wrapping). And a struct means we
could expand it in the future, with some care [1].
Admittedly, numeric ufuncs can simply ignore this completely.

`data, dimensions, strides` is the main part of the existing signature.
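
To make the roles of the five arguments concrete, here is a minimal pure-Python model of such a loop. It is only a sketch under stated assumptions: the `Context` class and its field names are hypothetical stand-ins for the proposed struct, the operands are raw byte buffers holding float64 values, and strides are in bytes as in the existing inner-loop signature.

```python
import struct
from dataclasses import dataclass


@dataclass
class Context:
    # Hypothetical stand-in for the proposed `context` struct.
    method: object           # the ArrayMethod itself (the "self")
    descriptors: tuple       # dtypes/descriptors of all operands
    caller: object = None    # original caller/ufunc, if applicable


def strided_inner_loop(context, data, dimensions, strides, innerloop_data):
    """Add two float64 operands; return 0 on success, -1 on error."""
    n = dimensions[0]
    in1, in2, out = data
    s1, s2, s_out = strides  # byte strides, one per operand
    try:
        for i in range(n):
            (a,) = struct.unpack_from("d", in1, i * s1)
            (b,) = struct.unpack_from("d", in2, i * s2)
            struct.pack_into("d", out, i * s_out, a + b)
    except struct.error:
        # Reading or writing past the end of a buffer.
        return -1
    return 0


in1 = bytearray(struct.pack("4d", 1.0, 2.0, 3.0, 4.0))
in2 = bytearray(struct.pack("2d", 10.0, 20.0))
out = bytearray(16)
ctx = Context(method=None, descriptors=("float64", "float64", "float64"))

# A stride of 16 bytes on the first operand picks every other element.
status = strided_inner_loop(ctx, (in1, in2, out), (2,), (16, 8, 8), None)
result = struct.unpack("2d", out)
```

A numeric loop like this one never looks at `context`, which matches the remark above that numeric ufuncs can ignore it.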

`innerloop_data` is a user allocated/defined struct that is passed in
(and has its own "free" slot).  We currently use this for casting in
NumPy (so it does not need to be public right away, `context` should be
good enough for most things).
In the NEP I am considering passing `npy_intp *scratch_var = 0`
when this is unused.  That would allow persistent flags (e.g. whether a
warning was already given), which otherwise would require a lot of
additional complexity.
What this `innerloop_data` should look like is maybe the thing I am
least sold on myself. The current proposal just keeps around what is
already there as private API.
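
As an illustration of the persistent-flag idea, the sketch below models the scratch slot as a one-element list that starts at 0 and is shared across repeated calls of the same loop. All names here are hypothetical, and for brevity the strides count list items rather than bytes.

```python
import math
import warnings


def sqrt_inner_loop(context, data, dimensions, strides, innerloop_data):
    """Square root with a warning that is emitted at most once."""
    inp, out = data
    s_in, s_out = strides  # assumption: strides in items, not bytes
    for i in range(dimensions[0]):
        x = inp[i * s_in]
        if x < 0:
            if innerloop_data[0] == 0:
                warnings.warn("invalid value encountered in sqrt",
                              RuntimeWarning)
                innerloop_data[0] = 1  # remember that we already warned
            out[i * s_out] = math.nan
        else:
            out[i * s_out] = math.sqrt(x)
    return 0


def run_in_chunks(values, chunk=2):
    """Call the loop once per chunk, sharing a single scratch slot."""
    scratch = [0]  # plays the role of `scratch_var = 0`
    out = [0.0] * len(values)
    for start in range(0, len(values), chunk):
        part = values[start:start + chunk]
        part_out = [0.0] * len(part)
        sqrt_inner_loop(None, (part, part_out), (len(part),), (1, 1), scratch)
        out[start:start + len(part)] = part_out
    return out, scratch[0]
```

Without the shared slot, each chunked call would have to warn again or the caller would need its own bookkeeping.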

There are many more issues and potential future additions and I am
happy to discuss those as well. I am singling out the signature a bit,
because it seems particularly central and painful to change again in
the future.
Another example is what the signature of `get_loop()` should look like,
since it would be nice to eventually make it public (or even allow
overriding it) [2].  That is lower priority though, since we could get
away with keeping it private API for a while.

Cheers,

Sebastian

[1] That may need some thought: If others provide the struct (and not
just consume it like a ufunc/dtype author), they would need to ensure
ABI compatibility with NumPy and always "catch up" when NumPy extends
the struct.  I expect that can be managed, but I have not yet planned
how to achieve it.  The thought is that e.g. sparse arrays could use
the NumPy ArrayMethod/loops directly, and then they would have to
provide the struct.  (Those projects probably can easily catch up, but
you would need to ensure a good error when a NumPy version happens to
be too new and is thus incompatible.)

[2] The things I am worried about there are:

* How we do alignment: the largest power-of-two alignment would seem
nice, but that is really per-operand:
https://github.com/numpy/numpy/issues/17359

* The `move_references` flag is probably useful for buffering of data
that contains Python objects/references (it is used because the buffer
can be cleared out immediately).  For now we can say that this is only
relevant for NumPy itself.  But the mechanism itself does probably make
sense in the context of buffered loops (i.e. casting).  I am honestly
not certain that it is actually worth much in performance/memory to
clear out buffers in one go.
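
On the alignment point, one way to read "largest power-of-two alignment" is the largest power of two that divides both an operand's base address and its stride, so that every element the loop touches is aligned to it. The sketch below follows that reading. The function name and the `cap` parameter are hypothetical, not part of any NumPy API.

```python
def largest_pow2_alignment(address, stride, cap=64):
    """Largest power of two (at most `cap`) dividing address and stride."""
    combined = address | abs(stride)
    if combined == 0:
        # Both are zero: any alignment holds, so report the cap.
        return cap
    # `x & -x` isolates the lowest set bit of x.
    return min(combined & -combined, cap)


# (base address, byte stride) for three operands of one call
operands = [(4096, 16), (4104, 24), (4100, 8)]
per_operand = [largest_pow2_alignment(a, s) for a, s in operands]
```

Here the three operands come out at 16, 8 and 4 bytes respectively, which shows why a single alignment value for the whole call loses information.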

Sebastian Berg
