NEP 43 – How should ufunc (inner loops) be designed?

Hi all,

I would like to nudge a discussion on NEP 43. Right now, mainly the first part, that is, everything up to "Promotion and dispatching". (This is because promotion and dispatching should be the last bigger step in refactoring NumPy.)

The full text is at:

https://numpy.org/neps/nep-0043-extensible-ufuncs.html

which includes more information and gives some background to help understand what is going on.

I personally believe that the `ArrayMethod` concept is pretty much necessary to group dtype resolution with the loop functionality (and other things currently handled by the `ufunc` which may be specific to the dtypes involved, e.g. the `identity` could go here). But exactly what it should look like, and especially how we want the inner-loop signature to be defined, is flexible. So comments on all parts of the NEP are very welcome!

My main focus would be on the `strided_inner_loop`. That is largely a continuation of this old, but pretty insightful discussion:

https://github.com/numpy/numpy/issues/12518

My current proposal (C-API) is this:

    def strided_inner_loop(
            context, data, dimensions, strides, innerloop_data):
        return 0  # or -1 on error

Where `context` is a structure holding some useful information:

* The `ArrayMethod` itself (pretty much the `self`)
* the dtypes/descriptors of all arrays
* the original caller/ufunc (if applicable, useful for backcompat and error messages)
* anything else that may come up in the future and should be passed here (e.g. the Python interpreter state).
* (Possibly the gufunc signature, unless we duplicate it on the ArrayMethod.)

Context may seem a bit weird, but passing the dtypes is very convenient, e.g. for strings. Passing `self`, the ArrayMethod (so to speak), may also be necessary (e.g. for Python wrapping). And a struct means we could expand it in the future, with some care [1]. Admittedly, numeric ufuncs can simply ignore this completely.

`data, dimensions, strides` is the main part of the existing signature.
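As a rough illustration of the proposed signature, here is a minimal pure-Python model of a strided inner loop that adds two operands. It is only a sketch, not the C-API: the `Context` class and its field names are hypothetical, `struct` format codes stand in for real descriptors, each `data` entry is modelled as a `(buffer, byte_offset)` pair instead of a raw pointer, and `innerloop_data` is left unused.

```python
import struct
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class Context:
    # Hypothetical stand-in for the proposed `context` struct.
    method: Any                    # the ArrayMethod itself (the "self")
    descriptors: Sequence[str]     # one struct format code per operand (assumption)
    caller: Optional[Any] = None   # the original ufunc, if applicable


def strided_inner_loop(context, data, dimensions, strides, innerloop_data=None):
    """Add two operands element-wise; return 0 on success, -1 on error."""
    n = dimensions[0]
    (buf_a, off_a), (buf_b, off_b), (buf_out, off_out) = data
    fmt_a, fmt_b, fmt_out = context.descriptors
    try:
        for i in range(n):
            # Strides are in bytes, as in the existing inner-loop signature.
            a = struct.unpack_from(fmt_a, buf_a, off_a + i * strides[0])[0]
            b = struct.unpack_from(fmt_b, buf_b, off_b + i * strides[1])[0]
            struct.pack_into(fmt_out, buf_out, off_out + i * strides[2], a + b)
    except struct.error:
        # Reading or writing past the end of a buffer is reported as -1.
        return -1
    return 0


# Usage: the first operand is read with a stride of 16 bytes (every second double).
a = bytearray(struct.pack("6d", 1, 99, 2, 99, 3, 99))
b = bytearray(struct.pack("3d", 10, 20, 30))
out = bytearray(24)
ctx = Context(method=None, descriptors=("d", "d", "d"))
status = strided_inner_loop(ctx, [(a, 0), (b, 0), (out, 0)], [3], [16, 8, 8])
result = struct.unpack("3d", out)
```

A purely numeric loop like this one only needs `context` to look up the element formats, which matches the remark that numeric ufuncs can largely ignore it.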
`innerloop_data` is a user allocated/defined struct that is passed in (and has its own "free" slot). We currently use this for casting in NumPy (so it does not need to be public right away, `context` should be good enough for most things). In the NEP I am considering passing `npy_intp *scratch_var = 0` when this is unused. That would allow persistent flags (e.g. whether a warning was already given), which otherwise would require a lot of additional complexity.

What this `innerloop_data` should look like is maybe the thing I am least sold on myself. The current proposal just keeps around what is already there as private API.

There are many more issues and potential future additions, and I am happy to discuss those as well. I am singling out the signature a bit, because it seems particularly central and painful to change again in the future.

Another example is what the signature of `get_loop()` should look like, since it would be nice to eventually make it public (or even allow overriding it) [2]. That is lower priority though, since we could get away with keeping it private API for a while.

Cheers,

Sebastian


[1] That may need some thought: If others provide the struct (and not just consume it like a ufunc/dtype author), they would need to ensure ABI compatibility with NumPy and always "catch up" when NumPy extends the struct. I expect that can be managed, but I have not planned how to achieve it. The thought is that e.g. sparse arrays could use the NumPy ArrayMethod/loops directly, and then they would have to provide the struct. (Those projects probably can easily catch up, but you would need to ensure a good error when a NumPy version happens to be too new and is thus incompatible.)
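One way the "good error" from [1] might be obtained is to have the provider record the size of the struct it was built against, so that the consumer can reject a struct that is too small. The sketch below models this with `ctypes`. The layouts `ContextV1` and `ContextV2`, the `struct_size` field, and `load_context` are all hypothetical and not part of NEP 43 or NumPy.

```python
import ctypes


class ContextV1(ctypes.Structure):
    # Hypothetical layout that an older provider was built against.
    _fields_ = [("struct_size", ctypes.c_size_t),
                ("nargs", ctypes.c_int)]


class ContextV2(ctypes.Structure):
    # Hypothetical newer layout: same leading fields, one field appended.
    _fields_ = [("struct_size", ctypes.c_size_t),
                ("nargs", ctypes.c_int),
                ("flags", ctypes.c_uint64)]


def load_context(raw, required=ContextV2):
    """Consumer-side check: refuse a struct smaller than the layout we need."""
    # The size recorded by the provider is always the first field.
    provided = ctypes.c_size_t.from_buffer_copy(raw).value
    needed = ctypes.sizeof(required)
    if provided < needed:
        raise RuntimeError(
            f"context struct is {provided} bytes, but this loop needs at "
            f"least {needed}; the provider was built against an older layout")
    return required.from_buffer_copy(raw)


# Usage: a provider built against the new layout is accepted,
# one built against the old layout gets a clear error.
new = ContextV2(ctypes.sizeof(ContextV2), 3, 0)
old = ContextV1(ctypes.sizeof(ContextV1), 3)
accepted = load_context(bytes(new))
try:
    load_context(bytes(old))
    rejected = False
except RuntimeError:
    rejected = True
```

This only works if new fields are appended at the end and existing ones never change, which is the kind of "care" the expandable struct would require.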
[2] The things I am worried about there are:

* How we do alignment: largest power-of-two alignment would seem nice, but that is really per-operand:
  https://github.com/numpy/numpy/issues/17359
* The `move_references` flag is probably useful for buffering of data that contains Python objects/references (it is used because the buffer can be cleared out immediately). For now we can say that this is only relevant for NumPy itself. The mechanism itself does probably make sense in the context of buffered loops (i.e. casting), although I am honestly not certain that it is actually worth much in performance/memory to clear out buffers in one go.
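To illustrate why the alignment point in [2] is really per-operand, here is a small sketch that computes the largest power of two dividing both the start address and the stride of each operand. The function name and the example addresses are made up for illustration, and this is not how NumPy currently determines alignment.

```python
def largest_pow2_alignment(address, stride):
    """Largest power of two dividing both the start address and the stride.

    Returns 0 when both are 0, which this sketch treats as "no constraint".
    """
    combined = address | abs(stride)
    # The lowest set bit of the OR is the largest common power-of-two divisor.
    return combined & -combined


# Hypothetical operands as (start address, stride in bytes).
operands = [(0x1000, 8), (0x1004, 8), (0x2000, 16)]
per_operand = [largest_pow2_alignment(addr, stride) for addr, stride in operands]

# Collapsing to one value for the whole loop loses information: the third
# operand is 16-byte aligned, but the loop-wide minimum is only 4.
loop_wide = min(per_operand)
```

Passing a single alignment to `get_loop()` would force every operand down to the weakest one, whereas per-operand values keep the stronger guarantees available.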
Sebastian Berg