Is a Python function a gufunc if it broadcasts its arguments appropriately?
![](https://secure.gravatar.com/avatar/b5b57a4476471cad04132f0a061f111f.jpg?s=120&d=mm&r=g)
For example, is the function `stack` below a gufunc with signature (),()->(2)? def stack(a, b): broadcasts = np.broadcast_arrays(a, b) return np.stack(broadcasts, axis=-1) Or must gufuncs be written in C? Or are other things required?
![](https://secure.gravatar.com/avatar/8dee92c45a4934efe0c4532c3aa60090.jpg?s=120&d=mm&r=g)
Yes, Numba (https://numba.pydata.org/) lets you write gufuncs without the performance penalty that would be incurred if it were written in pure Python: https://numba.pydata.org/numba-doc/dev/user/vectorize.html#the-guvectorize-d... On Fri, Dec 27, 2024 at 2:10 AM john.a.dawson--- via NumPy-Discussion < numpy-discussion@python.org> wrote:
![](https://secure.gravatar.com/avatar/8dee92c45a4934efe0c4532c3aa60090.jpg?s=120&d=mm&r=g)
I think you're right: the function `stack`, as you've defined it, is a gufunc. Here's an implementation using np.vectorize <https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html> (rather than nb.vectorize <https://numba.readthedocs.io/en/stable/user/vectorize.html>). Since the signature is not "()->()" or "(),()->()" (or similar with a different number of scalar inputs and scalar outputs), it's a generalized ufunc.
def stack(a, b):
... broadcasts = np.broadcast_arrays(a, b) ... return np.stack(broadcasts, axis=-1) ...
stacky = np.vectorize(stack, signature="(),()->(2)")
stacky(np.arange(5), np.arange(5))
array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]])
stacky(np.arange(5), np.array([[1], [2], [3], [4], [5]]))
array([[[0, 1], [1, 1], [2, 1], [3, 1], [4, 1]], [[0, 2], [1, 2], [2, 2], [3, 2], [4, 2]], [[0, 3], [1, 3], [2, 3], [3, 3], [4, 3]], [[0, 4], [1, 4], [2, 4], [3, 4], [4, 4]], [[0, 5], [1, 5], [2, 5], [3, 5], [4, 5]]]) On Tue, Dec 31, 2024 at 2:44 AM john.a.dawson--- via NumPy-Discussion < numpy-discussion@python.org> wrote:
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
Stack is not a generalized ufunc. It may behave similar to one in many ways, but implementation wise has nothing to do with ufuncs. Also ufuncs do not support an arbitrary number of operands. `vectorize` can indeed mimic generalized ufuncs, but (unfortunately) doesn't create them as such currently. - Sebastian On Tue, 2024-12-31 at 10:20 -0600, Jim Pivarski via NumPy-Discussion wrote:
![](https://secure.gravatar.com/avatar/8dee92c45a4934efe0c4532c3aa60090.jpg?s=120&d=mm&r=g)
In preparing my email, I was looking for a way to identify a ufunc, other than `isinstance`, since `np.vectorize` and `nb.vectorize` make objects that behave like ufuncs (even obeying NEP 13 semantics, checking for `__array_ufunc__`), but they don't inherit from the `np.ufunc` type. I was thinking that a "ufunc" might be defined as a protocol, like Python's `Sequence` or `Mapping`. I couldn't find a way to do that, particularly because the desired features of a ufunc involve more than having particular methods—they need to check their arguments and call particular methods: `__array_ufunc__`. That's not something one can check with `hasattr`. Sebastian, are you saying that the only definition of "ufunc" is that it inherits from `np.ufunc`? As far as I know, the only way to do that is by implementing it in C (and maybe even depend on a version of the NumPy ABI). But it's useful for libraries other than NumPy to be able to define ufuncs, starting with SciPy, which defines a lot of them. Are there any plans to define "ufunc" as a protocol, rather than a type that must be inherited to be recognized? Its behaviors are well defined at this point. Jim On Thu, Jan 2, 2025, 5:41 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Thu, 2025-01-02 at 10:17 -0600, Jim Pivarski wrote:
Yes, I am saying that. I think `np.vectorize` mimics gufuncs (if a gufunc) and `np.frompyfunc` creates a real ufunc. There was always some discussion about allowing to create a "C ufunc" that is implemented in Python (including gufuncs). That works, but is a little bit more raw than one might expect (basically, array data ownership must be limited to the function and we have no way to enforce that). As for a protocol.... Maybe, but not sure. Another thing I think we should add anyway is a "I do everything" path in a ufunc. And once you have that, the main thing that C-side ufunc implementation (or wrapper at that point?) would probably be: 1. Figure out any necessary `__array_ufunc__` things. 2. Decide on the result DType and select the appropriate implementation (which could be a single implementation that is always used). One downside of this is, that NumPy ufuncs have the abstraction that allows them to work on non-NumPy arrays (and no, I don't mean via `__array_ufunc__`, I mean that awkward array could low-level call the 1-D internal loop in principle). And if you "do everything" in a Python function, you may lose that part. In practice, that probably doesn't matter much, though. Even if awkward-array starts using this, it's probably a thing to figure out then how to deal with it. As a protocol only? Maybe. I haven't thought about it, the above might cover most things and also allow a few nice low-level nice tricks maybe. (I am thinking e.g. of a just in time compiled ufunc kernel for the specific array dimensions that e.g. also inlines casting. Or a gufunc that pushes work to a GPU, although in most cases maybe we wouldn't even need this.) - Sebastian
![](https://secure.gravatar.com/avatar/8dee92c45a4934efe0c4532c3aa60090.jpg?s=120&d=mm&r=g)
Yes, Numba (https://numba.pydata.org/) lets you write gufuncs without the performance penalty that would be incurred if it were written in pure Python: https://numba.pydata.org/numba-doc/dev/user/vectorize.html#the-guvectorize-d... On Fri, Dec 27, 2024 at 2:10 AM john.a.dawson--- via NumPy-Discussion < numpy-discussion@python.org> wrote:
![](https://secure.gravatar.com/avatar/8dee92c45a4934efe0c4532c3aa60090.jpg?s=120&d=mm&r=g)
I think you're right: the function `stack`, as you've defined it, is a gufunc. Here's an implementation using np.vectorize <https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html> (rather than nb.vectorize <https://numba.readthedocs.io/en/stable/user/vectorize.html>). Since the signature is not "()->()" or "(),()->()" (or similar with a different number of scalar inputs and scalar outputs), it's a generalized ufunc.
def stack(a, b):
... broadcasts = np.broadcast_arrays(a, b) ... return np.stack(broadcasts, axis=-1) ...
stacky = np.vectorize(stack, signature="(),()->(2)")
stacky(np.arange(5), np.arange(5))
array([[0, 0], [1, 1], [2, 2], [3, 3], [4, 4]])
stacky(np.arange(5), np.array([[1], [2], [3], [4], [5]]))
array([[[0, 1], [1, 1], [2, 1], [3, 1], [4, 1]], [[0, 2], [1, 2], [2, 2], [3, 2], [4, 2]], [[0, 3], [1, 3], [2, 3], [3, 3], [4, 3]], [[0, 4], [1, 4], [2, 4], [3, 4], [4, 4]], [[0, 5], [1, 5], [2, 5], [3, 5], [4, 5]]]) On Tue, Dec 31, 2024 at 2:44 AM john.a.dawson--- via NumPy-Discussion < numpy-discussion@python.org> wrote:
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
Stack is not a generalized ufunc. It may behave similar to one in many ways, but implementation wise has nothing to do with ufuncs. Also ufuncs do not support an arbitrary number of operands. `vectorize` can indeed mimic generalized ufuncs, but (unfortunately) doesn't create them as such currently. - Sebastian On Tue, 2024-12-31 at 10:20 -0600, Jim Pivarski via NumPy-Discussion wrote:
![](https://secure.gravatar.com/avatar/8dee92c45a4934efe0c4532c3aa60090.jpg?s=120&d=mm&r=g)
In preparing my email, I was looking for a way to identify a ufunc, other than `isinstance`, since `np.vectorize` and `nb.vectorize` make objects that behave like ufuncs (even obeying NEP 13 semantics, checking for `__array_ufunc__`), but they don't inherit from the `np.ufunc` type. I was thinking that a "ufunc" might be defined as a protocol, like Python's `Sequence` or `Mapping`. I couldn't find a way to do that, particularly because the desired features of a ufunc involve more than having particular methods—they need to check their arguments and call particular methods: `__array_ufunc__`. That's not something one can check with `hasattr`. Sebastian, are you saying that the only definition of "ufunc" is that it inherits from `np.ufunc`? As far as I know, the only way to do that is by implementing it in C (and maybe even depend on a version of the NumPy ABI). But it's useful for libraries other than NumPy to be able to define ufuncs, starting with SciPy, which defines a lot of them. Are there any plans to define "ufunc" as a protocol, rather than a type that must be inherited to be recognized? Its behaviors are well defined at this point. Jim On Thu, Jan 2, 2025, 5:41 AM Sebastian Berg <sebastian@sipsolutions.net> wrote:
![](https://secure.gravatar.com/avatar/b4f6d4f8b501cb05fd054944a166a121.jpg?s=120&d=mm&r=g)
On Thu, 2025-01-02 at 10:17 -0600, Jim Pivarski wrote:
Yes, I am saying that. I think `np.vectorize` mimics gufuncs (if a gufunc) and `np.frompyfunc` creates a real ufunc. There was always some discussion about allowing to create a "C ufunc" that is implemented in Python (including gufuncs). That works, but is a little bit more raw than one might expect (basically, array data ownership must be limited to the function and we have no way to enforce that). As for a protocol.... Maybe, but not sure. Another thing I think we should add anyway is a "I do everything" path in a ufunc. And once you have that, the main thing that C-side ufunc implementation (or wrapper at that point?) would probably be: 1. Figure out any necessary `__array_ufunc__` things. 2. Decide on the result DType and select the appropriate implementation (which could be a single implementation that is always used). One downside of this is, that NumPy ufuncs have the abstraction that allows them to work on non-NumPy arrays (and no, I don't mean via `__array_ufunc__`, I mean that awkward array could low-level call the 1-D internal loop in principle). And if you "do everything" in a Python function, you may lose that part. In practice, that probably doesn't matter much, though. Even if awkward-array starts using this, it's probably a thing to figure out then how to deal with it. As a protocol only? Maybe. I haven't thought about it, the above might cover most things and also allow a few nice low-level nice tricks maybe. (I am thinking e.g. of a just in time compiled ufunc kernel for the specific array dimensions that e.g. also inlines casting. Or a gufunc that pushes work to a GPU, although in most cases maybe we wouldn't even need this.) - Sebastian
participants (3)
-
Jim Pivarski
-
john.a.dawson@proton.me
-
Sebastian Berg