Is there a desire to implement more functions as ufuncs? e.g. round, astype, real
Hello there! First time posting here and I apologize if this discussion is not new. I couldn't find it in a search.

I've been contributing a bit to the sparse project (https://github.com/pydata/sparse) and I was working on specializing the behavior for single-argument ufuncs, because there is a faster path for some sparse arrays if the indexes don't change at all.

As I was working on this I noticed that `sparse` uses `__array_ufunc__` on some non-ufunc methods, like `round`, `clip`, and `astype`, which caused some bugs in my initial attempt. This is easy enough to fix in the package, but it made me wonder if those functions _could_ or _should_ be ufuncs in numpy itself.

The full list for the sparse library is `clip`, `round`, `astype`, `real`, and `imag`. There might be other candidates in numpy, those are just the ones in this project.

The benefit I see is that an implementor of `__array_ufunc__` wouldn't need to implement these methods. But perhaps their interfaces are too complex for ufunc-iness?
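To make the question concrete, here is a minimal sketch of the situation described above. The `Wrapper` class is a hypothetical stand-in for a container such as a sparse array (it is not taken from the `sparse` project); it only implements `__array_ufunc__`. Real ufuncs such as `np.negative` reach that hook, while `np.clip`, `np.round`, `np.real`, and `np.imag` are plain functions rather than `np.ufunc` instances.

```python
import numpy as np


class Wrapper:
    """Hypothetical minimal container that only implements __array_ufunc__."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain calls without out= are handled in this sketch.
        if method != "__call__" or kwargs.get("out") is not None:
            return NotImplemented
        # Unwrap Wrapper inputs, apply the ufunc, and re-wrap the result.
        arrays = [x.data if isinstance(x, Wrapper) else x for x in inputs]
        return Wrapper(ufunc(*arrays, **kwargs))


w = Wrapper([-1.5, 0.25, 2.75])
neg = np.negative(w)  # a real ufunc, so NumPy dispatches to __array_ufunc__

# Check which of the functions mentioned in the post are actual ufunc objects.
names = ("negative", "rint", "clip", "round", "real", "imag")
is_ufunc = {name: isinstance(getattr(np, name), np.ufunc) for name in names}
for name, flag in is_ufunc.items():
    print(f"np.{name}: ufunc={flag}")
```

Anything that reports `ufunc=False` never reaches `__array_ufunc__` on its own, which is why a container has to provide those methods separately.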
On Tue, Jul 11, 2023 at 2:38 PM James Webber <jamestwebber@gmail.com> wrote:
> Hello there! First time posting here and I apologize if this discussion is not new. I couldn't find it in a search.
>
> I've been contributing a bit to the sparse project (https://github.com/pydata/sparse) and I was working on specializing the behavior for single-argument ufuncs, because there is a faster path for some sparse arrays if the indexes don't change at all.
>
> As I was working on this I noticed that `sparse` uses `__array_ufunc__` on some non-ufunc methods, like `round`, `clip`, and `astype`, which caused some bugs in my initial attempt. This is easy enough to fix in the package, but it made me wonder if those functions _could_ or _should_ be ufuncs in numpy itself.
>
> The full list for the sparse library is `clip`, `round`, `astype`, `real`, and `imag`. There might be other candidates in numpy, those are just the ones in this project.
>
> The benefit I see is that an implementor of `__array_ufunc__` wouldn't need to implement these methods. But perhaps their interfaces are too complex for ufunc-iness?
In principle changing functions into ufuncs is fine I think, and it can help performance and maintainability. The signatures of these functions seem fine in principle. However, the devil is usually in the details - for example:
>>> import numpy as np
>>> x = np.arange(5)
>>> class AClass:
...     def __init__(self, x):
...         self._x = x
...
...     def clip(self, x, a_min, a_max):
...         return self._x
...
>>> AClass(x).clip(x, 0, 1)
array([0, 1, 2, 3, 4])
That's due to `clip` being implemented with `_wrapfunc` under the hood. All those "call methods with the same name from functions" are arguably design mistakes; however, unless we consciously decide to get rid of all of that at once, we have to continue to support them.

It's hard to say more without trying; I think you could attempt to convert one function and see how it goes.

Cheers,
Ralf
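The method indirection mentioned above can be observed directly. The following is a small sketch, assuming a NumPy version in which `np.clip` forwards to a same-named method on its first argument (the `_wrapfunc` behavior described in this message). The `Labeled` class and its return value are hypothetical and exist only to make the forwarding visible.

```python
import numpy as np


class Labeled:
    """Hypothetical non-array object that happens to define a `clip` method."""

    def __init__(self, values):
        self.values = np.asarray(values)

    def clip(self, a_min, a_max, out=None, **kwargs):
        # np.clip forwards its arguments here instead of clipping anything itself.
        return f"Labeled.clip called with ({a_min}, {a_max})"


# The function-level call ends up in the method defined above.
result = np.clip(Labeled(np.arange(5)), 0, 1)
print(result)

# np.clip itself is an ordinary Python function, not a ufunc object.
clip_is_ufunc = isinstance(np.clip, np.ufunc)
print(clip_is_ufunc)
```

A ufunc-based `np.clip` would have to preserve this forwarding somehow, or code that relies on it would change behavior.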
Does astype make sense as a ufunc?

Aaron Meurer
On Thu, 2023-07-13 at 21:53 -0500, Aaron Meurer wrote:
> Does astype make sense as a ufunc?
Yes and no. Implementation-wise, casting is practically a ufunc. But there are real semantic differences:

Casting _must_ provide the exact output dtype (something ufuncs support only if you pass `out=`). If you do that, you can already write a ufunc for casts: `np.positive` is clunky, but it ends up with much the same behavior (implementation-wise it does extra steps, of course).

On the other hand, a normal ufunc doesn't care about the output dtype, since it must and should infer it from the inputs. Someone is now probably thinking "but there is `dtype=`". Yes, there is, but it doesn't have the same semantics; you would need a new `out_dtype=`. With that, you could have a slightly special `copy_ufunc` where you just have to pass `out_dtype=` (or you get a direct copy).

Not sure if it makes sense to extend things just to use `__array_ufunc__`, though. It might make sense to simplify our casting code (merge it into ufuncs), but maybe casting/copying is distinct enough.

Clip already *is* implemented via ufuncs (minimum, maximum, and an internal `clip()`).

Round could probably be a ufunc, but ufuncs don't have scalar parameters (maybe they should), and `decimals` is such a scalar. You could of course generalize `decimals` to not be a scalar, and then you have a proper ufunc (with some fast-path magic for when it doesn't change).

As Ralf said, you may also have to keep the method indirection (e.g. because otherwise you break people using `np.around()` on pandas...).

- Sebastian
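The three points above can be illustrated with existing public ufuncs. This is only a sketch under stated assumptions: `round_elementwise` is a hypothetical helper (not a NumPy function) showing what a non-scalar `decimals` could look like, and the clip composition mirrors the idea rather than NumPy's internal implementation.

```python
import numpy as np

x = np.array([1.5, 2.25, -3.0])  # float64 input

# 1) Casting through a ufunc: np.positive is an identity ufunc, so writing its
#    result into a preallocated out= array performs the cast.
out32 = np.empty(x.shape, dtype=np.float32)
np.positive(x, out=out32)  # float64 -> float32 is allowed by the default casting rule

# Casts outside the default "same_kind" rule (here float -> int) need an opt-in.
out_int = np.empty(x.shape, dtype=np.int64)
np.positive(x, out=out_int, casting="unsafe")

# 2) Clipping composed from the minimum/maximum ufuncs.
clipped = np.minimum(np.maximum(x, -1.0), 2.0)


# 3) Rounding with a broadcastable `decimals`, built on the rint ufunc.
def round_elementwise(values, decimals):
    """Hypothetical helper: round each element to its own number of decimals."""
    scale = np.power(10.0, decimals)
    return np.rint(np.asarray(values) * scale) / scale


rounded = round_elementwise([1.25, 2.5, 3.14159], [0, 0, 2])
print(out32.dtype, out_int, clipped, rounded)
```

The first part shows why the cast is clunky: the caller has to allocate the output with the target dtype up front, which is the job a dedicated `out_dtype=` style argument would take over.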
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-leave@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: sebastian@sipsolutions.net
participants (4)
- Aaron Meurer
- James Webber
- Ralf Gommers
- Sebastian Berg