Forcing gufunc to error with size zero input
I'm experimenting with gufuncs, and I just created a simple one with signature '(i)->()'. Is there a way to configure the gufunc itself so that an empty array results in an error? Or would I have to create a Python wrapper around the gufunc that does the error checking? Currently, when passed an empty array, the ufunc loop is called with the core dimension associated with i set to 0. It would be nice if the code didn't get that far, and the ufunc machinery "knew" that this gufunc didn't accept a core dimension that is 0. I'd like to automatically get an error, something like the error produced by `np.max([])`.
Warren
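The Python-wrapper route asked about above could look roughly like the sketch below. It rests on assumptions: `np.vectorize` with a `signature` argument stands in for a compiled gufunc, and the names `_peaktopeak_gufunc` and `peaktopeak` are illustrative only. The wrapper rejects an empty core dimension before the loop is ever entered.

```python
import numpy as np


def _ptp_1d(x):
    # Core operation on a single 1-d slice: max minus min.
    return x.max() - x.min()


# Stand-in for a compiled gufunc with signature '(i)->()'.
# (Assumption: a real gufunc would be created through the C API instead.)
_peaktopeak_gufunc = np.vectorize(_ptp_1d, signature="(i)->()")


def peaktopeak(x):
    """Wrapper that raises if the core dimension has length 0."""
    x = np.asarray(x)
    if x.ndim == 0:
        raise ValueError("peaktopeak requires an input with at least 1 dimension")
    if x.shape[-1] == 0:
        raise ValueError("peaktopeak requires a core dimension of length at least 1")
    return _peaktopeak_gufunc(x)


print(peaktopeak([[1, 2, 3], [10, 0, 5]]))
```

The cost of this approach is that the check lives outside the gufunc object, so anyone who calls the underlying gufunc directly bypasses it.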
Can you just raise an exception in the gufunc's inner loop? Or is there no mechanism to do that today?
I don't think you were proposing that core dimensions should _never_ be allowed to be 0, but if you were I disagree. I spent a fair amount of work enabling that for linalg because it provided some convenient base cases.
We could go down the route of augmenting the gufunc signature syntax to support requiring non-empty dimensions, like we did for optional ones, although IMO we should consider switching from a string mini-language to a structured object specification if we plan to go much further with extending it.
On Sat, Sep 28, 2019, 17:47 Warren Weckesser warren.weckesser@gmail.com wrote:
I'm experimenting with gufuncs, and I just created a simple one with signature '(i)->()'. Is there a way to configure the gufunc itself so that an empty array results in an error? Or would I have to create a Python wrapper around the gufunc that does the error checking? Currently, when passed an empty array, the ufunc loop is called with the core dimension associated with i set to 0. It would be nice if the code didn't get that far, and the ufunc machinery "knew" that this gufunc didn't accept a core dimension that is 0. I'd like to automatically get an error, something like the error produced by `np.max([])`.
Warren
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion
On 9/28/19, Eric Wieser wieser.eric+numpy@gmail.com wrote:
Can you just raise an exception in the gufunc's inner loop? Or is there no mechanism to do that today?
Maybe? I don't know what is the idiomatic way to handle errors detected in an inner loop. And pushing this particular error detection into the inner loop doesn't feel right.
I don't think you were proposing that core dimensions should _never_ be allowed to be 0,
No, I'm not suggesting that. There are many cases where a length 0 core dimension is fine.
I'm interested in the case where there is not a meaningful definition of the operation on the empty set. The mean is an example. Currently `np.mean([])` generates two warnings (one useful, the other cryptic and apparently incidental), and returns nan. Returning nan is one way to handle such a case; another is to raise an error like `np.amax([])` does. I'd like to raise an error in the example that I'm working on ('peaktopeak' at https://github.com/WarrenWeckesser/npuff). The function is a gufunc, not a reduction of a binary operation, so the 'identity' argument of PyUFunc_FromFuncAndDataAndSignature has no effect.
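The two behaviours contrasted above can be observed with a short script. This is a minimal sketch; the exact number and wording of the warnings may vary between NumPy versions, so it only records and prints whatever is emitted. It uses `np.max`, which raises the same error as `np.amax`.

```python
import warnings

import numpy as np

# Record every warning emitted while taking the mean of an empty array.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mean_result = np.mean([])

print(mean_result)  # nan
for w in caught:
    print(f"{w.category.__name__}: {w.message}")

# The maximum of an empty array has no identity, so NumPy raises instead.
try:
    np.max([])
except ValueError as exc:
    max_error = str(exc)
    print(f"ValueError: {max_error}")
```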
but if you were I disagree. I spent a fair amount of work enabling that for linalg because it provided some convenient base cases.
We could go down the route of augmenting the gufunc signature syntax to support requiring non-empty dimensions, like we did for optional ones, although IMO we should consider switching from a string mini-language to a structured object specification if we plan to go much further with extending it.
After only a quick glance at that code: one option is to add a '+' after the input names in the signature that must have a length that is at least 1. So the signature for functions like `mean` (if you were to reimplement it as a gufunc, and wanted an error instead of nan), `amax`, `ptp`, etc, would be '(i+)->()'.
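To make the proposed '+' marker concrete, here is a rough Python sketch of the check it implies. The '+' suffix is hypothetical and is not part of NumPy's signature syntax. The helper names `nonempty_dims` and `check_core_dims` are invented for illustration, and only the input side of the signature is handled.

```python
import re

import numpy as np


def nonempty_dims(signature):
    """Names of core dimensions carrying the hypothetical '+' suffix."""
    inputs = signature.split("->")[0]
    return set(re.findall(r"(\w+)\+", inputs))


def check_core_dims(signature, *arrays):
    """Raise ValueError if a '+'-flagged core dimension has length 0."""
    inputs = signature.split("->")[0]
    # One parenthesized group per input operand, e.g. "(i+),(i+)".
    groups = re.findall(r"\(([^)]*)\)", inputs)
    required = nonempty_dims(signature)
    for group, arr in zip(groups, arrays):
        names = [n.strip() for n in group.split(",") if n.strip()]
        arr = np.asarray(arr)
        if arr.ndim < len(names):
            raise ValueError("operand has too few dimensions for the signature")
        # The core dimensions are the trailing len(names) axes.
        core_shape = arr.shape[arr.ndim - len(names):]
        for name, size in zip(names, core_shape):
            if name.rstrip("+") in required and size == 0:
                raise ValueError(
                    f"core dimension '{name.rstrip('+')}' must have length >= 1"
                )


check_core_dims("(i+)->()", [1.0, 2.0, 3.0])  # passes silently
check_core_dims("(i)->()", [])                # no '+', so empty is allowed
```

In the proposal the ufunc machinery itself would perform this check before calling the inner loop; the sketch only shows the rule being enforced.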
However, the only meaningful use cases of this enhancement that I've come up with are these simple reductions. So I don't know if making such a change to the signature is worthwhile. On the other hand, there are many examples of useful 1-d reductions that aren't the reduction of an associative binary operation. It might be worthwhile to have a new convenience function just for the case '(i)->()', maybe something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's ugly, but I think you get the idea), and that function can have an argument to specify that the length must be at least 1.
I'll see if that is feasible, but I won't be surprised to learn that there are good reasons for *not* doing that.
Warren
On 9/29/19, Warren Weckesser warren.weckesser@gmail.com wrote:
However, the only meaningful use cases of this enhancement that I've come up with are these simple reductions.
Of course, just minutes after sending the email, I realized I *do* know of other signatures that could benefit from a check on the core dimension size. An implementation of Pearson's correlation coefficient as a gufunc would have signature '(i),(i)->()', and the core dimension i must be at least *2* for the calculation to be well defined. Other correlations would also likely require a nonzero core dimension.
Warren
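The Pearson case can be sketched in the same wrapper style. As before, this is an illustration under assumptions: `np.vectorize` stands in for a compiled gufunc with signature '(i),(i)->()', and the names `_pearson_gufunc` and `pearson_corr` are hypothetical. The point is that the minimum core length is 2 rather than 1.

```python
import numpy as np


def _pearson_1d(x, y):
    # Pearson correlation coefficient of two 1-d arrays of equal length.
    xm = x - x.mean()
    ym = y - y.mean()
    return (xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym))


# Stand-in for a compiled gufunc with signature '(i),(i)->()'.
_pearson_gufunc = np.vectorize(_pearson_1d, signature="(i),(i)->()")


def pearson_corr(x, y):
    """Wrapper that requires the shared core dimension to be at least 2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 0 or y.ndim == 0:
        raise ValueError("inputs must have at least 1 dimension")
    if x.shape[-1] != y.shape[-1]:
        raise ValueError("inputs must have the same core dimension length")
    if x.shape[-1] < 2:
        raise ValueError("core dimension must have length at least 2")
    return _pearson_gufunc(x, y)


print(pearson_corr([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A single '+' marker would not express this constraint, which suggests that any signature-level mechanism would need to carry a minimum length rather than a yes/no flag.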
On Sun, 2019-09-29 at 00:20 -0400, Warren Weckesser wrote:
On 9/28/19, Eric Wieser wieser.eric+numpy@gmail.com wrote:
Can you just raise an exception in the gufunc's inner loop? Or is there no mechanism to do that today?
Maybe? I don't know what is the idiomatic way to handle errors detected in an inner loop. And pushing this particular error detection into the inner loop doesn't feel right.
Basically, since you want to release the GIL around the loop, what you can do right now is grab the GIL again inside the inner loop and set an error. That will work, although grabbing the GIL from the inner loop is not ideal, at least in the sense that it does not work with subinterpreters (but NumPy does not currently work with those in any case). We do use this internally, I believe.
Well, even without dtypes, I think we probably want a few extra APIs around ufuncs: setup/teardown (not necessarily as functions of that kind), as well as a return value for the inner loop to signal that iteration should stop.
There was a long discussion about that, for example here: https://github.com/numpy/numpy/issues/12518
There is another use case: we probably want to allow optimized loop selection (necessary/used in casting).
Note that I believe all of this type of logic should be moved into a UFuncImpl [0] object, so that it can be DType (and especially user DType) specific without bloating the current UFunc object too much. That puts a lot of power out there, though, so it may be good to limit it a lot initially.
Best,
Sebastian
[0] It was Eric's suggestion/name; I do not know if it came up earlier.
On 9/29/19, Warren Weckesser warren.weckesser@gmail.com wrote:
I'm interested in the case where there is not a meaningful definition of the operation on the empty set. The mean is an example. Currently `np.mean([])` generates two warnings (one useful, the other cryptic and apparently incidental), and returns nan. Returning nan is one way to handle such a case; another is to raise an error like `np.amax([])` does. I'd like to raise an error in the example that I'm working on ('peaktopeak' at https://github.com/WarrenWeckesser/npuff). The
FYI: I renamed that repository to 'ufunclab': https://github.com/WarrenWeckesser/ufunclab
Warren
function is a gufunc, not a reduction of a binary operation, so the 'identity' argument of PyUFunc_FromFuncAndDataAndSignature has no effect.
participants (3)

Eric Wieser

Sebastian Berg

Warren Weckesser