[Numpy-discussion] Forcing gufunc to error with size zero input

Sun Sep 29 00:43:36 EDT 2019

On Sun, 2019-09-29 at 00:20 -0400, Warren Weckesser wrote:
> On 9/28/19, Eric Wieser <wieser.eric+numpy at gmail.com> wrote:
> > Can you just raise an exception in the gufunc's inner loop? Or is
> > there no mechanism to do that today?
> 
> Maybe?  I don't know what is the idiomatic way to handle errors
> detected in an inner loop.  And pushing this particular error
> detection into the inner loop doesn't feel right.
> 

Basically, you can already do this: since you want the GIL released
while the loop runs, you grab it inside the inner loop and set an
error. That works, although grabbing the GIL from the inner loop is
not ideal, at least in the sense that it does not work with
subinterpreters (but numpy does not currently work with those in any
case). We do use this internally, I believe.
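
As a rough Python-level analogue of raising from inside the inner loop
(not the C mechanism described above, and with no GIL handling
involved), here is a minimal sketch using `np.vectorize` with a
'(i)->()' signature. The helper name `_ptp_core` and the error message
are made up for illustration.

```python
import numpy as np

def _ptp_core(v):
    # Plays the role of the inner loop for one 1-d slice (core dimension i).
    if v.shape[0] == 0:
        # Error raised from "inside the loop"; the message is illustrative only.
        raise ValueError("peaktopeak: core dimension must have length at least 1")
    return v.max() - v.min()

# np.vectorize is a pure-Python convenience, not a compiled gufunc.
peaktopeak = np.vectorize(_ptp_core, signature="(i)->()")

print(peaktopeak([[1, 5, 2], [4, 4, 4]]))

try:
    peaktopeak([])
except ValueError as exc:
    print("error:", exc)
```

The check only runs once the loop is entered, which is the part that
"doesn't feel right" above: the machinery still gets as far as calling
the loop with a core dimension of 0.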

Well, even without dtypes, I think we probably want a few extra APIs
around UFuncs: setup/teardown (not necessarily as such functions), as
well as a return value for the inner loop to signal that iteration
should stop.

There was a long discussion about that, for example here:
https://github.com/numpy/numpy/issues/12518

There is another use case: we probably want to allow optimized loop
selection (necessary/used in casting).

Note that I believe all of this type of logic should be moved into a
UFuncImpl [0] object, so that it can be DType (and especially user
DType) specific without bloating up the current UFunc object too much.
Although that puts a lot of power out there, so it may be good to limit
it a lot initially.

Best,

Sebastian

[0] It was Eric's suggestion/name, I do not know if it came up earlier.

> 
> > I don't think you were proposing that core dimensions should
> > _never_ be allowed to be 0,
> 
> No, I'm not suggesting that.  There are many cases where a length 0
> core dimension is fine.
> 
> I'm interested in the case where there is not a meaningful definition
> of the operation on the empty set.  The mean is an example.  Currently
> `np.mean([])` generates two warnings (one useful, the other cryptic
> and apparently incidental), and returns nan.  Returning nan is one way
> to handle such a case; another is to raise an error like `np.amax([])`
> does.  I'd like to raise an error in the example that I'm working on
> ('peaktopeak' at https://github.com/WarrenWeckesser/npuff).  The
> function is a gufunc, not a reduction of a binary operation, so the
> 'identity' argument of PyUFunc_FromFuncAndDataAndSignature has no
> effect.
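
For reference, the two behaviours described above can be reproduced
with a short script. This is only a sketch; the exact warning texts
depend on the NumPy version, so they are printed rather than assumed.

```python
import warnings
import numpy as np

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeated ones
    result = np.mean([])

print(result)  # nan
for w in caught:
    print(w.category.__name__, w.message)

# np.max has no identity to fall back on, so an empty input is an error.
try:
    np.max([])
except ValueError as exc:
    print("ValueError:", exc)
```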
> 
> > but if you were, I disagree. I spent a fair amount of work enabling
> > that for linalg because it provided some convenient base cases.
> > 
> > We could go down the route of augmenting the gufunc signature syntax
> > to support requiring non-empty dimensions, like we did for optional
> > ones - although IMO we should consider switching from a string
> > minilanguage to a structured object specification if we plan to go
> > too much further with extending it.
> 
> After only a quick glance at that code: one option is to add a '+'
> after the dimension names in the signature that must have a length
> of at least 1.  So the signature for functions like `mean` (if you
> were to reimplement it as a gufunc, and wanted an error instead of
> nan), `amax`, `ptp`, etc, would be '(i+)->()'.
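
To make the proposed '+' marker concrete, here is a minimal
pure-Python sketch of how such a signature could be parsed and checked
against input shapes. The '+' syntax is only the proposal above, not
something NumPy accepts, and `parse_group` and `check_shapes` are
hypothetical helpers; this is not NumPy's signature parser and it
ignores the existing fixed-size and optional ('?') dimension syntax.

```python
import re

# A dimension name, optionally followed by the proposed '+' marker.
_DIM_TOKEN = re.compile(r"^([A-Za-z_]\w*)(\+?)$")

def parse_group(group):
    """Parse one group such as '(i+,j)' into [(name, must_be_nonempty), ...]."""
    inner = group.strip()[1:-1].strip()
    if not inner:
        return []
    dims = []
    for token in inner.split(","):
        match = _DIM_TOKEN.match(token.strip())
        if match is None:
            raise ValueError(f"cannot parse core dimension {token!r}")
        dims.append((match.group(1), match.group(2) == "+"))
    return dims

def check_shapes(signature, *shapes):
    """Raise ValueError if a '+'-marked core dimension has length 0."""
    inputs = signature.partition("->")[0]
    groups = re.findall(r"\([^()]*\)", inputs)
    if len(groups) != len(shapes):
        raise ValueError("expected one shape per input in the signature")
    for group, shape in zip(groups, shapes):
        dims = parse_group(group)
        if len(shape) < len(dims):
            raise ValueError(f"shape {shape} has too few dimensions for {group}")
        # Core dimensions are the trailing axes; leading axes are loop dimensions.
        core = shape[len(shape) - len(dims):]
        for (name, nonempty), size in zip(dims, core):
            if nonempty and size == 0:
                raise ValueError(
                    f"core dimension {name!r} must have length at least 1"
                )

check_shapes("(i+)->()", (4, 3))   # fine: core dimension has length 3
check_shapes("(i)->()", (0,))      # fine: no '+' marker, empty is allowed
```

Only core dimensions are checked, so a zero-length loop dimension
(for example shape (0, 3)) would still be accepted.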
> 
> However, the only meaningful use cases of this enhancement that I've
> come up with are these simple reductions.  So I don't know if making
> such a change to the signature is worthwhile.  On the other hand,
> there are many examples of useful 1-d reductions that aren't the
> reduction of an associative binary operation.  It might be worthwhile
> to have a new convenience function just for the case '(i)->()', maybe
> something like PyUFunc_OneDReduction_FromFuncAndData (ugh, that's
> ugly, but I think you get the idea), and that function can have an
> argument to specify that the length must be at least 1.
> 
> I'll see if that is feasible, but I won't be surprised to learn that
> there are good reasons for *not* doing that.
> 
> Warren
> 
> 
> 
> > On Sat, Sep 28, 2019, 17:47 Warren Weckesser
> > <warren.weckesser at gmail.com> wrote:
> > 
> > > I'm experimenting with gufuncs, and I just created a simple one
> > > with signature '(i)->()'.  Is there a way to configure the gufunc
> > > itself so that an empty array results in an error?  Or would I
> > > have to create a Python wrapper around the gufunc that does the
> > > error checking?  Currently, when passed an empty array, the ufunc
> > > loop is called with the core dimension associated with i set to
> > > 0.  It would be nice if the code didn't get that far, and the
> > > ufunc machinery "knew" that this gufunc didn't accept a core
> > > dimension that is 0.  I'd like to automatically get an error,
> > > something like the error produced by `np.max([])`.
> > > 
> > > Warren
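
The Python-wrapper fallback mentioned in the original question could
look roughly like the sketch below. `_peaktopeak_gufunc` is a
hypothetical stand-in for the compiled '(i)->()' gufunc, and the error
messages are illustrative; the wrapper assumes the core dimension is
the last axis.

```python
import numpy as np

def _peaktopeak_gufunc(x):
    # Hypothetical stand-in for a compiled '(i)->()' gufunc.
    return x.max(axis=-1) - x.min(axis=-1)

def peaktopeak(x):
    """Reject a zero-length core dimension before calling the gufunc."""
    x = np.asarray(x)
    if x.ndim == 0:
        raise ValueError("peaktopeak requires at least one dimension")
    if x.shape[-1] == 0:
        raise ValueError("peaktopeak: zero-size core dimension is not allowed")
    return _peaktopeak_gufunc(x)

print(peaktopeak([[1, 5, 2], [4, 4, 4]]))

try:
    peaktopeak([])
except ValueError as exc:
    print("error:", exc)
```

A real wrapper would also have to account for gufunc keyword arguments
such as `axis`, which change which axis acts as the core dimension.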
> > > _______________________________________________
> > > NumPy-Discussion mailing list
> > > NumPy-Discussion at python.org
> > > https://mail.python.org/mailman/listinfo/numpy-discussion
> > > 