nanargmax failure case (was: Re: [SciPy-Dev] 1.8.0rc1)

[switching subject to break out from the giant 1.8.0rc1 thread]

On Tue, Oct 1, 2013 at 2:52 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Tue, Oct 1, 2013 at 7:25 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 1, 2013 at 1:56 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Tue, Oct 1, 2013 at 4:43 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Mon, Sep 30, 2013 at 10:51 PM, Christoph Gohlke <cgohlke@uci.edu> wrote:
2) Bottleneck 0.7.0
https://github.com/kwgoodman/bottleneck/issues/71#issuecomment-25331701
I can't tell if these are real bugs in numpy, or tests checking that bottleneck is bug-for-bug compatible with old numpy and we just fixed some bugs, or what. It's clearly something to do with the nanarg{max,min} rewrite -- @charris, do you know what's going on here?
Yes ;) The previous behaviour of nanarg for all-nan axis was to cast nan to intp when the result was an array, and return nan when a scalar. The current behaviour is to return the most negative value of intp as an error marker in both cases and raise a warning. It is a change in behavior, but I think one that needs to be made.
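Concretely, the rc1 semantics described here look roughly like this in a pure-Python sketch (`nanargmax_sketch` is an illustrative stand-in, not numpy's actual implementation):

```python
import math
import warnings

INTP_MIN = -2**63  # most negative intp on a 64-bit build

def nanargmax_sketch(row):
    """Illustrative stand-in: an all-NaN input yields the error marker
    plus a warning, per the 1.8.0rc1 behaviour described above."""
    best, best_idx = -math.inf, INTP_MIN
    for i, v in enumerate(row):
        if not math.isnan(v) and v > best:
            best, best_idx = v, i
    if best_idx == INTP_MIN:
        warnings.warn("All-NaN slice encountered", RuntimeWarning)
    return best_idx

print(nanargmax_sketch([1.0, float('nan'), 3.0]))      # 2
print(nanargmax_sketch([float('nan'), float('nan')]))  # -9223372036854775808
```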
Ah, okay! I kind of lost track of the nanfunc changes by the end there.
So for the bottleneck issue, it sounds like the problem is just that bottleneck is still emulating the old numpy behaviour in this corner case, which isn't really a problem. So we don't really need to worry about that, both behaviours are correct, just maybe out of sync.
I'm a little dubious about this "make up some weird value that will *probably* blow up if people try to use it without checking, and also raise a warning" thing, wouldn't it make more sense to just raise an error? That's what exceptions are for? I guess I should have said something earlier though...
I figure the blowup is safe, as we can't allocate arrays big enough that the minimum intp value would be a valid index. I considered raising an error, and if there is a consensus the behavior could be changed. Or we could add a keyword to determine the behavior.
Yeah, the intp value can't be a valid index, so that covers 95% of cases, but I'm worried about that other 5%. It could still pass silently as the endpoint of a slice, or participate in some sort of integer arithmetic calculation, etc. I assume you also share this worry to some extent or you wouldn't have put in the warning ;-). I guess the bigger question is, why would we *not* use the standard method for signaling an exceptional condition here, i.e., exceptions? That way we're 100% guaranteed that if people aren't prepared to handle it then they'll at least know something has gone wrong, and if they are prepared to handle it then it's very easy and standard, just use try/except. Right now I guess you have to check for the special value, but also do something to silence warnings, but just for that one line? Sounds kind of complicated... -n
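The slice-endpoint hazard is easy to demonstrate with plain Python sequences (numpy arrays slice analogously):

```python
INTP_MIN = -2**63  # the nanargmax error marker on a 64-bit build
a = list(range(10))

try:
    a[INTP_MIN]          # as a plain index it blows up, as intended
except IndexError:
    print("caught when used as an index")

print(a[:INTP_MIN])      # [] -- silently clamps as a slice endpoint
print(a[INTP_MIN:])      # the full list, equally silently
print(INTP_MIN + 1)      # and integer arithmetic happily proceeds
```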

On Tue, Oct 1, 2013 at 8:12 AM, Nathaniel Smith <njs@pobox.com> wrote:
> [...] why would we *not* use the standard method for signaling an exceptional condition here, i.e., exceptions? [...]
The main reason was the case of multiple axes, where some of the results would be valid and others not. The simple thing might be to raise an exception but keep the current return values so that users could determine where the problem occurred.

Chuck
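A pure-Python sketch of that mixed case (an illustrative helper, not numpy's actual code): reducing along an axis gives one result per row, and only the all-NaN rows come back as the marker:

```python
import math

INTP_MIN = -2**63

def nanargmax_rows(matrix):
    """Illustrative per-row reduction: valid rows get a real index,
    all-NaN rows get the error marker."""
    out = []
    for row in matrix:
        best, idx = -math.inf, INTP_MIN
        for i, v in enumerate(row):
            if not math.isnan(v) and v > best:
                best, idx = v, i
        out.append(idx)
    return out

nan = float('nan')
print(nanargmax_rows([[1.0, 5.0], [nan, nan], [nan, 2.0]]))
# [1, -9223372036854775808, 1]
```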

On Tue, Oct 1, 2013 at 3:20 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> The main reason was the case of multiple axes, where some of the results would be valid and others not. The simple thing might be to raise an exception but keep the current return values so that users could determine where the problem occurred.
Oh, duh, yes, right, now I remember this discussion. Sorry for being slow.

In the past we've *always* raised an error in the multiple-axes case, right? Has anyone ever complained? Wanting to get all nanargmax/nanargmin results, of which some might be errors, without just writing a loop, seems like a pretty exotic case to me, so I'm not sure we should optimize for it at the expense of returning possibly-misleading results in the scalar case.

Like (I think) you say, we could get the best of both worlds by encoding the results in the same way we do right now, but then raise an exception and attach the results to the exception so they can be retrieved if wanted. Kind of cumbersome, but maybe good?

This is a more general problem though of course -- we've run into it in the gufunc linalg code too, where there's some question about what you do in e.g. chol() if some sub-matrices are positive-definite and some are not.

Off the top of my head the general solution might be to define a MultiError exception type that has a standard generic format for describing such things. It'd need a mask saying which values were valid, rather than encoding them directly into the return values -- otherwise we have the problem where nanargmax wants to use INT_MIN, chol wants to use NaN, and maybe the next function along doesn't have any usable flag value available at all. So probably more thought is needed before nailing down exactly how we handle such "partial" errors for vectorized functions.

In the short term (i.e., 1.8.0), maybe we should defer this discussion by simply raising a regular ValueError for nanarg functions on all errors? That's not a regression from 1.7, since 1.7 also didn't provide any way to get at partial results in the event of an error, and it leaves us in a good position to solve the more general problem later.

-n
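The MultiError idea might look something like this (entirely hypothetical -- no such numpy API exists; `checked_op` is a made-up example operation):

```python
class MultiError(ValueError):
    """Hypothetical exception carrying partial results plus a validity
    mask, as floated above -- not an actual numpy API."""
    def __init__(self, results, valid_mask):
        super().__init__("some elements of a vectorized call failed")
        self.results = results
        self.valid = valid_mask

def checked_op(values):
    """Made-up vectorized operation: NaN inputs count as failures."""
    results = [v * 2 if v == v else None for v in values]
    mask = [r is not None for r in results]
    if not all(mask):
        raise MultiError(results, mask)
    return results

try:
    checked_op([1.0, float('nan'), 3.0])
except MultiError as e:
    print(e.results)  # [2.0, None, 6.0]
    print(e.valid)    # [True, False, True]
```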

On Tue, Oct 1, 2013 at 10:47 AM, Nathaniel Smith <njs@pobox.com> wrote:
> In the short term (i.e., 1.8.0), maybe we should defer this discussion by simply raising a regular ValueError for nanarg functions on all errors? [...]
Can we make the error optional in these cases? Like np.seterr for zero division, invalid, or floating point errors, which allows ignore and raise: np.seterr(linalg='ignore').

I don't know about nanarg, but I'm thinking about some applications for the gufunc linalg code. In some cases I might require, for example, invertibility of all matrices and raise if one fails; in other cases I would be happy with nans, and just sum the results with nansum or replace them with some fill value.

Josef
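A toggle along those lines could be modeled on np.seterr, e.g. (a hypothetical sketch -- `seterr_like` and `nanargmax_like` are made-up names, not numpy APIs):

```python
_err_state = {'all_nan': 'warn'}  # default mirrors the current behaviour

def seterr_like(**kwargs):
    """Hypothetical analogue of np.seterr: update and return old settings."""
    old = dict(_err_state)
    _err_state.update(kwargs)
    return old

def nanargmax_like(values):
    valid = [(v, i) for i, v in enumerate(values) if v == v]  # drop NaNs
    if not valid:
        if _err_state['all_nan'] == 'raise':
            raise ValueError("All-NaN slice encountered")
        return -2**63  # fall back to the sentinel under 'warn'/'ignore'
    return max(valid, key=lambda t: t[0])[1]

nan = float('nan')
print(nanargmax_like([nan, nan]))   # sentinel under the default setting
seterr_like(all_nan='raise')
try:
    nanargmax_like([nan, nan])
except ValueError as e:
    print("raises once toggled:", e)
```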
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

On Tue, Oct 1, 2013 at 10:19 AM, <josef.pktd@gmail.com> wrote:
> In some cases I might require invertibility of all matrices and raise if one fails; in other cases I would be happy with nans [...]
I'm thinking warnings might be more flexible than exceptions:
    with warnings.catch_warnings():
        warnings.simplefilter('error')
        ...

Chuck
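That pattern is just the standard library's warnings machinery; with a toy stand-in for the warning-emitting function (`nanargmax_like` here is made up for illustration) it looks like:

```python
import warnings

def nanargmax_like(values):
    """Toy stand-in: warn (instead of raising) when no valid value exists."""
    valid = [v for v in values if v == v]  # filter out NaNs
    if not valid:
        warnings.warn("All-NaN slice encountered", RuntimeWarning)
        return -2**63  # sentinel: most negative 64-bit int
    return values.index(max(valid))

# Default: the warning is merely reported and the sentinel comes back.
result = nanargmax_like([float('nan'), float('nan')])

# Opt in to strictness: the same warning now raises.
try:
    with warnings.catch_warnings():
        warnings.simplefilter('error')
        nanargmax_like([float('nan'), float('nan')])
    raised = False
except RuntimeWarning:
    raised = True
```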

On 1 Oct 2013 17:34, "Charles R Harris" <charlesr.harris@gmail.com> wrote:
> I'm thinking warnings might be more flexible than exceptions:
>
>     with warnings.catch_warnings():
>         warnings.simplefilter('error')
>         ...
Sure. Passing in a callback, or just leaving the function out and telling people to implement it themselves, would be even more flexible :-). But we have to trade off complexity of usage, complexity of teaching people how to do stuff (nobody knows how to use catch_warnings -- we only know because we started writing warning tests in the last year or so), usefulness in common situations, etc. The warnings API also doesn't give you any way to pass results out; you still need a separate channel to say what failed and what succeeded (and maybe for the failures to say what the different failures are).

Anyway, this back and forth still supports my main suggestion for *right* now, which is that this is sufficiently nonobvious that with 1.8 breathing down our necks we should start with the safe behaviour and then work up from there.

-n
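The "separate channel" point can be seen with catch_warnings(record=True), which collects the warnings while the results come back normally (`per_row_op` is a made-up stand-in):

```python
import warnings

def per_row_op(rows):
    """Toy vectorized op: warn for invalid rows, still return all results."""
    out = []
    for i, row in enumerate(rows):
        valid = [v for v in row if v == v]
        if not valid:
            warnings.warn(f"row {i}: all-NaN", RuntimeWarning)
            out.append(None)
        else:
            out.append(max(valid))
    return out

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    results = per_row_op([[1.0, 2.0], [float('nan')]])

# Results and failure reports travel on separate channels.
print(results)                            # [2.0, None]
print([str(w.message) for w in caught])   # ['row 1: all-NaN']
```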

On Tue, Oct 1, 2013 at 4:13 PM, Nathaniel Smith <njs@pobox.com> wrote:
On 1 Oct 2013 17:34, "Charles R Harris" <charlesr.harris@gmail.com> wrote:
On Tue, Oct 1, 2013 at 10:19 AM, <josef.pktd@gmail.com> wrote:
On Tue, Oct 1, 2013 at 10:47 AM, Nathaniel Smith <njs@pobox.com> wrote:
On Tue, Oct 1, 2013 at 3:20 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
On Tue, Oct 1, 2013 at 8:12 AM, Nathaniel Smith <njs@pobox.com> wrote:
[switching subject to break out from the giant 1.8.0rc1 thread]
On Tue, Oct 1, 2013 at 2:52 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
> On Tue, Oct 1, 2013 at 7:25 AM, Nathaniel Smith <njs@pobox.com> wrote:
>> On Tue, Oct 1, 2013 at 1:56 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
>>> On Tue, Oct 1, 2013 at 4:43 AM, Nathaniel Smith <njs@pobox.com> wrote:
>>>> On Mon, Sep 30, 2013 at 10:51 PM, Christoph Gohlke <cgohlke@uci.edu> wrote:
>>>>> 2) Bottleneck 0.7.0
>>>>> https://github.com/kwgoodman/bottleneck/issues/71#issuecomment-25331701
>>>>
>>>> I can't tell if these are real bugs in numpy, or tests checking that bottleneck is bug-for-bug compatible with old numpy and we just fixed some bugs, or what. It's clearly something to do with the nanarg{max,min} rewrite -- @charris, do you know what's going on here?
>>>
>>> Yes ;) The previous behaviour of nanarg for all-nan axis was to cast nan to intp when the result was an array, and return nan when a scalar. The current behaviour is to return the most negative value of intp as an error marker in both cases and raise a warning. It is a change in behavior, but I think one that needs to be made.
>>
>> Ah, okay! I kind of lost track of the nanfunc changes by the end there.
>>
>> So for the bottleneck issue, it sounds like the problem is just that bottleneck is still emulating the old numpy behaviour in this corner case, which isn't really a problem. So we don't really need to worry about that, both behaviours are correct, just maybe out of sync.
>>
>> I'm a little dubious about this "make up some weird value that will *probably* blow up if people try to use it without checking, and also raise a warning" thing, wouldn't it make more sense to just raise an error? That's what exceptions are for? I guess I should have said something earlier though...
>
> I figure the blowup is safe, as we can't allocate arrays big enough that the minimum intp value would be a valid index. I considered raising an error, and if there is a consensus the behavior could be changed. Or we could add a keyword to determine the behavior.
Yeah, the intp value can't be a valid index, so that covers 95% of cases, but I'm worried about that other 5%. It could still pass silently as the endpoint of a slice, or participate in some sort of integer arithmetic calculation, etc. I assume you also share this worry to some extent or you wouldn't have put in the warning ;-).
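A quick illustration of that worry, using the marker value Chuck describes above (np.iinfo(np.intp).min):

```python
import numpy as np

INTP_MIN = np.iinfo(np.intp).min  # the proposed error marker
a = np.arange(5)

# Used as a bare index, the marker blows up as intended:
try:
    a[INTP_MIN]
    blew_up = False
except IndexError:
    blew_up = True

# But as a slice bound it is silently clipped -- no error at all,
# and as a *start* bound it even selects the whole array:
whole = a[INTP_MIN:]   # clipped to a[0:], i.e. all five elements
empty = a[:INTP_MIN]   # clipped to an empty slice
```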
I guess the bigger question is, why would we *not* use the standard method for signaling an exceptional condition here, i.e., exceptions? That way we're 100% guaranteed that if people aren't prepared to handle it then they'll at least know something has gone wrong, and if they are prepared to handle it then it's very easy and standard, just use try/except. Right now I guess you have to check for the special value, but also do something to silence warnings, but just for that one line? Sounds kind of complicated...
The main reason was the case of multiple axes, where some of the results would be valid and others not. The simple thing might be to raise an exception but keep the current return values so that users could determine where the problem occurred.
Oh, duh, yes, right, now I remember this discussion. Sorry for being slow.
In the past we've *always* raised an error in the multiple-axis case, right? Has anyone ever complained? Wanting to get all nanargmax/nanargmin results, of which some might be errors, without just writing a loop, seems like a pretty exotic case to me, so I'm not sure we should optimize for it at the expense of returning possibly-misleading results in the scalar case.
Like (I think) you say, we could get the best of both worlds by encoding the results in the same way we do right now, but then raise an exception and attach the results to the exception so they can be retrieved if wanted. Kind of cumbersome, but maybe good?
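A minimal sketch of that idea (the names and the implementation are hypothetical -- this is a toy stand-in, not the real np.nanargmax):

```python
import numpy as np

INTP_MIN = np.iinfo(np.intp).min

class AllNanError(ValueError):
    """Hypothetical exception that carries the encoded per-slice results."""
    def __init__(self, message, result):
        ValueError.__init__(self, message)
        self.result = result

def nanargmax_sketch(a, axis):
    # Simplified stand-in: NaNs lose to everything; all-NaN slices get the marker.
    all_nan = np.isnan(a).all(axis=axis)
    res = np.where(np.isnan(a), -np.inf, a).argmax(axis=axis)
    if all_nan.any():
        res[all_nan] = INTP_MIN
        raise AllNanError("All-NaN slice encountered", res)
    return res

a = np.array([[np.nan, np.nan], [1.0, 2.0]])
try:
    idx = nanargmax_sketch(a, axis=1)
except AllNanError as e:
    idx = e.result  # partial results still retrievable from the exception
```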
This is a more general problem though of course -- we've run into it in the gufunc linalg code too, where there's some question about what you do in e.g. chol() if some sub-matrices are positive-definite and some are not.
Off the top of my head the general solution might be to define a MultiError exception type that has a standard generic format for describing such things. It'd need a mask saying which values were valid, rather than encoding them directly into the return values -- otherwise we have the problem where nanargmax wants to use INT_MIN, chol wants to use NaN, and maybe the next function along doesn't have any usable flag value available at all. So probably more thought is needed before nailing down exactly how we handle such "partial" errors for vectorized functions.
In the short term (i.e., 1.8.0), maybe we should defer this discussion by simply raising a regular ValueError for nanarg functions on all errors? That's not a regression from 1.7, since 1.7 also didn't provide any way to get at partial results in the event of an error, and it leaves us in a good position to solve the more general problem later.
Can we make the error optional in these cases?
like np.seterr for zero-division, invalid, or floating-point errors, which allows ignore and raise: np.seterr(linalg='ignore')
I don't know about nanarg, but thinking about some applications for gufunc linalg code.
In some cases I might require for example invertibility of all matrices and raise if one fails, in other case I would be happy with nans, and just sum the results with nansum for example or replace them by some fill value.
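For reference, the existing np.seterr machinery already works this way for floating-point errors (a linalg='ignore' keyword as suggested above does not exist; np.errstate is the context-manager form of np.seterr):

```python
import numpy as np

x = np.array(1.0)
z = np.array(0.0)

# 'ignore': divide-by-zero passes silently and yields inf
with np.errstate(divide="ignore"):
    r = x / z

# 'raise': the same operation becomes an exception
with np.errstate(divide="raise"):
    try:
        x / z
        raised = False
    except FloatingPointError:
        raised = True
```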
I'm thinking warnings might be more flexible than exceptions:
with warnings.catch_warnings():
    warnings.simplefilter('error')
    ...
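Fleshed out, that pattern promotes the warning to an exception only inside the with block. A toy stand-in for the 1.8-dev warn-plus-marker scheme (toy_nanargmax and its marker value are illustrative, not the real function):

```python
import warnings

INTP_MIN = -2**63  # stand-in for np.iinfo(np.intp).min

def toy_nanargmax(all_nan):
    # Mimics the 1.8-dev scheme: warn and return the marker on failure.
    if all_nan:
        warnings.warn("All-NaN slice encountered", RuntimeWarning)
        return INTP_MIN
    return 0

with warnings.catch_warnings():
    warnings.simplefilter("error", RuntimeWarning)
    try:
        toy_nanargmax(all_nan=True)
        failed = False
    except RuntimeWarning:
        failed = True
```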
Sure. Passing in a callback or just leaving the function out and telling people to implement it themselves would be even more flexible :-). But we have to trade off complexity of usage, complexity of teaching people how to do stuff (nobody knows how to use catch_warnings, we only know because we started writing warning tests just in the last year or so), usefulness in common situations, etc. The warnings api doesn't give you any way to pass results out, you still need a separate channel to say what failed and what succeeded (and maybe for the failures to say what the different failures are).
Since numpy and scipy just moved to python 2.6, it's time to advertise and support warnings.catch_warnings(). If you want to wait for a "missing value support" in numpy to support this, then this postpones this to .... (numpy 3.0?) while gufuncs seem to be happening now.

Josef
"from the balcony" 3-dimensional panel data linear algebra without vec and kron?
Anyway this back and forth still supports my main suggestion for *right* now, which is that this is sufficiently non-obvious that with 1.8 breathing down our necks we should start with the safe behaviour and then work up from there.
-n

On Tue, Oct 1, 2013 at 9:37 PM, <josef.pktd@gmail.com> wrote:
Since numpy and scipy just moved to python 2.6, it's time to advertise and support warnings.catch_warnings().
warnings.catch_warnings is a very useful tool and this is all fun to talk about, but realistically we're simply not going to merge any change which involves telling people "the way you detect failure in this function is to use the catch_warnings() context manager".
If you want to wait for a "missing value support" in numpy to support this, then this postpones this to .... (numpy 3.0?) while gufuncs seem to be happening now.
No-one said anything about missing value support :-). I don't see how it would really solve the problem -- we'll probably never allow missing values to magically appear in arbitrary function outputs (e.g. you can't put a bitpattern NA in a regular integer dtype, it's just not possible).

On Tue, Oct 1, 2013 at 9:37 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I'm surely not opposed to raising an exception if there is agreement on that. I think it would also be pretty easy to attach the result to the exception. For the latter it would be good to have an exception type that we could maybe reuse for other parts of Numpy.
It is true that no-one's ever objected to the ValueError that nanarg{max,min} have raised in the past, right? That seems like one measure of agreement that it's at least acceptable.

Spitball of a proper solution, though some thought would need to go into how to get this out of gufuncs. (And absolutely unsuitable for 1.8!):

class VectorizedError(object):
    pass

def vectorized_raise(result, good_mask, exceptions):
    exc_types = set([type(e) for e in exceptions])
    exc_types.add(VectorizedError)
    exc_type = type.__new__(type, "SubVectorizedError", tuple(exc_types), {})
    exc = exc_type("

whoops, fat-fingered this out while in the middle of typing it On Tue, Oct 1, 2013 at 9:55 PM, Nathaniel Smith <njs@pobox.com> wrote:
Spitball of a proper solution, though some thought would need to go into how to get this out of gufuncs. (And absolutely unsuitable for 1.8!):

class VectorizedError(Exception):  # must derive from Exception to be usable in an except clause
    pass

def vectorized_raise(result, good_mask, exceptions):
    exc_types = set([type(e) for e in exceptions])
    exc_types.add(VectorizedError)
    exc_type = type.__new__(type, "SubVectorizedError", tuple(exc_types), {})
    # FIXME: use some heuristics to look at the exception messages and try to
    # say something more useful here
    exc = exc_type("Multiple errors")
    exc.result = result
    exc.good_mask = good_mask
    exc.exceptions = exceptions
    raise exc

Okay, maybe that's a little ridiculous, but, you know. Discuss :-).

(The black magic at the beginning is to ensure that if you have a ValueError and a TypeError consolidated into a single VectorizedError, then the exception that is raised can be caught by code that's looking for a ValueError, a TypeError, *or* a VectorizedError.)

-n
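A quick sanity check of the spitball (self-contained; VectorizedError is made an Exception subclass here so that it is itself catchable):

```python
class VectorizedError(Exception):
    pass

def vectorized_raise(result, good_mask, exceptions):
    # Build a one-off subclass of every distinct exception type involved,
    # plus VectorizedError, so any of them works in an except clause.
    exc_types = set(type(e) for e in exceptions)
    exc_types.add(VectorizedError)
    exc_type = type.__new__(type, "SubVectorizedError", tuple(exc_types), {})
    exc = exc_type("Multiple errors")
    exc.result = result
    exc.good_mask = good_mask
    exc.exceptions = exceptions
    raise exc

# The same consolidated failure is catchable under all three types:
caught = []
for catch_as in (ValueError, TypeError, VectorizedError):
    try:
        vectorized_raise([0, None], [True, False],
                         [ValueError("bad"), TypeError("worse")])
    except catch_as:
        caught.append(catch_as.__name__)
```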

participants (3)
- Charles R Harris
- josef.pktd@gmail.com
- Nathaniel Smith