[Numpy-discussion] nanargmax failure case (was: Re: [SciPy-Dev] 1.8.0rc1)

josef.pktd at gmail.com josef.pktd at gmail.com
Tue Oct 1 12:19:14 EDT 2013


On Tue, Oct 1, 2013 at 10:47 AM, Nathaniel Smith <njs at pobox.com> wrote:
> On Tue, Oct 1, 2013 at 3:20 PM, Charles R Harris
> <charlesr.harris at gmail.com> wrote:
>>
>>
>>
>> On Tue, Oct 1, 2013 at 8:12 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>>
>>> [switching subject to break out from the giant 1.8.0rc1 thread]
>>>
>>> On Tue, Oct 1, 2013 at 2:52 PM, Charles R Harris
>>> <charlesr.harris at gmail.com> wrote:
>>> >
>>> >
>>> >
>>> > On Tue, Oct 1, 2013 at 7:25 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>> >>
>>> >> On Tue, Oct 1, 2013 at 1:56 PM, Charles R Harris
>>> >> <charlesr.harris at gmail.com> wrote:
>>> >> > On Tue, Oct 1, 2013 at 4:43 AM, Nathaniel Smith <njs at pobox.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> On Mon, Sep 30, 2013 at 10:51 PM, Christoph Gohlke <cgohlke at uci.edu>
>>> >> >> wrote:
>>> >> >> > 2) Bottleneck 0.7.0
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > https://github.com/kwgoodman/bottleneck/issues/71#issuecomment-25331701
>>> >> >>
>>> >> >> I can't tell if these are real bugs in numpy, or tests checking that
>>> >> >> bottleneck is bug-for-bug compatible with old numpy and we just
>>> >> >> fixed
>>> >> >> some bugs, or what. It's clearly something to do with the
>>> >> >> nanarg{max,min} rewrite -- @charris, do you know what's going on
>>> >> >> here?
>>> >> >>
>>> >> >
>>> >> > Yes ;) The previous behaviour of nanarg for all-nan axis was to cast
>>> >> > nan
>>> >> > to
>>> >> > intp when the result was an array, and return nan when a scalar. The
>>> >> > current
>>> >> > behaviour is to return the most negative value of intp as an error
>>> >> > marker in
>>> >> > both cases and raise a warning. It is a change in behavior, but I
>>> >> > think
>>> >> > one
>>> >> > that needs to be made.
>>> >>
>>> >> Ah, okay! I kind of lost track of the nanfunc changes by the end there.
>>> >>
>>> >> So for the bottleneck issue, it sounds like the problem is just that
>>> >> bottleneck is still emulating the old numpy behaviour in this corner
>>> >> case, which isn't really a problem. So we don't really need to worry
>>> >> about that, both behaviours are correct, just maybe out of sync.
>>> >>
>>> >> I'm a little dubious about this "make up some weird value that will
>>> >> *probably* blow up if people try to use it without checking, and also
>>> >> raise a warning" thing, wouldn't it make more sense to just raise an
>>> >> error? That's what exceptions are for? I guess I should have said
>>> >> something earlier though...
>>> >>
>>> >
>>> > I figure the blowup is safe, as we can't allocate arrays big enough that
>>> > the
>>> > minimum intp value would be a valid index. I considered raising an
>>> > error,
>>> > and if there is a consensus the behavior could be changed. Or we could
>>> > add a
>>> > keyword to determine the behavior.
>>>
>>> Yeah, the intp value can't be a valid index, so that covers 95% of
>>> cases, but I'm worried about that other 5%. It could still pass
>>> silently as the endpoint of a slice, or participate in some sort of
>>> integer arithmetic calculation, etc. I assume you also share this
>>> worry to some extent or you wouldn't have put in the warning ;-).
>>>
>>> I guess the bigger question is, why would we *not* use the standard
>>> method for signaling an exceptional condition here, i.e., exceptions?
>>> That way we're 100% guaranteed that if people aren't prepared to
>>> handle it then they'll at least know something has gone wrong, and if
>>> they are prepared to handle it then it's very easy and standard, just
>>> use try/except. Right now I guess you have to check for the special
>>> value, but also do something to silence warnings, but just for that
>>> one line? Sounds kind of complicated...
>>
>>
>> The main reason was for the case of multiple axis, where some of the results
>> would be valid and others not. The simple thing might be to raise an
>> exception but keep the current return values so that users could determine
>> where the problem occurred.
>
> Oh, duh, yes, right, now I remember this discussion. Sorry for being slow.
>
> In the past we've *always* raised an error in the multiple axis case,
> right? Has anyone ever complained? Wanting to get all
> nanargmax/nanargmin results, of which some might be errors, without
> just writing a loop, seems like a pretty exotic case to me, so I'm not
> sure we should optimize for it at the expense of returning
> possibly-misleading results in the scalar case.
>
> Like (I think) you say, we could get the best of both worlds by
> encoding the results in the same way we do right now, but then raise
> an exception and attach the results to the exception so they can be
> retrieved if wanted. Kind of cumbersome, but maybe good?
>
> This is a more general problem though of course -- we've run into it
> in the gufunc linalg code too, where there's some question about what you
> do in e.g. chol() if some sub-matrices are positive-definite and some
> are not.
>
> Off the top of my head the general solution might be to define a
> MultiError exception type that has a standard generic format for
> describing such things. It'd need a mask saying which values were
> valid, rather than encoding them directly into the return values --
> otherwise we have the problem where nanargmax wants to use INT_MIN,
> chol wants to use NaN, and maybe the next function along doesn't have
> any usable flag value available at all. So probably more thought is
> needed before nailing down exactly how we handle such "partial" errors
> for vectorized functions.
>
> In the short term (i.e., 1.8.0), maybe we should defer this discussion
> by simply raising a regular ValueError for nanarg functions on all
> errors? That's not a regression from 1.7, since 1.7 also didn't
> provide any way to get at partial results in the event of an error,
> and it leaves us in a good position to solve the more general problem
> later.

Can we make the error optional in these cases?

Something like np.seterr, which allows 'ignore' and 'raise' for division by
zero, invalid values, and other floating point errors, e.g. a (hypothetical)
np.seterr(linalg='ignore')
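
For comparison, here is a minimal sketch of how the existing floating point
error state switches between 'ignore' and 'raise'. It uses only np.errstate
with the `divide` key; a `linalg` key does not exist in np.seterr and is only
the extension suggested here.

```python
import numpy as np

a = np.array([1.0, 0.0])

# 'ignore': division by zero silently produces inf
with np.errstate(divide='ignore'):
    quiet = 1.0 / a

# 'raise': the same operation raises FloatingPointError
try:
    with np.errstate(divide='raise'):
        1.0 / a
    raised = False
except FloatingPointError:
    raised = True
```

The idea would be to give vectorized linalg (and maybe nanarg) failures the
same kind of user-selectable policy.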

I don't know about nanarg, but I'm thinking about some applications for
the gufunc linalg code.

In some cases I might require, for example, invertibility of all
matrices and raise if one fails;
in other cases I would be happy with nans, and just sum the results
with nansum, for example, or replace them with some fill value.
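
A rough sketch of the two use cases, assuming a plain Python loop instead of
the gufunc code. The helper `inv_stack` and its `on_error` argument are made
up for illustration and are not part of numpy.

```python
import numpy as np

def inv_stack(mats, on_error="raise"):
    """Invert each matrix in an (n, k, k) stack.

    on_error is a hypothetical switch, not a numpy option:
    'raise' propagates LinAlgError, 'ignore' fills failed slots with nan.
    """
    mats = np.asarray(mats, dtype=float)
    out = np.empty_like(mats)
    for i, m in enumerate(mats):
        try:
            out[i] = np.linalg.inv(m)
        except np.linalg.LinAlgError:
            if on_error == "raise":
                raise
            out[i] = np.nan
    return out

stack = np.array([[[2.0, 0.0], [0.0, 2.0]],   # invertible
                  [[1.0, 2.0], [2.0, 4.0]]])  # exactly singular

# Case 1: require invertibility of all matrices
try:
    inv_stack(stack, on_error="raise")
    strict_failed = False
except np.linalg.LinAlgError:
    strict_failed = True

# Case 2: accept nans, then sum over the stack or use a fill value
inverses = inv_stack(stack, on_error="ignore")
total = np.nansum(inverses, axis=0)
filled = np.where(np.isnan(inverses), 0.0, inverses)
```

With 'ignore' the singular matrix just drops out of the nansum, while 'raise'
stops at the first failure.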

Josef

>
> -n
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion


