[Numpy-discussion] nanargmax failure case (was: Re: [SciPy-Dev] 1.8.0rc1)

Tue Oct 1 16:37:34 EDT 2013

On Tue, Oct 1, 2013 at 4:13 PM, Nathaniel Smith <njs at pobox.com> wrote:
> On 1 Oct 2013 17:34, "Charles R Harris" <charlesr.harris at gmail.com> wrote:
>>
>> On Tue, Oct 1, 2013 at 10:19 AM, <josef.pktd at gmail.com> wrote:
>>>
>>> On Tue, Oct 1, 2013 at 10:47 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>> > On Tue, Oct 1, 2013 at 3:20 PM, Charles R Harris
>>> > <charlesr.harris at gmail.com> wrote:
>>> >>
>>> >> On Tue, Oct 1, 2013 at 8:12 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>> >>>
>>> >>> [switching subject to break out from the giant 1.8.0rc1 thread]
>>> >>>
>>> >>> On Tue, Oct 1, 2013 at 2:52 PM, Charles R Harris
>>> >>> <charlesr.harris at gmail.com> wrote:
>>> >>> >
>>> >>> > On Tue, Oct 1, 2013 at 7:25 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>> >>> >>
>>> >>> >> On Tue, Oct 1, 2013 at 1:56 PM, Charles R Harris
>>> >>> >> <charlesr.harris at gmail.com> wrote:
>>> >>> >> > On Tue, Oct 1, 2013 at 4:43 AM, Nathaniel Smith <njs at pobox.com> wrote:
>>> >>> >> >>
>>> >>> >> >> On Mon, Sep 30, 2013 at 10:51 PM, Christoph Gohlke <cgohlke at uci.edu> wrote:
>>> >>> >> >> > 2) Bottleneck 0.7.0
>>> >>> >> >> >
>>> >>> >> >> > https://github.com/kwgoodman/bottleneck/issues/71#issuecomment-25331701
>>> >>> >> >>
>>> >>> >> >> I can't tell if these are real bugs in numpy, or tests checking that
>>> >>> >> >> bottleneck is bug-for-bug compatible with old numpy and we just fixed
>>> >>> >> >> some bugs, or what. It's clearly something to do with the
>>> >>> >> >> nanarg{max,min} rewrite -- @charris, do you know what's going on
>>> >>> >> >> here?
>>> >>> >> >>
>>> >>> >> >
>>> >>> >> > Yes ;) The previous behaviour of nanarg for all-nan axis was to cast
>>> >>> >> > nan to intp when the result was an array, and return nan when a
>>> >>> >> > scalar. The current behaviour is to return the most negative value of
>>> >>> >> > intp as an error marker in both cases and raise a warning. It is a
>>> >>> >> > change in behavior, but I think one that needs to be made.
>>> >>> >>
>>> >>> >> Ah, okay! I kind of lost track of the nanfunc changes by the end there.
>>> >>> >>
>>> >>> >> So for the bottleneck issue, it sounds like the problem is just that
>>> >>> >> bottleneck is still emulating the old numpy behaviour in this corner
>>> >>> >> case, which isn't really a problem. So we don't really need to worry
>>> >>> >> about that, both behaviours are correct, just maybe out of sync.
>>> >>> >>
>>> >>> >> I'm a little dubious about this "make up some weird value that will
>>> >>> >> *probably* blow up if people try to use it without checking, and also
>>> >>> >> raise a warning" thing, wouldn't it make more sense to just raise an
>>> >>> >> error? That's what exceptions are for? I guess I should have said
>>> >>> >> something earlier though...
>>> >>> >>
>>> >>> >
>>> >>> > I figure the blowup is safe, as we can't allocate arrays big enough
>>> >>> > that the minimum intp value would be a valid index. I considered
>>> >>> > raising an error, and if there is a consensus the behavior could be
>>> >>> > changed. Or we could add a keyword to determine the behavior.
>>> >>>
>>> >>> Yeah, the intp value can't be a valid index, so that covers 95% of
>>> >>> cases, but I'm worried about that other 5%. It could still pass
>>> >>> silently as the endpoint of a slice, or participate in some sort of
>>> >>> integer arithmetic calculation, etc. I assume you also share this
>>> >>> worry to some extent or you wouldn't have put in the warning ;-).
>>> >>>
>>> >>> I guess the bigger question is, why would we *not* use the standard
>>> >>> method for signaling an exceptional condition here, i.e., exceptions?
>>> >>> That way we're 100% guaranteed that if people aren't prepared to
>>> >>> handle it then they'll at least know something has gone wrong, and if
>>> >>> they are prepared to handle it then it's very easy and standard, just
>>> >>> use try/except. Right now I guess you have to check for the special
>>> >>> value, but also do something to silence warnings, but just for that
>>> >>> one line? Sounds kind of complicated...
>>> >>
>>> >>
>>> >> The main reason was for the case of multiple axis, where some of the
>>> >> results would be valid and others not. The simple thing might be to
>>> >> raise an exception but keep the current return values so that users
>>> >> could determine where the problem occurred.
>>> >
>>> > Oh, duh, yes, right, now I remember this discussion. Sorry for being slow.
>>> >
>>> > In the past we've *always* raised an error in the multiple axis case,
>>> > right? Has anyone ever complained? Wanting to get all
>>> > nanargmax/nanargmin results, of which some might be errors, without
>>> > just writing a loop, seems like a pretty exotic case to me, so I'm not
>>> > sure we should optimize for it at the expense of returning
>>> > possibly-misleading results in the scalar case.
>>> >
>>> > Like (I think) you say, we could get the best of both worlds by
>>> > encoding the results in the same way we do right now, but then raise
>>> > an exception and attach the results to the exception so they can be
>>> > retrieved if wanted. Kind of cumbersome, but maybe good?
>>> >
>>> > This is a more general problem though of course -- we've run into it
>>> > in the gufunc linalg code too, where there's some question about what you
>>> > do in e.g. chol() if some sub-matrices are positive-definite and some
>>> > are not.
>>> >
>>> > Off the top of my head the general solution might be to define a
>>> > MultiError exception type that has a standard generic format for
>>> > describing such things. It'd need a mask saying which values were
>>> > valid, rather than encoding them directly into the return values --
>>> > otherwise we have the problem where nanargmax wants to use INT_MIN,
>>> > chol wants to use NaN, and maybe the next function along doesn't have
>>> > any usable flag value available at all. So probably more thought is
>>> > needed before nailing down exactly how we handle such "partial" errors
>>> > for vectorized functions.
>>> >
>>> > In the short term (i.e., 1.8.0), maybe we should defer this discussion
>>> > by simply raising a regular ValueError for nanarg functions on all
>>> > errors? That's not a regression from 1.7, since 1.7 also didn't
>>> > provide any way to get at partial results in the event of an error,
>>> > and it leaves us in a good position to solve the more general problem
>>> > later.
>>>
>>> Can we make the error optional in these cases?
>>>
>>> like np.seterr for zerodivision, invalid, or floating point errors
>>> that allows ignore and raise
>>> np.seterr(linalg='ignore')
>>>
>>> I don't know about nanarg, but thinking about some applications for
>>> gufunc linalg code.
>>>
>>> In some cases I might require for example invertibility of all
>>> matrices and raise if one fails,
>>> in other case I would be happy with nans, and just sum the results
>>> with nansum for example or replace them by some fill value.
>>>
>> I'm thinking warnings might be more flexible than exceptions:
>>
>> with warnings.catch_warnings():
>>     warnings.simplefilter('error')
>>     ...
>
> Sure. Passing in a callback or just leaving the function out and telling
> people to implement it themselves would be even more flexible :-). But we
> have to trade off complexity of usage, complexity of teaching people how to
> do stuff (nobody knows how to use catch_warnings, we only know because we
> started writing warning tests just in the last year or so), usefulness in
> common situations, etc. The warnings api doesn't give you any way to pass
> results out, you still need a separate channel to say what failed and what
> succeeded (and maybe for the failures to say what the different failures
> are).

Since numpy and scipy just moved to Python 2.6, it's time to advertise
and support warnings.catch_warnings().
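
A minimal sketch of that pattern, for readers who have not used
catch_warnings before. The function nanargmax_sketch and the SENTINEL
constant are hypothetical stand-ins written in pure Python; they are not
numpy's implementation, only an imitation of the "return a marker value
and warn" behaviour described in the quoted text. The point is that the
caller, not the library, decides whether the warning stays a warning or
becomes an exception.

```python
import warnings

# Hypothetical stand-in for the sentinel-plus-warning behaviour discussed
# above; this is NOT numpy's nanargmax, just a pure-Python illustration.
SENTINEL = -2**63  # assumed "most negative intp" on a 64-bit platform


def nanargmax_sketch(values):
    """Return the index of the largest non-NaN value, or SENTINEL with a warning."""
    best = None
    for i, v in enumerate(values):
        if v != v:  # NaN is the only float that is not equal to itself
            continue
        if best is None or v > values[best]:
            best = i
    if best is None:
        warnings.warn("All-NaN slice encountered", RuntimeWarning)
        return SENTINEL
    return best


nan = float("nan")

# Lenient caller: record the warning and receive the sentinel value.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient = nanargmax_sketch([nan, nan])

# Strict caller: the same warning is turned into an exception.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        nanargmax_sketch([nan, nan])
    strict_raised = False
except RuntimeWarning:
    strict_raised = True
```

The lenient caller still has to compare the result against SENTINEL
by hand, which is the part of the design being questioned in the quoted
replies.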

If you want to wait for "missing value support" in numpy before
supporting this, then that postpones it to .... (numpy 3.0?),
while gufuncs seem to be happening now.

Josef
"from the balcony"
3-dimensional panel data linear algebra without vec and kron ?

>
> Anyway this back and forth still supports my main suggestion for *right* now,
> which is that this is sufficiently nonobvious that with 1.8 breathing down
> our necks we should start with the safe behaviour and then work up from
> there.
>
> -n
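
For comparison, the "raise an exception and attach the results" idea
from earlier in the thread could look roughly like the sketch below.
Everything here is an assumption made for illustration: the MultiError
class, its results and valid attributes, and the helper functions are
hypothetical, are written in pure Python, and do not correspond to any
existing numpy API. Failed slices are reported through a separate mask
rather than through a flag value such as INT_MIN or NaN.

```python
class MultiError(ValueError):
    """Hypothetical exception that carries partial results and a validity mask."""

    def __init__(self, message, results, valid):
        super().__init__(message)
        self.results = results  # one entry per slice; None where it failed
        self.valid = valid      # parallel list of booleans


def argmax_ignoring_nan(row):
    """Index of the largest non-NaN entry, or None if every entry is NaN."""
    best = None
    for i, v in enumerate(row):
        # v == v is False only for NaN, so NaN entries are skipped.
        if v == v and (best is None or v > row[best]):
            best = i
    return best


def nanargmax_rows(rows):
    """Apply argmax_ignoring_nan to each row; raise MultiError if any row fails."""
    results = [argmax_ignoring_nan(row) for row in rows]
    valid = [r is not None for r in results]
    if not all(valid):
        raise MultiError("All-NaN slice encountered", results, valid)
    return results


nan = float("nan")
try:
    nanargmax_rows([[1.0, 3.0], [nan, nan], [nan, 2.0]])
except MultiError as err:
    # Callers who want the partial results can still retrieve them.
    partial, mask = err.results, err.valid
```

A caller who is not prepared for the failure gets an ordinary
ValueError, while one who is can inspect the mask to see which slices
succeeded.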