[Python-ideas] NAN handling in the statistics module
steve at pearwood.info
Mon Jan 7 03:09:54 EST 2019
(By the way, I'm not outright disagreeing with you, I'm trying to weigh
up the pros and cons of your position. You've given me a lot to think
about. More below.)
On Sun, Jan 06, 2019 at 11:31:30PM -0800, Nathaniel Smith wrote:
> On Sun, Jan 6, 2019 at 11:06 PM Steven D'Aprano <steve at pearwood.info> wrote:
> > I'm not wedded to the idea that the default ought to be the current
> > behaviour. If there is a strong argument for one of the others, I'm
> > listening.
> "Errors should never pass silently"? Silently returning nonsensical
> results is hard to defend as a default behavior IMO :-)
If you violate the assumptions of the function, just about everything
can in principle return nonsensical results. True, most of the time you
have to work hard at it:
return random.randint(0, sys.maxint)
but it isn't unreasonable to document the assumptions of a function, and
if the caller violates those assumptions, Garbage In Garbage Out applies.
E.g. bisect requires that your list is sorted in ascending order. If it
isn't, the results you get are nonsensical.
py> data = [8, 6, 4, 2, 0]
py> bisect.bisect(data, 1)
That's not a bug in bisect, that's a bug in the caller's code, and it
isn't bisect's responsibility to fix it.
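For anyone who wants to run the bisect case, here is a minimal sketch. The variable names are mine, chosen for illustration; only bisect.bisect itself is from the standard library.

```python
import bisect

# Descending order violates bisect's documented precondition (ascending order).
unsorted_data = [8, 6, 4, 2, 0]
sorted_data = sorted(unsorted_data)  # [0, 2, 4, 6, 8]

# No exception is raised for the unsorted list; the answer is simply meaningless.
bad_position = bisect.bisect(unsorted_data, 1)

# With the precondition satisfied, the result is a valid insertion point.
good_position = bisect.bisect(sorted_data, 1)

print(bad_position, good_position)  # 0 1
```

Inserting 1 at position 0 of the unsorted list does not put it anywhere sensible, and bisect has no cheap way to notice that.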
Although it could be documented better, that's the current situation
with NANs and median(). Data with NANs don't have a total ordering, and
total ordering is the unstated assumption behind the idea of a median or
middle value. So all bets are off.
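The lack of a total ordering is easy to see directly. This is only an illustrative sketch: it shows that every ordering comparison with a NAN is false, which is why a sort (and therefore a sort-based median) cannot place the NAN anywhere meaningful. I deliberately show no expected output for the sorted() call, since where the NAN ends up depends on the input order and the sort implementation.

```python
nan = float("nan")

# Every comparison involving a NAN is false, in both directions,
# including equality with itself.
comparisons = [
    nan < 1.0,
    nan > 1.0,
    nan == 1.0,
    1.0 < nan,
    1.0 > nan,
    nan == nan,
]
print(comparisons)

# sorted() still returns without complaint, but the position of the NAN
# is an accident of the input order, not a statement about its rank.
print(sorted([3.0, nan, 1.0]))
print(sorted([nan, 3.0, 1.0]))
```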
> > How would you answer those who say that the right behaviour is not to
> > propagate unwanted NANs, but to fail fast and raise an exception?
> Both seem defensible a priori, but every other mathematical operation
> in Python propagates NaNs instead of raising an exception. Is there
> something unusual about median that would justify giving it unusual
> behavior?
Well, not everything...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ZeroDivisionError: float division by zero
There may be others. But I'm not sure that "everything else does it" is
a strong justification. It is *a* justification, since consistency is
good, but consistency does not necessarily outweigh other concerns.
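To make the mix of behaviours concrete, here is a small sketch contrasting operations that propagate a NAN with ones that raise. The helper name outcome is hypothetical, introduced only to capture the exception type; the operations themselves are ordinary float arithmetic and math.sqrt.

```python
import math

nan = float("nan")

def outcome(func):
    """Call func() and return its result, or the exception's class name."""
    try:
        return func()
    except Exception as exc:
        return type(exc).__name__

# These propagate the NAN silently.
propagated_sum = outcome(lambda: nan + 1.0)
propagated_sqrt = outcome(lambda: math.sqrt(nan))

# These raise instead of returning an INF or NAN.
division = outcome(lambda: 1.0 / 0.0)       # 'ZeroDivisionError'
bad_domain = outcome(lambda: math.sqrt(-1.0))  # 'ValueError'

print(propagated_sum, propagated_sqrt, division, bad_domain)
```

So "propagate" is the common case for arithmetic on an existing NAN, but it is not a universal rule.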
One possible argument for making PASS the default, even if that means
implementation-dependent behaviour with NANs, is that in the absence of
a clear preference for FAIL or RETURN, at least PASS is backwards
compatible.
You might shoot yourself in the foot, but at least you know it's the same
foot you shot yourself in using the previous version *wink*
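A rough sketch of what the three policies could look like, purely for discussion. Everything here is an assumption on my part: the function name median_with_nan_policy, the nan_policy parameter and its string values are hypothetical and are not an existing API of the statistics module. I am reading PASS as "do what median() does today", FAIL as "raise an exception", and RETURN as "return a NAN".

```python
import math
import statistics

def median_with_nan_policy(data, nan_policy="pass"):
    """Hypothetical wrapper around statistics.median (not a real stdlib API)."""
    if nan_policy not in ("pass", "fail", "return"):
        raise ValueError("unknown nan_policy: %r" % (nan_policy,))
    data = list(data)
    has_nan = any(isinstance(x, float) and math.isnan(x) for x in data)
    if has_nan and nan_policy == "fail":
        # FAIL: refuse to compute anything from data containing a NAN.
        raise ValueError("data contains a NAN")
    if has_nan and nan_policy == "return":
        # RETURN: propagate the NAN, as most float arithmetic does.
        return math.nan
    # PASS: hand the data to median() unchanged. With NANs present the
    # result depends on the input order and the sort implementation.
    return statistics.median(data)
```

With NAN-free data all three policies agree; they only differ once a NAN is present, and only PASS leaves the caller with an order-dependent answer.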