[Python-ideas] NAN handling in the statistics module

Steven D'Aprano steve at pearwood.info
Mon Jan 7 03:09:54 EST 2019


(By the way, I'm not outright disagreeing with you, I'm trying to weigh 
up the pros and cons of your position. You've given me a lot to think 
about. More below.)

On Sun, Jan 06, 2019 at 11:31:30PM -0800, Nathaniel Smith wrote:
> On Sun, Jan 6, 2019 at 11:06 PM Steven D'Aprano <steve at pearwood.info> wrote:
> > I'm not wedded to the idea that the default ought to be the current
> > behaviour. If there is a strong argument for one of the others, I'm
> > listening.
> 
> "Errors should never pass silently"? Silently returning nonsensical
> results is hard to defend as a default behavior IMO :-)

If you violate the assumptions of the function, just about everything 
can in principle return nonsensical results. True, most of the time you 
have to work hard at it:

import random
import sys

class MyList(list):
    def __len__(self):
        return random.randint(0, sys.maxsize)  # sys.maxint is Python 2 only

but it isn't unreasonable to document the assumptions of a function, and 
if the caller violates those assumptions, Garbage In Garbage Out 
applies.

E.g. bisect requires that your list is sorted in ascending order. If it 
isn't, the results you get are nonsensical.

py> import bisect
py> data = [8, 6, 4, 2, 0]
py> bisect.bisect(data, 1)
0

That's not a bug in bisect, that's a bug in the caller's code, and it 
isn't bisect's responsibility to fix it.

Although it could be documented better, that's the current situation 
with NANs and median(). Data with NANs don't have a total ordering, and 
total ordering is the unstated assumption behind the idea of a median or 
middle value. So all bets are off.
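
To make the "no total ordering" point concrete, here is a minimal sketch. The helper name medians_over_orderings is made up for illustration, and the sketch assumes statistics.median() simply sorts its input and does no NAN handling of its own, as is the situation under discussion. Under that assumption, NAN-free data gives one median whatever the input order, while data containing a NAN may give different answers depending on where the NAN happens to sit.

```python
import itertools
import math
import statistics

NAN = float("nan")

# Every comparison involving a NAN is False, so a sort has no
# consistent place to put it.
nan_comparisons = (NAN < 1.0, NAN > 1.0, NAN == NAN)
print(nan_comparisons)


def medians_over_orderings(values):
    """Return the distinct medians seen across every ordering of values.

    Hypothetical helper, for illustration only. NAN results are recorded
    as the string "nan" because NAN != NAN would defeat set membership.
    """
    seen = set()
    for ordering in itertools.permutations(values):
        m = statistics.median(ordering)
        seen.add("nan" if math.isnan(m) else m)
    return seen


clean = medians_over_orderings([1.0, 2.0, 3.0])
dirty = medians_over_orderings([1.0, 2.0, NAN])
print(clean)  # a single value: the order of NAN-free data does not matter
print(dirty)  # may hold several values, depending on the implementation
```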

 
> > How would you answer those who say that the right behaviour is not to
> > propagate unwanted NANs, but to fail fast and raise an exception?
> 
> Both seem defensible a priori, but every other mathematical operation
> in Python propagates NaNs instead of raising an exception. Is there
> something unusual about median that would justify giving it unusual
> behavior?

Well, not everything... 

py> NAN = float('nan')
py> NAN/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: float division by zero


There may be others. But I'm not sure that "everything else does it" is 
a strong justification. It is *a* justification, since consistency is 
good, but consistency does not necessarily outweigh other concerns.
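
For a few more data points, the sketch below records whether an operation hands back a NAN or raises. The helper name outcome and the choice of operations are mine, picked only for illustration; this is not a survey of everything Python does with NANs.

```python
import math

NAN = float("nan")


def outcome(operation):
    """Run a zero-argument callable and describe what happened.

    Hypothetical helper: returns the exception class name if the call
    raised, "nan" if it returned a NAN, otherwise the repr of the result.
    """
    try:
        result = operation()
    except Exception as exc:
        return type(exc).__name__
    if isinstance(result, float) and math.isnan(result):
        return "nan"
    return repr(result)


results = {
    "NAN + 1": outcome(lambda: NAN + 1),    # propagates
    "NAN * 0": outcome(lambda: NAN * 0),    # propagates
    "NAN / 0": outcome(lambda: NAN / 0),    # raises ZeroDivisionError
    "int(NAN)": outcome(lambda: int(NAN)),  # raises ValueError
}

for expression, what_happened in results.items():
    print(f"{expression:10} -> {what_happened}")
```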

One possible argument for making PASS the default, even if that means 
implementation-dependent behaviour with NANs, is that in the absence of 
a clear preference for FAIL or RETURN, at least PASS is backwards 
compatible.

You might shoot yourself in the foot, but at least you know it's the same 
foot you shot yourself in using the previous version *wink*
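
The three options named above (PASS, FAIL, RETURN) could be sketched as a thin wrapper around statistics.median(). This is only an illustration of the choices being weighed: the function name median_with_nan_policy, the nan_policy parameter and its string values are all assumptions of mine, not an existing or proposed API of the statistics module.

```python
import math
import statistics

# Hypothetical policy names, mirroring PASS / FAIL / RETURN in the text.
_POLICIES = ("pass", "fail", "return")


def median_with_nan_policy(data, nan_policy="pass"):
    """Median with an explicit choice of NAN handling (illustrative only).

    "pass":   hand the data to statistics.median() unchanged, so the
              result with NANs is whatever the implementation produces.
    "fail":   raise ValueError if any NAN is present.
    "return": return a NAN if any NAN is present.
    """
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    has_nan = any(isinstance(x, float) and math.isnan(x) for x in values)

    if has_nan and nan_policy == "fail":
        raise ValueError("data contains a NAN")
    if has_nan and nan_policy == "return":
        return float("nan")
    # "pass", or no NAN present: current behaviour.
    return statistics.median(values)


print(median_with_nan_policy([1, 2, 3]))                               # 2
print(median_with_nan_policy([1.0, float("nan")], nan_policy="return"))  # nan
```

The "pass" default here is what keeps the wrapper backwards compatible: callers who never pass nan_policy see exactly what median() gives them today.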



-- 
Steve