[Python-ideas] NAN handling in the statistics module

Neil Girdhar mistersheik at gmail.com
Thu Jan 10 11:42:11 EST 2019

On Monday, January 7, 2019 at 3:16:07 AM UTC-5, Steven D'Aprano wrote:
> (By the way, I'm not outright disagreeing with you, I'm trying to weigh 
> up the pros and cons of your position. You've given me a lot to think 
> about. More below.) 
> On Sun, Jan 06, 2019 at 11:31:30PM -0800, Nathaniel Smith wrote: 
> > On Sun, Jan 6, 2019 at 11:06 PM Steven D'Aprano <st... at pearwood.info> wrote: 
> > > I'm not wedded to the idea that the default ought to be the current 
> > > behaviour. If there is a strong argument for one of the others, I'm 
> > > listening. 
> > 
> > "Errors should never pass silently"? Silently returning nonsensical 
> > results is hard to defend as a default behavior IMO :-) 
> If you violate the assumptions of the function, just about everything 
> can in principle return nonsensical results. True, most of the time you 
> have to work hard at it: 
> import random, sys 
> class MyList(list): 
>     def __len__(self): 
>         # Report a random, meaningless length (sys.maxint is gone in Python 3). 
>         return random.randint(0, sys.maxsize) 
> but it isn't unreasonable to document the assumptions of a function, and 
> if the caller violates those assumptions, Garbage In Garbage Out 
> applies. 

I'm with Antoine, Nathaniel, David, and Chris: it is unreasonable to 
silently return nonsensical results even if you've documented it.  
Documenting it only makes it worse because it's like an "I told you so" 
when people finally figure out what's wrong and go to file the bug.
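To make "silently nonsensical" concrete, here is a minimal sketch. The helper 
name medians_by_ordering is made up for illustration; it only calls the 
existing statistics.median() on every ordering of the same three values. On 
the CPython versions I know of, the answer depends on where the NAN happens 
to sit in the input, and nothing is raised or warned about.

```python
import itertools
import math
import statistics


def medians_by_ordering(values):
    """Return statistics.median() for every ordering of the same values.

    Hypothetical helper for illustration only; not part of the stdlib.
    """
    return [statistics.median(list(p)) for p in itertools.permutations(values)]


if __name__ == "__main__":
    data = [1.0, 2.0, float("nan")]
    orderings = itertools.permutations(data)
    # Same multiset of values each time; only the input order differs.
    for ordering, result in zip(orderings, medians_by_ordering(data)):
        print(list(ordering), "->", result)
```

If the median were well defined for this data, every line printed would show 
the same result.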

> E.g. bisect requires that your list is sorted in ascending order. If it 
> isn't, the results you get are nonsensical. 
> py> data = [8, 6, 4, 2, 0] 
> py> bisect.bisect(data, 1) 
> 0 
> That's not a bug in bisect, that's a bug in the caller's code, and it 
> isn't bisect's responsibility to fix it. 
> Although it could be documented better, that's the current situation 
> with NANs and median(). Data with NANs don't have a total ordering, and 
> total ordering is the unstated assumption behind the idea of a median or 
> middle value. So all bets are off. 
> > > How would you answer those who say that the right behaviour is not to 
> > > propagate unwanted NANs, but to fail fast and raise an exception? 
> > 
> > Both seem defensible a priori, but every other mathematical operation 
> > in Python propagates NaNs instead of raising an exception. Is there 
> > something unusual about median that would justify giving it unusual 
> > behavior? 
> Well, not everything... 
> py> NAN/0 
> Traceback (most recent call last): 
>   File "<stdin>", line 1, in <module> 
> ZeroDivisionError: float division by zero 
> There may be others. But I'm not sure that "everything else does it" is 
> a strong justification. It is *a* justification, since consistency is 
> good, but consistency does not necessarily outweigh other concerns. 
> One possible argument for making PASS the default, even if that means 
> implementation-dependent behaviour with NANs, is that in the absence of 
> a clear preference for FAIL or RETURN, at least PASS is backwards 
> compatible. 
> You might shoot yourself in the foot, but at least you know it's the same 
> foot you shot yourself in using the previous version *wink* 
> -- 
> Steve 
> _______________________________________________ 
> Python-ideas mailing list 
> Python... at python.org 
> https://mail.python.org/mailman/listinfo/python-ideas 
> Code of Conduct: http://python.org/psf/codeofconduct/ 
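
For reference, the three options named above (PASS, FAIL, RETURN) could be 
sketched as a thin wrapper around the existing function. Everything here is 
an assumption for illustration: median_with_policy and its nan_policy 
argument are hypothetical names, not an API of the statistics module, and I 
am reading FAIL as "raise", RETURN as "return a NAN", and PASS as "do what 
median() does today".

```python
import math
import statistics

_POLICIES = ("pass", "fail", "return")


def median_with_policy(data, nan_policy="pass"):
    """Hypothetical wrapper around statistics.median(); not a stdlib API."""
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    # Only float NANs are checked in this sketch; Decimal NANs are ignored.
    has_nan = any(isinstance(x, float) and math.isnan(x) for x in values)
    if has_nan and nan_policy == "fail":
        # FAIL: refuse to compute anything rather than guess.
        raise ValueError("data contains a NAN")
    if has_nan and nan_policy == "return":
        # RETURN: propagate the NAN, as most float arithmetic does.
        return math.nan
    # PASS (or no NAN present): defer to the current behaviour unchanged.
    return statistics.median(values)
```

The scan for NANs costs one extra pass over the data, which is the price of 
either non-silent option.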