[Python-ideas] NAN handling in the statistics module

Mon Jan 7 11:49:33 EST 2019

On Mon, Jan 7, 2019 at 8:39 AM Steven D'Aprano <steve at pearwood.info> wrote:

> It's not a bug in median(), because median requires the data implement a
> total order. Although that isn't explicitly documented, it is common
> sense: if the data cannot be sorted into smallest-to-largest order, how
> can you decide which value is in the middle?
>
> What is explicitly documented is that median requires numeric data, and
> NANs aren't numbers. So the only bug here is the caller's failure to
> filter out NANs. If you pass it garbage data, you get garbage results.
>
> Nevertheless, it is a perfectly reasonable thing to want to use data
> which may or may not contain NANs, and I want to enhance the statistics
> module to make it easier for the caller to handle NANs in whichever way
> they see fit. This is a new feature, not a bug fix.
>
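
To illustrate the total-order point in the quoted text, here is a minimal
sketch (not taken from the original thread). It assumes only the standard
`statistics` module; the sample values and variable names are made up for
illustration. No particular median result is claimed, since the outcome
depends on how the sort happens to treat the NAN.

```python
import statistics

nan = float("nan")

# NaN is unordered with respect to every float, including itself,
# so all three comparisons evaluate to False.
comparisons = (nan < 1.0, nan > 1.0, nan == nan)

# The same three values, presented in different orders (illustrative data).
orderings = [[nan, 1.0, 2.0], [1.0, nan, 2.0], [1.0, 2.0, nan]]

# median() sorts its input; with a NaN present the sorted order is not
# well defined, so the reported "middle" value may vary with input order.
results = [statistics.median(data) for data in orderings]

print(comparisons)
print(results)
```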

So then you are arguing that making reasonable treatment of NANs the
default is not breaking backwards compatibility (because previously the
data was considered wrong). This sounds like a good idea to me. Presumably
the NANs are inserted into the data explicitly in order to signal missing
data -- this seems more plausible to me (given the typical use case for the
statistics module) than that they would be the result of a computation like
Inf/Inf. (While propagating NANs makes sense for the fundamental
arithmetical and mathematical functions, given that we have chosen not to
raise an error when encountering them, I think other stdlib modules are
not beholden to that behavior.)
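
As a rough sketch of what treating NANs as missing data could look like on
the caller's side: the helper name `median_ignoring_nans` is hypothetical
and is not part of the statistics module; it only shows one possible
policy (drop NANs, then compute the median of what remains).

```python
import math
import statistics


def median_ignoring_nans(data):
    """Hypothetical helper: treat NaNs as missing values and skip them."""
    # Only floats are checked here; other numeric types are passed through.
    cleaned = [x for x in data if not (isinstance(x, float) and math.isnan(x))]
    # statistics.median raises StatisticsError if no data points remain.
    return statistics.median(cleaned)


sample = [1.0, float("nan"), 3.0, 2.0]  # illustrative data
print(median_ignoring_nans(sample))
```

Other policies (raising an error on the first NAN, or returning NAN) would
be equally easy to express; which one should be the default is exactly the
question under discussion.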

-- 
--Guido van Rossum (python.org/~guido)