Re: [Python-ideas] NAN handling in the statistics module

7 Jan 2019


On Mon, Jan 7, 2019 at 8:39 AM Steven D'Aprano <steve@pearwood.info> wrote:
...
> It's not a bug in median(), because median requires the data implement a
> total order. Although that isn't explicitly documented, it is common
> sense: if the data cannot be sorted into smallest-to-largest order, how
> can you decide which value is in the middle?
>
> What is explicitly documented is that median requires numeric data, and
> NANs aren't numbers. So the only bug here is the caller's failure to
> filter out NANs. If you pass it garbage data, you get garbage results.
>
> Nevertheless, it is a perfectly reasonable thing to want to use data
> which may or may not contain NANs, and I want to enhance the statistics
> module to make it easier for the caller to handle NANs in whichever way
> they see fit. This is a new feature, not a bug fix.

So then you are arguing that making reasonable treatment of NANs the
default is not breaking backwards compatibility (because previously the
data was considered wrong). This sounds like a good idea to me. Presumably
the NANs are inserted into the data explicitly in order to signal missing
data -- this seems more plausible to me (given the typical use case for the
statistics module) than that they would be the result of a computation like
Inf/Inf. (While propagating NANs makes sense for the fundamental
arithmetical and mathematical functions, given that we have chosen not to
raise an error when encountering them, I think other stdlib libraries are
not beholden to that behavior.)
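
To make the "NANs as missing data" reading concrete, here is a minimal sketch of what ignoring them could look like from the caller's side. The helper name median_ignoring_nans is hypothetical and not part of the statistics module; it only assumes the existing statistics.median() and math.isnan(), and that the data are ordinary real numbers (floats, ints and the like).

```python
import math
import statistics

nan = float("nan")

# Every ordering comparison involving a NAN is False, and a NAN is not
# even equal to itself, so data containing NANs has no total order.
assert not (nan < 1.0)
assert not (nan > 1.0)
assert not (nan == nan)


def median_ignoring_nans(data):
    """Hypothetical helper: treat NANs as missing values and drop them.

    This is only an illustration of one possible policy, not an API
    that the statistics module provides.
    """
    cleaned = [x for x in data if not math.isnan(x)]
    # statistics.median() raises StatisticsError if nothing is left.
    return statistics.median(cleaned)


data = [1.0, nan, 3.0, 2.0]
result = median_ignoring_nans(data)  # median of [1.0, 3.0, 2.0]
print(result)  # 2.0
```

Whether dropping should be the default, or one of several selectable policies, is exactly the design question under discussion; the sketch only shows the "ignore" option.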

-- 
--Guido van Rossum (python.org/~guido)