On Mon, Jan 7, 2019 at 8:39 AM Steven D'Aprano <steve@pearwood.info> wrote:

It's not a bug in median(), because median requires the data implement a
total order. Although that isn't explicitly documented, it is common
sense: if the data cannot be sorted into smallest-to-largest order, how
can you decide which value is in the middle?

What is explicitly documented is that median requires numeric data, and
NANs aren't numbers. So the only bug here is the caller's failure to
filter out NANs. If you pass it garbage data, you get garbage results.
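
As a rough illustration of what filtering on the caller's side might look like, here is a minimal sketch. The helper name drop_nans is made up for this example and is not part of the statistics module, and it only recognizes float NANs (not, say, Decimal NANs):

```python
import math
import statistics


def drop_nans(data):
    """Return a list of the values in data that are not float NaNs.

    Hypothetical helper for illustration; not a statistics module function.
    """
    return [x for x in data if not (isinstance(x, float) and math.isnan(x))]


readings = [3.0, float("nan"), 1.0, 2.0, float("nan"), 4.0]
clean = drop_nans(readings)

# With the NaNs removed, the remaining values have a total order,
# so the middle of the sorted data is well defined.
print(statistics.median(clean))  # 2.5
```

Note that calling statistics.median(readings) directly does not raise an error; the result simply depends on where the NANs happen to end up after sorting.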

Nevertheless, it is a perfectly reasonable thing to want to use data
which may or may not contain NANs, and I want to enhance the statistics
module to make it easier for the caller to handle NANs in whichever way
they see fit. This is a new feature, not a bug fix.

So then you are arguing that making reasonable treatment of NANs the default is not breaking backwards compatibility (because previously the data was considered wrong). This sounds like a good idea to me. Presumably the NANs are inserted into the data explicitly to signal missing data -- this seems more plausible to me (given the typical use case for the statistics module) than that they would be the result of a computation like Inf/Inf. (While propagating NANs makes sense for the fundamental arithmetical and mathematical functions, given that we have chosen not to raise an error when encountering them, I think other stdlib modules are not beholden to that behavior.)
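
To make the "NANs as missing data" idea concrete, here is one possible sketch of a wrapper that lets the caller pick a policy. Everything about it is an assumption for illustration: the function name median_nan_aware, the nan_policy parameter and its values "omit", "raise", and "propagate" are hypothetical and are not an existing or agreed statistics API. The default shown here treats NANs as missing values:

```python
import math
import statistics


def _is_nan(x):
    # Only float NaNs are detected in this sketch.
    return isinstance(x, float) and math.isnan(x)


def median_nan_aware(data, nan_policy="omit"):
    """Median with an explicit NaN policy (hypothetical, for illustration).

    nan_policy:
      "omit"      -- treat NaNs as missing data and ignore them (default)
      "raise"     -- raise ValueError if any NaN is present
      "propagate" -- return NaN if any NaN is present
    """
    if nan_policy not in ("omit", "raise", "propagate"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("data contains NaN")
        if nan_policy == "propagate":
            return math.nan
        values = [x for x in values if not _is_nan(x)]

    # statistics.median raises StatisticsError if nothing is left.
    return statistics.median(values)


data = [1.0, float("nan"), 3.0]
print(median_nan_aware(data))                          # 2.0
print(median_nan_aware(data, nan_policy="propagate"))  # nan
```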

--Guido van Rossum (python.org/~guido)