On Mon, Jan 7, 2019 at 8:39 AM Steven D'Aprano <steve@pearwood.info> wrote:
It's not a bug in median(), because median requires that the data implement a total order. Although that isn't explicitly documented, it is common sense: if the data cannot be sorted into smallest-to-largest order, how can you decide which value is in the middle?
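To make the total-order point concrete, here is a small sketch. A NaN compares false against everything, including itself, so sorting cannot place it, and the value median() returns depends on where the NaN happens to sit in the input. The specific results assume CPython's list sort, which leaves a sequence untouched when no adjacent pair compares as "out of order".

```python
import statistics

nan = float("nan")

# Every ordering comparison involving NaN is False, and NaN != NaN,
# so NaN breaks the total order that sorting (and hence median) relies on.
print(nan < 1.0, nan > 1.0, nan == nan)  # False False False

# Same multiset of values, different positions for the NaN.
a = [nan, 1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 4.0, nan]

# median() sorts and picks the middle element; with a NaN present the
# "sorted" list is just the input order, so the answers differ.
print(statistics.median(a), statistics.median(b))  # 2.0 3.0
```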
What is explicitly documented is that median requires numeric data, and NANs aren't numbers. So the only bug here is the caller's failure to filter out NANs. If you pass it garbage data, you get garbage results.
Nevertheless, it is a perfectly reasonable thing to want to use data which may or may not contain NANs, and I want to enhance the statistics module to make it easier for the caller to handle NANs in whichever way they see fit. This is a new feature, not a bug fix.
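One way a caller could handle NaNs "in whichever way they see fit" is a policy switch, similar in spirit to SciPy's `nan_policy` parameter. The sketch below is hypothetical: `median_with_nan_policy` and its `policy` values are illustrative names, not part of the statistics module. The default `"omit"` treats NaNs as missing data.

```python
import math
import statistics

# Hypothetical policy names, chosen for illustration only.
_POLICIES = ("omit", "raise", "propagate")


def _is_nan(x):
    # A float NaN is the only value that compares unequal to itself.
    return x != x


def median_with_nan_policy(data, policy="omit"):
    """Hypothetical wrapper around statistics.median (not a stdlib API).

    policy:
      "omit"      -- drop NaNs and take the median of the remaining values
      "raise"     -- raise ValueError if any NaN is present
      "propagate" -- return NaN if any NaN is present
    """
    if policy not in _POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    data = list(data)  # allow iterators; we may scan the data twice
    if any(_is_nan(x) for x in data):
        if policy == "raise":
            raise ValueError("data contains NaN")
        if policy == "propagate":
            return math.nan
        # policy == "omit": treat NaNs as missing values
        data = [x for x in data if not _is_nan(x)]
    # statistics.median raises StatisticsError if nothing is left.
    return statistics.median(data)
```

For example, with `samples = [1.0, float("nan"), 3.0, 2.0]`, the default call returns `2.0` (the median of the three real values), while `policy="propagate"` returns `nan` and `policy="raise"` raises `ValueError`.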
So then you are arguing that making reasonable treatment of NANs the default is not breaking backwards compatibility (because previously the data was considered wrong). This sounds like a good idea to me. Presumably the NANs are inserted into the data explicitly in order to signal missing data -- this seems more plausible to me (given the typical use case for the statistics module) than that they would be the result of a computation like Inf/Inf. (While propagating NANs makes sense for the fundamental arithmetical and mathematical functions, given that we have chosen not to raise an error when encountering them, I think other stdlib modules are not beholden to that behavior.)

--
--Guido van Rossum (python.org/~guido)