[Python-ideas] NAN handling in the statistics module
mertz at gnosis.cx
Sun Jan 6 23:05:39 EST 2019
[... apologies if this is dup, got a bounce ...]
> [David Mertz <mertz at gnosis.cx>]
>> I have to say though that the existing behavior of `statistics.median()`
>> is SURPRISING if not outright wrong. It is the behavior in existing
>> Python versions, but it is very strange.
>> The implementation simply does whatever `sorted()` does, which is an
>> implementation detail. In particular, NaNs, being neither less than nor
>> greater than any floating-point number, just stay where they are during
>> sorting.
> I expect you inferred that from staring at a handful of examples, but
> it's illusion. Python's sort uses only __lt__ comparisons, and if
> those don't implement a total ordering then _nothing_ is defined about
> sort's result (beyond that it's some permutation of the original list).
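To make Tim's point concrete, here is a small sketch that logs which rich-comparison methods `sorted()` actually calls, alongside the fact that every ordering comparison involving NaN is False. The `Recorder` class is a hypothetical helper written only for this illustration. In CPython the log should contain nothing but `__lt__`, consistent with the documented rule that the sort routines use `<` when comparing two objects; since `nan < x` and `x < nan` are both False, those `<` answers cannot describe a total order.

```python
import math

nan = math.nan

# Every comparison involving NaN is False: it is neither less than,
# greater than, nor equal to 1.0.
print(nan < 1.0, nan > 1.0, nan == 1.0)  # False False False


class Recorder:
    """Hypothetical wrapper that logs which comparison methods get called."""

    calls = []  # shared log of dunder names, in call order

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Recorder.calls.append("__lt__")
        return self.value < other.value

    def __gt__(self, other):
        Recorder.calls.append("__gt__")
        return self.value > other.value

    def __le__(self, other):
        Recorder.calls.append("__le__")
        return self.value <= other.value

    def __ge__(self, other):
        Recorder.calls.append("__ge__")
        return self.value >= other.value

    def __eq__(self, other):
        Recorder.calls.append("__eq__")
        return self.value == other.value


data = [Recorder(v) for v in [3.0, nan, 1.0, 2.0]]
sorted(data)
print(set(Recorder.calls))  # {'__lt__'}
```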
Thanks, Tim, for clarifying. Is it even the case that sorts are STABLE in
the face of non-total orderings under __lt__? A couple of quick examples
don't refute that, but what I tried was not very thorough, nor did I
think much about TimSort itself.
> So, certainly, if you want median to be predictable in the presence of
> NaNs, sort's behavior in the presence of NaNs can't be relied on in
> any respect.
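If sort cannot be relied on once NaNs are present, a predictable median has to deal with NaNs before sorting. One possible shape is sketched below as a hypothetical wrapper: `median_nan_aware` and its `policy` values are invented for this illustration and are not part of the `statistics` module. It detects NaNs up front and then raises, propagates NaN, or drops them, so `statistics.median()` only ever sees values that are totally ordered.

```python
import math
import statistics
from typing import Iterable

_POLICIES = ("raise", "propagate", "omit")


def median_nan_aware(data: Iterable[float], policy: str = "raise") -> float:
    """Hypothetical median with explicit NaN handling.

    policy="raise"     -> ValueError if any NaN is present
    policy="propagate" -> return NaN if any NaN is present
    policy="omit"      -> drop NaNs and take the median of what remains
    """
    if policy not in _POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")

    values = list(data)
    # A float NaN is the only float value that is not equal to itself.
    if any(v != v for v in values):
        if policy == "raise":
            raise ValueError("data contains NaN")
        if policy == "propagate":
            return math.nan
        values = [v for v in values if v == v]  # policy == "omit"

    # With NaNs gone, ordinary numbers are totally ordered, so the
    # sort inside statistics.median() gives a well-defined result.
    return statistics.median(values)
```

For example, `median_nan_aware([9, 9, 9, math.nan, 1, 2, 3, 4, 5], policy="omit")` drops the NaN and returns 4.5, the mean of the two middle values 4 and 5. Checking for NaN with `v != v` avoids calling `math.isnan()` on every element, though it assumes the inputs are ordinary numbers.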
Playing with Tim's examples suggests that statistics.median() is simply
outright WRONG. I can think of absolutely no way to characterize these
as reasonable results:
Python 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 09:50:42)
In : statistics.median([9, 9, 9, nan, 1, 2, 3, 4, 5])
In : statistics.median([9, 9, 9, nan, 1, 2, 3, 4])
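An even smaller reproduction of the same effect, assuming CPython's sort implementation: the two lists below hold the same three values in different orders. Because every `<` comparison involving NaN is False, CPython's sort finds no later item that is less than an earlier one, treats each list as already in order, and returns it unchanged. `statistics.median()` then simply picks whatever landed in the middle slot, so the answer depends on input order alone.

```python
import math
import statistics

nan = math.nan

# Same multiset of values, two different input orders.
a = [nan, 1, 2]
b = [1, 2, nan]

# CPython's sort leaves both lists as they are, so the middle
# element (index 1) differs between them.
print(statistics.median(a))  # 1
print(statistics.median(b))  # 2
```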