[Python-ideas] Re: NAN handling in statistics functions

Aug. 24, 2021


Urgh. That's a nasty dilemma. I propose that the default should be to return
NAN, since that's what you'd expect if you did the super-naive arithmetic
version (e.g. mean(x, y, z) = (x+y+z)/3).
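
As a minimal sketch of that super-naive version: float addition propagates NAN, so a plain sum-and-divide returns NAN wherever the NAN sits in the arguments. The name `naive_mean` is hypothetical and is not part of the statistics module.

```python
import math

def naive_mean(*values):
    # Hypothetical helper: plain sum divided by count, with no NAN handling.
    return sum(values) / len(values)

nan = float("nan")

# Any arithmetic involving NAN yields NAN, so its position does not matter here.
result_first = naive_mean(nan, 1.0, 2.0)
result_last = naive_mean(1.0, 2.0, nan)
print(math.isnan(result_first), math.isnan(result_last))
```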

On Mon, Aug 23, 2021 at 8:55 PM Steven D'Aprano <steve@pearwood.info> wrote:
...
At the moment, the handling of NANs in the statistics module is
implementation dependent. In practice, that *usually* means that if your
data has a NAN in it, the result you get will probably be a NAN.
    >>> statistics.mean([1, 2, float('nan'), 4])
    nan
But there are unfortunate exceptions to this:
    >>> statistics.median([1, 2, float('nan'), 4])
    nan
    >>> statistics.median([float('nan'), 1, 2, 4])
    1.5
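
A short sketch of one likely reason the median depends on where the NAN sits, assuming the median is computed by sorting the data first: every ordering comparison against a NAN is false, so sorting cannot place it consistently and the NAN tends to stay wherever it started. The variable names are illustrative only.

```python
import math
import statistics

nan = float("nan")

# NAN is neither less than, greater than, nor equal to any number.
comparisons = (nan < 1, nan > 1, nan == nan)

# With the NAN in the middle it takes part in the median calculation;
# with the NAN at the front it is left at one end and never used.
middle = statistics.median([1, 2, nan, 4])
front = statistics.median([nan, 1, 2, 4])
print(comparisons, middle, front)
```
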
I've spoken to users of other statistics packages and languages, such as
R, and I cannot find any consensus on what the "right" behaviour should
be for NANs except "not that!".
So I propose that statistics functions gain a keyword only parameter to
specify the desired behaviour when a NAN is found:
- raise an exception
- return NAN
- ignore it (filter out NANs)
which seem to be the three most common preferences. (It seems to be
split roughly equally between the three.)
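
The three behaviours above could be sketched as a wrapper like the following. This is only an illustration under stated assumptions: the function name `mean_with_policy`, the parameter name `nan_policy`, and its string values are all hypothetical, and only float NANs are detected.

```python
import math
import statistics

def _is_nan(x):
    # Only float NANs are checked in this sketch (not Decimal NANs).
    return isinstance(x, float) and math.isnan(x)

def mean_with_policy(data, *, nan_policy="raise"):
    """Hypothetical wrapper; nan_policy is an assumed keyword-only parameter."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NAN found in data")
        if nan_policy == "return":
            return float("nan")
        # nan_policy == "ignore": filter out the NANs before computing.
        values = [x for x in values if not _is_nan(x)]
    return statistics.mean(values)

data = [1, 2, float("nan"), 4]
ignored = mean_with_policy(data, nan_policy="ignore")
returned = mean_with_policy(data, nan_policy="return")
```

The default of "raise" here is an arbitrary choice for the sketch; which default is best is exactly the open question.
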
Thoughts? Objections?
Does anyone have any strong feelings about what should be the default?
--
Steve
-- 
--Guido van Rossum (python.org/~guido)