On Sat, Dec 28, 2019 at 9:42 PM Brendan Barnwell <brenbarn@brenbarn.net> wrote:

> But that is the problem. "The applied mathematics of computing" is
> floating point, and in floating point, NaN is a number (despite its
> name).

Careful here -- that may just re-ignite the argument :-(

> computers work with are floats, and NaN is a float, so in any relevant
> sense it is <snip> an instance of a numerical type.

Stick with that -- we hopefully can all agree to it.

One of the things the docs specifically say is that the statistics module is designed to work with floats and Decimal, and either of these can, and occasionally will, contain a NaN value. That value (or values) may have gotten there without the user knowing it, and many users will not understand the implications.

In the case of mean(), that's not too bad -- they'll get a NaN as a result, go WTF? and then hopefully do a bit of research to figure out what happened.

But for median (and friends), they will get an arbitrary, incorrect result, one that depends on where the NaN happens to sit in the input. That is not good.
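A quick way to see this (a sketch; the exact values printed depend on CPython's sort implementation) is to feed the same three numbers to median() in every possible order. median() sorts its input, sorting relies on "<", and every "<" comparison involving NaN is False, so the sort cannot place the NaN consistently:

```python
from itertools import permutations
from statistics import median

nan = float("nan")

# Same data, every possible input order. Because NaN compares False
# against everything, sorted() produces order-dependent "sorted" lists,
# and median() just picks the middle element of whatever it got.
results = {repr(median(p)) for p in permutations([1.0, nan, 3.0])}

# More than one distinct "median" for what is logically the same data set.
print(sorted(results))
```

No error, no warning, just different answers depending on order.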

> I think the simplest solution is to just put an explicit warning in the
> docs that says "Results may be meaningless if your data contain NaN".

Absolutely -- behavior around NaN should be well documented. Ideally we'd add a bit more explanation than that -- but SOMETHING would be good. And that can be done far more easily than actually changing the module (and back-ported to older versions of the docs).

A next step might be to provide a couple of NaN-handling utility functions: maybe an is_nan() function, or a filter_nan() function, or an all_finite() function. These are not as trivial as they seem at first, so it would be good to provide them.
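To illustrate why these aren't trivial, here is a rough sketch of what such helpers might look like. The names is_nan, filter_nan and all_finite are hypothetical (nothing like them exists in the statistics module); the point is that the data may mix float, Decimal, int and Fraction, and a bare math.isnan() isn't enough: it raises OverflowError on an int too large to convert to float, and Decimal has its own is_nan() / is_finite() methods (which also cover signaling NaNs).

```python
import math
from decimal import Decimal
from fractions import Fraction
from numbers import Rational


def is_nan(x):
    """Hypothetical helper: True if x is a float or Decimal NaN."""
    if isinstance(x, Decimal):
        return x.is_nan()          # quiet or signaling NaN
    if isinstance(x, Rational):    # int, bool, Fraction: never NaN, and a
        return False               # huge int would overflow math.isnan()
    return math.isnan(x)           # float and other Real types


def filter_nan(data):
    """Hypothetical helper: return a list of data with NaN values removed."""
    return [x for x in data if not is_nan(x)]


def all_finite(data):
    """Hypothetical helper: True if no value is a NaN or an infinity."""
    for x in data:
        if isinstance(x, Decimal):
            if not x.is_finite():
                return False
        elif isinstance(x, Rational):
            continue               # exact rationals are always finite
        elif not math.isfinite(x):
            return False
    return True


print(is_nan(10**400), is_nan(Decimal("sNaN")), is_nan(Fraction(1, 3)))
print(filter_nan([1.0, float("nan"), Decimal("2"), Decimal("NaN")]))
```

Whether helpers like these should live in the statistics module or in math is a separate question.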

Then we'd ideally update the code itself to better handle NaNs -- which is a bigger lift.
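As one possible shape for that, here is a sketch of a NaN-aware median wrapper. Both nan_aware_median and its nan_policy argument are hypothetical; the policy names are loosely modeled on SciPy's nan_policy ("propagate", "raise", "omit"), and this is not a proposal for the actual statistics API:

```python
import statistics
from decimal import Decimal

_POLICIES = ("propagate", "raise", "omit")


def _is_nan(x):
    # Decimal has its own is_nan(); for other types x != x is True only for
    # NaN (ints and Fractions are never NaN).
    if isinstance(x, Decimal):
        return x.is_nan()
    return x != x


def nan_aware_median(data, nan_policy="propagate"):
    """Hypothetical median() wrapper with explicit NaN handling."""
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    data = list(data)
    nans = [x for x in data if _is_nan(x)]
    if nans:
        if nan_policy == "propagate":
            return nans[0]                  # like mean(): NaN in, NaN out
        if nan_policy == "raise":
            raise ValueError("data contains NaN")
        data = [x for x in data if not _is_nan(x)]  # "omit"
    return statistics.median(data)


print(nan_aware_median([1.0, float("nan"), 3.0]))
print(nan_aware_median([1.0, float("nan"), 3.0], nan_policy="omit"))
```

The key design choice is that every policy gives an answer that does not depend on where the NaN sits in the input.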

But yes -- AT LEAST UPDATE THE DOCS!

- CHB

Christopher Barker, PhD

Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython