[Python-ideas] Re: NAN handling in statistics functions

Aug. 27, 2021

Perhaps a math.hasnan() function for collections could be implemented with
binary search?

math.hasnan(seq)

Though it is true that if you're using datasets large enough to care about
speed, you should probably be using the SciPy stack instead of statistics
in the first place.
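A minimal sketch of what such a helper might look like. Note that hasnan() is hypothetical (nothing by that name exists in the math module), and this version uses a plain linear scan rather than binary search, since a NaN does not sort into a predictable position and an unsorted sequence has to be inspected item by item anyway:

```python
import math


def hasnan(seq):
    """Return True if any float in seq is a NaN.

    Hypothetical helper, not part of the math module. Only float
    instances are checked; other types are assumed NaN-free.
    """
    # any() stops at the first NaN, so the scan is O(n) worst case.
    return any(isinstance(x, float) and math.isnan(x) for x in seq)


print(hasnan([1.0, 2.5, float("nan")]))
print(hasnan([1, 2, 3]))
```

A C implementation could follow the same shape; the pure-Python version is only meant to pin down the semantics.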

On Fri, Aug 27, 2021, 11:25 AM Christopher Barker <pythonchb@gmail.com>
wrote:
...
If folks want faster processing (checking for, replacing) of NaNs in
sequences, a function written in C could be added to the math module (or
the statistics module).
Now that I said that, it might make sense to put such a function in the
statistics package, for use there anyway.
Personally, I think if you are working with large enough datasets to care,
you probably should use numpy anyway.
-CHB
On Fri, Aug 27, 2021 at 3:39 AM Jeff Allen <ja.py@farowl.co.uk> wrote:
...
On 26/08/2021 19:41, Brendan Barnwell wrote:
On 2021-08-23 20:53, Steven D'Aprano wrote:
So I propose that statistics functions gain a keyword only parameter to
specify the desired behaviour when a NAN is found:
- raise an exception
- return NAN
- ignore it (filter out NANs)
which seem to be the three most common preferences. (It seems to be
split roughly equally between the three.)
Thoughts? Objections?
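For illustration only, the three behaviours could be sketched as a wrapper around statistics.mean(). The function name mean_with_policy, the parameter name nan_policy and its values "raise", "return" and "ignore" are all assumptions made for this sketch; the proposal above does not fix a spelling:

```python
import math
import statistics


def _is_nan(x):
    # Only floats are checked here; other numeric types are assumed NaN-free.
    return isinstance(x, float) and math.isnan(x)


def mean_with_policy(data, *, nan_policy="raise"):
    """Mean with an explicit NaN policy (hypothetical API)."""
    if nan_policy not in ("raise", "return", "ignore"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(_is_nan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NaN found in data")
        if nan_policy == "return":
            return math.nan
        # "ignore": filter the NaNs out before computing the mean.
        values = [x for x in values if not _is_nan(x)]
    return statistics.mean(values)


sample = [1.0, 2.0, float("nan")]
print(mean_with_policy(sample, nan_policy="ignore"))  # 1.5
print(mean_with_policy(sample, nan_policy="return"))  # nan
```

With "ignore", data consisting only of NaNs ends up empty, so statistics.mean() raises StatisticsError as it would for any empty input.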
I'd like to suggest that there isn't a single answer that is most natural
for all functions. There may be as few as two.
Guido's proposal was that mean return nan because the naive arithmetic
formula would return nan. The awkward first example was median(), which is
based on order (comparison). Now Brendan has pointed out:
One important thing we should think about is whether to add similar
handling to `max` and `min`.  These are builtin functions, not in the
statistics module, but they have similarly confusing behavior with NAN:
compare `max(1, 2, float('nan'))` with `max(float('nan'), 1, 2)`.
The real behaviour of max() is to return the first argument that is not
exceeded by any that follow, so:
...
...
...
>>> max(nan, nan2, 1, 2) is nan
True
>>> max(nan2, nan, 1, 2) is nan2
True
As a definition, that is not as easy to understand as "return the largest
argument". The behaviour is because in Python, x>nan is False. This choice,
which is often sensible, makes the set of float values less than totally
ordered. It seems to me to be an error in principle to apply a function
whose simple definition assumes a total ordering, to a set that cannot be
ordered. So most natural to me would be to raise an error for this class of
function.
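The order dependence, and one possible "raise an error" variant, can be sketched as follows. strict_max() is a hypothetical name used only for this example and is not an existing builtin:

```python
import math

nan = float("nan")

# Every comparison with NaN is False, so the result depends on position.
nan_last = max(1, 2, nan)   # 2: nan > 2 is False, so 2 is kept
nan_first = max(nan, 1, 2)  # nan: neither 1 > nan nor 2 > nan is True


def strict_max(*args):
    """Like max(*args), but refuse float NaNs (hypothetical helper)."""
    if not args:
        raise TypeError("strict_max() expects at least one argument")
    for x in args:
        if isinstance(x, float) and math.isnan(x):
            raise ValueError("strict_max() arguments must not include NaN")
    return max(args)


print(nan_last, nan_first)
print(strict_max(1, 2, 3.5))
```

Here strict_max(1, 2, nan) raises ValueError wherever the NaN appears among the arguments.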
Meanwhile, functions that have a purely arithmetic definition most
naturally return nan. Are there any other classes of function than
comparison or arithmetic? Counting, perhaps, or is that comparison again?
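A small sketch of the arithmetic case, using a naive sum-over-count mean rather than the statistics module's implementation: the NaN propagates through the addition regardless of where it sits in the data.

```python
import math

nan = float("nan")


def naive_mean(data):
    # Plain arithmetic definition: any NaN operand makes the sum NaN.
    return sum(data) / len(data)


first = naive_mean([nan, 1.0, 3.0])
last = naive_mean([1.0, 3.0, nan])
print(math.isnan(first), math.isnan(last))  # True True
```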
Proposals for a general solution, especially if based on a replacement
value, are more a question of how you would like to pre-filter your set. An
API could offer some filters, or it may be clearer left to the caller. It
is no doubt too late to alter the default behaviour of familiar functions,
but there could be a "strict" mode.
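As a rough sketch of the "left to the caller" option, a pre-filter can either drop NaNs or substitute a replacement value before the data reaches an unmodified statistics function. The helper name replace_nans and its replacement parameter are assumptions made for this example:

```python
import math
import statistics


def replace_nans(data, replacement=None):
    """Yield items from data, dropping float NaNs or, if a replacement
    is given, substituting it for each NaN (hypothetical helper)."""
    for x in data:
        if isinstance(x, float) and math.isnan(x):
            if replacement is not None:
                yield replacement
        else:
            yield x


readings = [2.0, float("nan"), 4.0, 9.0]
median_dropped = statistics.median(replace_nans(readings))      # of 2, 4, 9
median_filled = statistics.median(replace_nans(readings, 0.0))  # of 0, 2, 4, 9
print(median_dropped, median_filled)  # 4.0 3.0
```

The two results differ, which is the point: the choice of filter is part of the question being asked, not a detail of the statistics function.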
--
Jeff Allen
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/FQNZLN...
Code of Conduct: http://python.org/psf/codeofconduct/
--
Christopher Barker, PhD (Chris)
Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython

Finn Mason