
Perhaps a math.hasnan() function for collections could be implemented with binary search?
math.hasnan(seq)
Though it is true that if you're using datasets large enough to care about speed, you should probably be using the SciPy stack instead of statistics in the first place.
On Fri, Aug 27, 2021, 11:25 AM Christopher Barker <pythonchb@gmail.com> wrote:
If folks want faster processing (checking for, replacing) of NaNs in sequences, a function written in C could be added to the math module (or the statistics module).
Now that I've said that, it might make sense to put such a function in the statistics package, for use there anyway.
Personally, I think if you are working with large enough datasets to care, you probably should use numpy anyway.
-CHB
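For illustration, here is a minimal pure-Python sketch of what such a helper might look like. The name `hasnan` is hypothetical (nothing by that name exists in the math or statistics modules), and this version is a plain linear scan rather than the binary search or C implementation floated above, since a binary search would need the data to be ordered first.

```python
import math
from typing import Iterable


def hasnan(values: Iterable[float]) -> bool:
    """Return True if any item in values is a NaN.

    Hypothetical helper: a linear scan that stops at the first NaN found.
    """
    return any(math.isnan(x) for x in values)


print(hasnan([1.0, 2.0, float("nan")]))  # True
print(hasnan([1.0, 2.0, 3.0]))           # False
```

A C version would presumably do the same scan, just without the per-item interpreter overhead.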
On Fri, Aug 27, 2021 at 3:39 AM Jeff Allen <ja.py@farowl.co.uk> wrote:
On 26/08/2021 19:41, Brendan Barnwell wrote:
On 2021-08-23 20:53, Steven D'Aprano wrote:
So I propose that statistics functions gain a keyword-only parameter to specify the desired behaviour when a NAN is found:
- raise an exception
- return NAN
- ignore it (filter out NANs)
which seem to be the three most common preferences. (It seems to be split roughly equally between the three.)
Thoughts? Objections?
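To make the three options concrete, here is a rough sketch of how such a keyword-only parameter could behave, written as a wrapper around `statistics.fmean`. The wrapper name `mean_with_policy`, the parameter name `nan_policy`, and its values "raise", "return" and "ignore" are all assumptions made for illustration; the proposal above does not fix any spelling.

```python
import math
from statistics import fmean

_POLICIES = ("raise", "return", "ignore")


def mean_with_policy(data, *, nan_policy="raise"):
    """Mean of data with an explicit choice of NaN handling (hypothetical API)."""
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")
    values = list(data)
    if any(math.isnan(x) for x in values):
        if nan_policy == "raise":
            raise ValueError("NaN found in data")
        if nan_policy == "return":
            return math.nan
        # nan_policy == "ignore": filter the NaNs out before averaging
        values = [x for x in values if not math.isnan(x)]
    return fmean(values)


data = [1.0, float("nan"), 3.0]
print(mean_with_policy(data, nan_policy="ignore"))  # 2.0
print(mean_with_policy(data, nan_policy="return"))  # nan
```

The default of "raise" here is an arbitrary choice for the sketch, not something the thread has settled.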
I'd like to suggest that there isn't a single answer that is most natural for all functions. There may be as few as two.
Guido's proposal was that mean return nan because the naive arithmetic formula would return nan. The awkward first example was median(), which is based on order (comparison). Now Brendan has pointed out:
One important thing we should think about is whether to add similar handling to `max` and `min`. These are builtin functions, not in the statistics module, but they have similarly confusing behavior with NAN: compare `max(1, 2, float('nan'))` with `max(float('nan'), 1, 2)`.
The real behaviour of max() is to return the first argument that is not exceeded by any that follow, so:
>>> max(nan, nan2, 1, 2) is nan
True
>>> max(nan2, nan, 1, 2) is nan2
True
As a definition, that is not as easy to understand as "return the largest argument". The behaviour is because in Python, x>nan is False. This choice, which is often sensible, makes the set of float values less than totally ordered. It seems to me to be an error in principle to apply a function whose simple definition assumes a total ordering, to a set that cannot be ordered. So most natural to me would be to raise an error for this class of function.
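The order dependence described above can be reproduced in a few lines, together with a sketch of the "raise an error" alternative. The function `strict_max` is hypothetical and only illustrates the idea; it is not an existing builtin or a concrete proposal from the thread.

```python
import math

nan = float("nan")
nan2 = float("nan")  # a second, distinct NaN object

# Every comparison against NaN is False, so the result depends on position.
print(max(1, 2, nan))  # 2
print(max(nan, 1, 2))  # nan
print(max(nan, nan2, 1, 2) is nan)  # True


def strict_max(*args):
    """Like max(), but refuse float NaNs instead of returning an order-dependent result."""
    if any(isinstance(x, float) and math.isnan(x) for x in args):
        raise ValueError("cannot take the maximum of values that include NaN")
    return max(args)


print(strict_max(1, 2, 3.5))  # 3.5
```

Calling `strict_max(1, 2, nan)` raises ValueError regardless of where the NaN appears in the argument list.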
Meanwhile, functions that have a purely arithmetic definition most naturally return nan. Are there any other classes of function than comparison or arithmetic? Counting, perhaps, or is that comparison again?
Proposals for a general solution, especially if based on a replacement value, are more a question of how you would like to pre-filter your set. An API could offer some filters, or it may be clearer left to the caller. It is no doubt too late to alter the default behaviour of familiar functions, but there could be a "strict" mode.
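As an illustration of leaving the pre-filtering to the caller, the following sketch defines two small filters and applies them before calling `statistics.median`. The helper names `drop_nans` and `replace_nans` are made up for this example and are not part of any existing API.

```python
import math
from statistics import median


def drop_nans(values):
    """Return a list of the values with every NaN removed."""
    return [x for x in values if not math.isnan(x)]


def replace_nans(values, replacement):
    """Return a list of the values with every NaN swapped for replacement."""
    return [replacement if math.isnan(x) else x for x in values]


data = [3.0, float("nan"), 1.0, 2.0]
print(median(drop_nans(data)))          # 2.0
print(median(replace_nans(data, 0.0)))  # 1.5
```

The two calls give different medians for the same data, which is one reason the choice of filter may be clearer when it is visible at the call site.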
--
Jeff Allen
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/FQNZLN...
Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD (Chris)
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython