On Mon, Jan 07, 2019 at 11:27:22AM +1100, Steven D'Aprano wrote:
I propose adding a "nan_policy" keyword-only parameter to the relevant statistics functions (mean, median, variance etc), and defining the following policies:
I asked some heavy users of statistics software (not just Python users) what behaviour they would find useful, and as I feared, I got no conclusive answer. So far, the answers seem to be almost evenly split into four camps:
- don't do anything, it is the caller's responsibility to filter NANs;
- raise an immediate error;
- return a NAN;
- treat them as missing data.
(The sample is currently small, so I don't expect the answers to stay evenly split as more people respond.)
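To make the four camps concrete, here is a rough sketch of how they might map onto a keyword-only nan_policy argument. The wrapper name mean_with_policy and the policy names "ignore", "raise", "return" and "omit" are placeholders chosen for illustration only, not a settled part of the proposal, and the NAN check below only recognises float NANs.

```python
import math
import statistics


def _is_nan(x):
    # Assumption: only float NANs are detected; Decimal NANs are not handled here.
    return isinstance(x, float) and math.isnan(x)


def mean_with_policy(data, *, nan_policy="return"):
    """Hypothetical wrapper around statistics.mean; policy names are illustrative."""
    values = list(data)
    if nan_policy == "ignore":
        # Camp 1: do nothing, the caller is responsible for filtering NANs.
        return statistics.mean(values)
    has_nan = any(_is_nan(x) for x in values)
    if nan_policy == "raise":
        # Camp 2: raise an immediate error.
        if has_nan:
            raise ValueError("data contains a NAN")
        return statistics.mean(values)
    if nan_policy == "return":
        # Camp 3: return a NAN as soon as one is present in the data.
        return math.nan if has_nan else statistics.mean(values)
    if nan_policy == "omit":
        # Camp 4: treat NANs as missing data and drop them before averaging.
        # If nothing is left, statistics.mean raises StatisticsError.
        return statistics.mean([x for x in values if not _is_nan(x)])
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")
```

With something like this, mean_with_policy([1.0, float("nan"), 3.0], nan_policy="omit") averages only the two real values, while the same call with nan_policy="raise" fails loudly. The explicit scan for NANs is where the performance cost would come from.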
Having considered all the views expressed (thank you to everyone who commented), I'm now inclined to default to returning a NAN, even if it impacts performance. That happens to be the current behaviour of mean etc, but not of median, except by accident.
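For anyone who wants to see the difference between mean and median for themselves, a small experiment along these lines should do. The sample data is made up for illustration. I have deliberately not written down what median prints, since that depends on where the sort happens to leave the NAN.

```python
import math
import statistics

nan = float("nan")

# mean: the NAN propagates through the sum, so the result is a NAN.
mean_result = statistics.mean([1.0, nan, 3.0])
print("mean ->", mean_result)

# median: sorts the data first, and comparisons with a NAN are always false,
# so the result depends on where the NAN sits in the input.
for data in ([nan, 1.0, 2.0, 3.0, 4.0], [1.0, 2.0, nan, 3.0, 4.0]):
    print(data, "->", statistics.median(data))
```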