
On Sat, 2021-08-28 at 11:49 +1000, Steven D'Aprano wrote:
On Tue, Aug 24, 2021 at 01:53:51PM +1000, Steven D'Aprano wrote:
I've spoken to users of other statistics packages and languages, such as R, and I cannot find any consensus on what the "right" behaviour should be for NANs except "not that!".
So I propose that statistics functions gain a keyword-only parameter to specify the desired behaviour when a NAN is found:
Thanks everyone for the feedback. Does anyone have a strong opinion on what to name this parameter?
In R, the usual parameter name is "na.rm", which removes them:
https://stat.ethz.ch/R-manual/R-patched/library/base/html/mean.html
https://stat.ethz.ch/R-manual/R-patched/library/stats/html/sd.html
Matlab optionally takes one of two strings:
https://au.mathworks.com/help/matlab/ref/mean.html?#d123e832786
It doesn't seem to have named parameters.
I'm leaning towards "nans=..." with an enum.
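To make the "nans=..." with an enum idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the enum name NanPolicy, its members (borrowed from SciPy's nan_policy strings), the default of propagating, and the stand-alone mean() function, which is not the statistics module's implementation.

```python
import enum
import math


class NanPolicy(enum.Enum):
    # Hypothetical member names, mirroring SciPy's nan_policy strings.
    PROPAGATE = "propagate"
    RAISE = "raise"
    OMIT = "omit"


def mean(data, *, nans=NanPolicy.PROPAGATE):
    """Arithmetic mean with an explicit, keyword-only NAN policy.

    The default (PROPAGATE) is an arbitrary choice for this sketch.
    """
    values = [float(x) for x in data]
    if any(math.isnan(x) for x in values):
        if nans is NanPolicy.RAISE:
            raise ValueError("NAN found in data")
        if nans is NanPolicy.PROPAGATE:
            return math.nan
        # NanPolicy.OMIT: drop the NANs and carry on.
        values = [x for x in values if not math.isnan(x)]
    if not values:
        raise ValueError("mean requires at least one non-NAN data point")
    return sum(values) / len(values)


print(mean([1.0, math.nan, 3.0], nans=NanPolicy.OMIT))
```

Because the parameter is keyword-only, a call such as mean(data, NanPolicy.OMIT) fails with a TypeError, so the policy is always spelled out at the call site.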
SciPy should probably also be a data point. It uses:

    nan_policy : {'propagate', 'raise', 'omit'}, optional

statsmodels seems to use:

    missing : str
        Available options are ‘none’, ‘drop’, and ‘raise’

pandas has skipna=bool.

Since pandas and statsmodels hint at "missing values", there is likely a good reason to not worry about them.

I guess it was already noted that both statsmodels and SciPy default to propagating. [1]

Cheers,

Sebastian

[1] In general Python is more careful, since it sometimes raises errors. But this is almost only(?) when creating a non-finite value from finite values, not when propagating non-finite values (which does not normally produce an IEEE warning, although creating NaN from inf with `inf - inf` does). In that sense it is different, but probably not much.
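The distinction in [1] can be checked with plain Python floats and the math module. This is only a rough sketch: the helper outcome() is a hypothetical name introduced here, and the handful of operations shown is not an exhaustive survey.

```python
import math

inf = float("inf")
nan = float("nan")


def outcome(op):
    """Return the result of op(), or the exception name if it raises."""
    try:
        return op()
    except (ArithmeticError, ValueError) as exc:
        return type(exc).__name__


# Creating a non-finite value from finite operands: Python often raises.
print(outcome(lambda: 1.0 / 0.0))          # ZeroDivisionError
print(outcome(lambda: math.exp(1000.0)))   # OverflowError
print(outcome(lambda: math.sqrt(-1.0)))    # ValueError

# ...but not always: float multiplication overflows silently.
print(outcome(lambda: 1e308 * 10.0))       # inf

# Propagating existing non-finite values: no error.
print(outcome(lambda: nan + 1.0))          # nan
print(outcome(lambda: inf * 2.0))          # inf

# inf - inf creates a NaN (an IEEE "invalid" operation),
# yet plain Python floats do not raise here either.
print(outcome(lambda: inf - inf))          # nan
```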