
On Sun, 6 Jan 2019 19:40:32 -0800 Stephan Hoyer <shoyer@gmail.com> wrote:
On Sun, Jan 6, 2019 at 4:27 PM Steven D'Aprano <steve@pearwood.info> wrote:
I propose adding a "nan_policy" keyword-only parameter to the relevant statistics functions (mean, median, variance etc), and defining the following policies:
IGNORE: quietly ignore all NANs
FAIL: raise an exception if any NAN is seen in the data
PASS: pass NANs through unchanged (the default)
RETURN: return a NAN if any NAN is seen in the data
WARN: ignore all NANs but raise a warning if one is seen
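To make the five policies concrete, here is a minimal sketch of how they might behave for a mean. The wrapper name `mean_with_policy`, the string spelling of the policies, and the choice of exception and warning classes are all assumptions made for illustration; none of them are part of the statistics module or of the proposal's final form. The sketch also assumes the data are plain floats.

```python
import math
import warnings
from statistics import mean

# Hypothetical policy names, spelled as in the proposal above.
VALID_POLICIES = {"IGNORE", "FAIL", "PASS", "RETURN", "WARN"}


def mean_with_policy(data, *, nan_policy="PASS"):
    """Illustrative wrapper around statistics.mean (not a real stdlib API)."""
    if nan_policy not in VALID_POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)

    # PASS: hand the data to the underlying function untouched.
    if nan_policy == "PASS":
        return mean(values)

    # Assumption: NaNs only appear as float values.
    has_nan = any(isinstance(x, float) and math.isnan(x) for x in values)
    if not has_nan:
        return mean(values)

    if nan_policy == "FAIL":
        # The exception type is an assumption; the proposal only says "an exception".
        raise ValueError("NaN found in data")
    if nan_policy == "RETURN":
        return math.nan
    if nan_policy == "WARN":
        # The warning category is likewise an assumption.
        warnings.warn("NaN values ignored", RuntimeWarning)

    # IGNORE and WARN both drop the NaNs before computing the result.
    # If every value is NaN, statistics.mean raises StatisticsError here.
    cleaned = [x for x in values if not (isinstance(x, float) and math.isnan(x))]
    return mean(cleaned)
```

With `data = [1.0, float("nan"), 3.0]`, the IGNORE and WARN policies compute the mean of the two remaining values, RETURN yields a NaN, and FAIL raises.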
I don't think PASS should be the default behavior, and I'm not sure it would be productive to actually implement all of these options.
For reference, NumPy and pandas (the two most popular packages for data analytics in Python) support two of these modes:
- RETURN (numpy.mean() and skipna=False for pandas)
- IGNORE (numpy.nanmean() and skipna=True for pandas)
RETURN is the default behavior for NumPy; IGNORE is the default for pandas.
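The two modes and their defaults can be checked with a short example. This is only a sketch and assumes NumPy and pandas are installed; the sample data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

data = [1.0, float("nan"), 3.0]

# NumPy: RETURN is the default, IGNORE needs the separate nan-aware function.
numpy_default = np.mean(data)      # NaN propagates into the result
numpy_ignore = np.nanmean(data)    # NaN is skipped

# pandas: IGNORE is the default (skipna=True), RETURN is opt-in.
series = pd.Series(data)
pandas_default = series.mean()              # NaN is skipped
pandas_return = series.mean(skipna=False)   # NaN propagates into the result

print(numpy_default, numpy_ignore, pandas_default, pandas_return)
```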
I agree with Stephan that RETURN and IGNORE are the only useful modes of operation here.
Regards
Antoine.