[Python-ideas] NAN handling in the statistics module
Antoine Pitrou
solipsis at pitrou.net
Mon Jan 7 03:24:08 EST 2019
On Sun, 6 Jan 2019 19:40:32 -0800
Stephan Hoyer <shoyer at gmail.com> wrote:
> On Sun, Jan 6, 2019 at 4:27 PM Steven D'Aprano <steve at pearwood.info> wrote:
>
> > I propose adding a "nan_policy" keyword-only parameter to the relevant
> > statistics functions (mean, median, variance etc), and defining the
> > following policies:
> >
> > IGNORE: quietly ignore all NANs
> > FAIL: raise an exception if any NAN is seen in the data
> > PASS: pass NANs through unchanged (the default)
> > RETURN: return a NAN if any NAN is seen in the data
> > WARN: ignore all NANs but raise a warning if one is seen
> >
>
> I don't think PASS should be the default behavior, and I'm not sure it
> would be productive to actually implement all of these options.
>
> For reference, NumPy and pandas (the two most popular packages for data
> analytics in Python) support two of these modes:
> - RETURN (numpy.mean() and skipna=False for pandas)
> - IGNORE (numpy.nanmean() and skipna=True for pandas)
>
> RETURN is the default behavior for NumPy; IGNORE is the default for pandas.
I agree with Stephan that RETURN and IGNORE are the only useful modes
of operation here.
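
To make the two modes concrete, here is a minimal sketch of how they
might behave for a mean. The function name mean_with_nan_policy and the
lowercase "return" / "ignore" strings are hypothetical, chosen only for
illustration; they are not the API proposed in this thread, and the
default shown is just one possible choice.

```python
import math
import statistics


def mean_with_nan_policy(data, nan_policy="return"):
    """Hypothetical helper illustrating the RETURN and IGNORE modes."""
    values = [float(x) for x in data]

    if nan_policy == "return":
        # RETURN: any NAN in the input makes the result a NAN.
        if any(math.isnan(x) for x in values):
            return math.nan
        return statistics.mean(values)

    if nan_policy == "ignore":
        # IGNORE: drop NANs first, then compute the mean of what is left.
        # statistics.mean raises StatisticsError if nothing is left.
        clean = [x for x in values if not math.isnan(x)]
        return statistics.mean(clean)

    raise ValueError(f"unknown nan_policy: {nan_policy!r}")


data = [1.0, float("nan"), 3.0]
result_return = mean_with_nan_policy(data, nan_policy="return")
result_ignore = mean_with_nan_policy(data, nan_policy="ignore")
```

With this input, the "return" call yields a NAN and the "ignore" call
yields the mean of the two remaining values.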
Regards
Antoine.