[Python-ideas] NAN handling in the statistics module

Mon Jan 7 03:24:08 EST 2019

On Sun, 6 Jan 2019 19:40:32 -0800
Stephan Hoyer <shoyer at gmail.com> wrote:
> On Sun, Jan 6, 2019 at 4:27 PM Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > I propose adding a "nan_policy" keyword-only parameter to the relevant
> > statistics functions (mean, median, variance etc), and defining the
> > following policies:
> >
> >     IGNORE:  quietly ignore all NANs
> >     FAIL:  raise an exception if any NAN is seen in the data
> >     PASS:  pass NANs through unchanged (the default)
> >     RETURN:  return a NAN if any NAN is seen in the data
> >     WARN:  ignore all NANs but raise a warning if one is seen
> >  
> 
> I don't think PASS should be the default behavior, and I'm not sure it
> would be productive to actually implement all of these options.
> 
> For reference, NumPy and pandas (the two most popular packages for data
> analytics in Python) support two of these modes:
> - RETURN (numpy.mean() and skipna=False for pandas)
> - IGNORE (numpy.nanmean() and skipna=True for pandas)
> 
> RETURN is the default behavior for NumPy; IGNORE is the default for pandas.

I agree with Stephan that RETURN and IGNORE are the only useful modes
of operation here.
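
To make the two modes concrete, here is a rough sketch of how they could
behave for mean(). The helper name mean_with_policy, the lowercase policy
strings, and the float-only NaN check are assumptions made up for
illustration; none of this is existing statistics module API.

```python
import math
import statistics


def _is_nan(x):
    # Assumption: only float NaNs are detected; Decimal or other
    # numeric types would need their own check.
    return isinstance(x, float) and math.isnan(x)


def mean_with_policy(data, nan_policy="return"):
    """Hypothetical helper, not part of the statistics module."""
    values = list(data)
    if nan_policy == "return":
        # RETURN: any NaN in the input makes the result NaN.
        if any(_is_nan(x) for x in values):
            return math.nan
        return statistics.mean(values)
    if nan_policy == "ignore":
        # IGNORE: drop NaNs before computing. If nothing is left,
        # statistics.mean raises StatisticsError as it does for empty data.
        return statistics.mean([x for x in values if not _is_nan(x)])
    raise ValueError(f"unknown nan_policy: {nan_policy!r}")
```

With data = [1.0, float("nan"), 3.0], the "return" policy gives a NaN and
the "ignore" policy gives the mean of the remaining two values.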

Regards

Antoine.