Treatment of NANs in the statistics module

Sat Mar 17 13:04:31 EDT 2018

On 16/03/18 23:16, Steven D'Aprano wrote:
> The bug tracker currently has a discussion of a bug in the median(), 
> median_low() and median_high() functions that they wrongly compute the 
> medians in the face of NANs in the data:
> 
> https://bugs.python.org/issue33084
> 
> I would like to ask people how they would prefer to handle this issue:
> 
> (1) Put the responsibility on the caller to strip NANs from their data. 
> If there is a NAN in your data, the result of calling median() is 
> implementation-defined. This is the current behaviour, and is likely to 
> be the fastest.
> 
> (2) Return a NAN.
> 
> (3) Raise an exception.
> 
> (4) median() should strip out NANs.
> 
> (5) All of the above, selected by the caller. (In which case, which would 
> you prefer as the default?)
> 
> 
> Thank you.

(2). A user can check for a returned NaN if necessary, so there is no
real need for (3). Silently stripping out NaNs strikes me as a terrible
idea: the user should decide how NaNs are dealt with. Optional arguments
to govern the handling of NaNs would be OK, as long as the default
behaviour is to return a NaN. There is no sensible default for handling
NaNs (or missing values). Cheers.
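
To make the preference concrete, here is a minimal sketch of option (5)
with (2) as the default. The wrapper name median_with_nan_policy, the
nan_policy argument and its values ("propagate", "raise", "omit") are
hypothetical names chosen for illustration; they are not part of the
statistics module. It also assumes the data are floats or ints.

```python
import math
import statistics

# Hypothetical policy names, invented for this sketch.
_POLICIES = ("propagate", "raise", "omit")


def median_with_nan_policy(data, nan_policy="propagate"):
    """Median with explicit NaN handling (illustrative wrapper only).

    "propagate" returns NaN if any NaN is present (option 2, the default),
    "raise" raises ValueError (option 3),
    "omit" strips NaNs before computing the median (option 4).
    """
    if nan_policy not in _POLICIES:
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    if not any(math.isnan(x) for x in values):
        # No NaNs present: defer to the existing implementation.
        return statistics.median(values)

    if nan_policy == "propagate":
        return math.nan
    if nan_policy == "raise":
        raise ValueError("NaN found in data")
    # nan_policy == "omit": the caller explicitly asked for stripping.
    return statistics.median(x for x in values if not math.isnan(x))
```

With this default the caller can still detect the problem after the
fact, for example with math.isnan(median_with_nan_policy(data)), and
stripping only happens when it is asked for by name.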

Duncan