Treatment of NANs in the statistics module
duncan smith
duncan at invalid.invalid
Sat Mar 17 13:04:31 EDT 2018
On 16/03/18 23:16, Steven D'Aprano wrote:
> The bug tracker currently has a discussion of a bug in the median(),
> median_low() and median_high() functions that they wrongly compute the
> medians in the face of NANs in the data:
>
> https://bugs.python.org/issue33084
>
> I would like to ask people how they would prefer to handle this issue:
>
> (1) Put the responsibility on the caller to strip NANs from their data.
> If there is a NAN in your data, the result of calling median() is
> implementation-defined. This is the current behaviour, and is likely to
> be the fastest.
>
> (2) Return a NAN.
>
> (3) Raise an exception.
>
> (4) median() should strip out NANs.
>
> (5) All of the above, selected by the caller. (In which case, which would
> you prefer as the default?)
>
>
> Thank you.
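For context, a minimal sketch of why the result is ill-defined under
option (1). It assumes plain Python floats and only the standard
library; the variable names are illustrative. Sorting relies on
comparisons, and every ordering comparison against a NaN is False, so
where a NaN ends up after sorting (and hence the reported median) can
depend on the input order.

```python
import statistics

nan = float("nan")

# Ordering comparisons involving NaN are always False, and NaN is not
# equal to itself, so NaN has no well-defined position in a sorted list.
comparisons = [nan < 1.0, nan > 1.0, nan == nan]
print(comparisons)

# The same three values in three different orders. The medians printed
# here are not guaranteed to agree with one another.
orderings = [[nan, 1.0, 2.0], [1.0, nan, 2.0], [1.0, 2.0, nan]]
medians = [statistics.median(data) for data in orderings]
print(medians)
```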
(2). A user can check for a returned NaN if necessary (so there is no
real need for (3)). Silently stripping out NaNs strikes me as a terrible
idea. The user should decide how NaNs are dealt with. Optional arguments
to govern the handling of NaNs would be OK, as long as the default
behaviour is to return a NaN. There is no sensible default for handling
NaNs (or missing values). Cheers.
Duncan
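
A rough sketch of what option (5) with a NaN-returning default might
look like. The function name median_nan_policy and the nan_policy
argument (with the values "propagate", "raise" and "omit") are
hypothetical and are not part of the statistics module; the sketch also
assumes the data are ordinary floats or ints.

```python
import math
import statistics


def median_nan_policy(data, nan_policy="propagate"):
    """Hypothetical wrapper around statistics.median (illustration only).

    nan_policy:
      "propagate" - return NaN if any value is NaN (the default)
      "raise"     - raise ValueError if any value is NaN
      "omit"      - drop NaNs, but only because the caller asked for it
    """
    if nan_policy not in ("propagate", "raise", "omit"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    if any(math.isnan(x) for x in values):
        if nan_policy == "propagate":
            return math.nan
        if nan_policy == "raise":
            raise ValueError("data contains NaN")
        # nan_policy == "omit": stripping is explicit, never silent.
        values = [x for x in values if not math.isnan(x)]

    # statistics.median raises StatisticsError if no values remain.
    return statistics.median(values)


data = [1.0, float("nan"), 3.0]
print(median_nan_policy(data))                     # default: NaN comes back
print(median_nan_policy(data, nan_policy="omit"))  # caller opts in to stripping
```

With the default, the caller can test the result with math.isnan() and
decide what to do, which is why a separate exception-raising mode is
optional rather than essential.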