
This came up on this list before, maybe a year ago. The decision then was not to change the implementation, which seemed like a mistake to me (though that discussion was about broader things).
Anyway, I propose that the obviously broken version of `statistics.median()` be replaced with a better implementation.
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
>>> import numpy as np
>>> import pandas as pd
>>> import statistics
>>> nan = float('nan')
>>> items1 = [nan, 1, 2, 3, 4]
>>> items2 = [1, 2, 3, 4, nan]
>>> statistics.median(items1)
2
>>> statistics.median(items2)
3
>>> np.median(items1)
nan
>>> np.median(items2)
nan
>>> pd.Series(items1).median()
2.5
>>> pd.Series(items2).median()
2.5
The NumPy and Pandas answers are both "reasonable" under slightly different philosophies of how to handle bad values. I think raising an exception for NaNs would also be reasonable enough.
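A minimal sketch of those three behaviors might look like the following. The helper name `median_with_nan_policy` and its `nan_policy` parameter are hypothetical, chosen only for illustration; nothing like them exists in the `statistics` module, and this is not a proposed patch.

```python
import math
import statistics


def median_with_nan_policy(data, nan_policy="raise"):
    """Hypothetical helper (not part of the stdlib) showing three NaN policies.

    "propagate": return NaN if any input is NaN (the NumPy-style answer)
    "omit":      drop NaNs, then take the median (the Pandas-style answer)
    "raise":     raise ValueError if any input is NaN
    """
    if nan_policy not in ("propagate", "omit", "raise"):
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    values = list(data)
    # NaN is the only float value that compares unequal to itself.
    clean = [x for x in values if x == x]

    if len(clean) != len(values):
        if nan_policy == "raise":
            raise ValueError("median() input contains NaN")
        if nan_policy == "propagate":
            return math.nan
        values = clean  # nan_policy == "omit"

    return statistics.median(values)
```

With any of the three policies, `[nan, 1, 2, 3, 4]` and `[1, 2, 3, 4, nan]` give the same result (or the same exception), which is the property that matters here.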
The one thing that is NOT reasonable is returning different answers for median depending on the order of the elements.
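For anyone wondering where the order dependence comes from: `statistics.median()` sorts its input and picks the middle element, and every ordering comparison against NaN is False, so the sort has no consistent place to put it. A small sketch of that mechanism, using only builtins (the exact positions shown are what I see on CPython and are not something the language guarantees):

```python
nan = float('nan')
items1 = [nan, 1, 2, 3, 4]
items2 = [1, 2, 3, 4, nan]

# NaN is neither less than, greater than, nor equal to anything,
# so it breaks the total ordering that sorting relies on.
nan_comparisons = (nan < 1, nan > 1, nan == nan)

# On CPython both lists come back in their original order,
# because no element ever compares as "less than" its predecessor.
sorted1 = sorted(items1)
sorted2 = sorted(items2)

# The "median" is then just whatever landed in the middle slot.
middle1 = sorted1[len(sorted1) // 2]
middle2 = sorted2[len(sorted2) // 2]
```

Here `middle1` and `middle2` differ even though the two lists hold the same values, which matches the 2 versus 3 from the session above.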
On Thu, Dec 26, 2019 at 10:10 AM Marco Sulla via Python-ideas <python-ideas@python.org> wrote:
> So why only mean and not median, that's better for statistics? :D
>
> Seriously, if you want it builtin, add it to PYTHONSTARTUP:
> https://docs.python.org/3/using/cmdline.html#envvar-PYTHONSTARTUP
>
> from statistics import mean