New submission from Jake:
In the statistics module documentation, there is a note that states that
"The mean is strongly affected by outliers and is not a robust estimator for central location: the mean is not necessarily a typical example of the data points. For more robust, although less efficient, measures of central location, see median() and mode()"
https://docs.python.org/3/library/statistics.html
While I appreciate the intention, this is quite misleading. The implication is that the mean, median and mode are different ways to estimate one "central location", however, in reality they are very different things (albeit which refer to a similar notion).
The sample mean is an unbiased estimator of the true mean but it need not be unbiased as an estimator of the true median or modes and vice versa for the median and mode.
To make this clearer I would rephrase to
"The mean is strongly affected by outliers and is not necessarily representative of the central tendency of the data. For cases with large outliers or very low sample size, see median() and mode()"
Apologies if this is seen as frivolous, but statistics can be hard enough to remain very clear about even when the words are used precisely.
----------
assignee: docs@python
components: Documentation
messages: 236612
nosy: Journeyman08, docs@python
priority: normal
severity: normal
status: open
title: Misleading note in Statistics module documentation
type: enhancement
versions: Python 3.4
_______________________________________
Python tracker