[issue23522] Misleading note in Statistics module documentation
New submission from Jake:

In the statistics module documentation, there is a note that states that "The mean is strongly affected by outliers and is not a robust estimator for central location: the mean is not necessarily a typical example of the data points. For more robust, although less efficient, measures of central location, see median() and mode()"

https://docs.python.org/3/library/statistics.html

While I appreciate the intention, this is quite misleading. The implication is that the mean, median and mode are different ways to estimate one "central location", however, in reality they are very different things (albeit which refer to a similar notion). The sample mean is an unbiased estimator of the true mean but it need not be unbiased as an estimator of the true median or modes and vice versa for the median and mode.

To make this clearer I would rephrase to "The mean is strongly affected by outliers and is not necessarily representative of the central tendency of the data. For cases with large outliers or very low sample size, see median() and mode()"

Apologies if this is seen as frivolous, but statistics can be hard enough to remain very clear about even when the words are used precisely.

----------
assignee: docs@python
components: Documentation
messages: 236612
nosy: Journeyman08, docs@python
priority: normal
severity: normal
status: open
title: Misleading note in Statistics module documentation
type: enhancement
versions: Python 3.4

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue23522>
_______________________________________
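For readers following the thread, the behaviour that the disputed note describes can be sketched with the statistics module itself. This is only an illustration under stated assumptions: the sample values below are made up and do not come from the issue, and the sketch does not settle the wording question being debated.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
typical = [2, 3, 3, 4, 5]
with_outlier = typical + [100]  # same data plus one large outlier

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

With these made-up values, adding the single outlier moves the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5 and the mode stays at 3.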
Changes by SilentGhost <ghost.adh@gmail.com>:

----------
nosy: +steven.daprano
Irit Katriel <iritkatriel@yahoo.com> added the comment:

I agree with Jake's comment, but I think the solution is to remove that Note altogether. This document is a software manual, not a statistics textbook, and as such it should just state clearly what the statistics module does. If someone doesn't know whether they need the mean or the median, they really need to read a more fundamental text before writing their code.

----------
nosy: +iritkatriel
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
keywords: +patch
nosy: +rhettinger
nosy_count: 4.0 -> 5.0
pull_requests: +22703
stage: -> patch review
pull_request: https://github.com/python/cpython/pull/23842
Steven D'Aprano <steve+python@pearwood.info> added the comment:

I strongly oppose this change, and I dispute the characterisation of this as a misleading note. It is not misleading, and I argue that every word of it is factually correct. Jake, if you disagree, then please provide some citations.

Irit: it is ridiculous to describe a two paragraph (nine line) note as "a statistics textbook". That is an exaggerated position that doesn't help the discussion. It's not a textbook, it is a short note that helps users whose knowledge of statistics is naive to understand which statistic is better for them.

"If someone doesn't know whether they need the mean or the median, they really need to read a more fundamental text before writing their code."

I totally disagree. This module is not intended only for statisticians and experts, and the user who isn't sure which average to use shouldn't have to read a textbook on the fundamentals of statistics.
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

FWIW, Allen Downey also had concerns about this wording.
Steven D'Aprano <steve+python@pearwood.info> added the comment:

Sorry Raymond, I missed this before closing the task.

> FWIW, Allen Downey also had concerns about this wording.

I don't recognise the name, who is Allen Downey and what concerns does he have?
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
assignee: docs@python -> steven.daprano
Change by Raymond Hettinger <raymond.hettinger@gmail.com>:

----------
Removed message: https://bugs.python.org/msg383343
Raymond Hettinger <raymond.hettinger@gmail.com> added the comment:

Steven, you are the module maintainer. So if you're sure about the current wording, go ahead and close this.
Steven D'Aprano <steve+python@pearwood.info> added the comment:

I'm willing to give Irit and Jake an opportunity to make their case, particularly if they can demonstrate that I got my facts wrong.

I'm going to close the ticket, but if anyone feels strongly enough to respond with a good argument, or better still citations demonstrating that the comment is factually wrong, I am open to revising or removing the wording.
Change by Steven D'Aprano <steve+python@pearwood.info>:

----------
resolution: -> rejected
stage: patch review -> resolved
status: open -> closed
participants (5)
- Irit Katriel
- Jake
- Raymond Hettinger
- SilentGhost
- Steven D'Aprano