[Python-ideas] Discuss: what should statistics.mode do?
Steve Barnes
gadgetsteve at live.co.uk
Tue Feb 19 00:47:49 EST 2019
-----Original Message-----
From: Python-ideas <python-ideas-bounces+gadgetsteve=live.co.uk at python.org> On Behalf Of Steven D'Aprano
Sent: 19 February 2019 00:16
To: python-ideas at python.org
Subject: [Python-ideas] Discuss: what should statistics.mode do?
See https://bugs.python.org/issue35892
The issue here is that statistics.mode, as initially designed, raises an exception for the case of multiple equal most-frequent data points. This is true to the way mode is taught in schools, but it may not be the most useful behaviour.
Raymond has suggested:
- keep the status quo;
- change mode() to return "the first tie" instead of raising an
exception (with or without a deprecation warning for one release);
- add a flag to specify the behaviour.
I'm especially interested in opinions from those who use the function. What would be useful for you? How do you use it: interactively or in scripts?
(When I designed this, I mostly imagined that mode() would be used interactively, using the interpreter as a calculator.)
Would changing the behaviour break your code?
Note that this question is separate from that of whether or not there should be a multimode function.
[Steve Barnes]
Can I suggest one other option: modify the current exception (e.g. "StatisticsError: no unique mode; found 2 equally common values") to include the count and the values, either as an additional property of the exception or as part of the text (probably with some truncation).
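To make the suggestion concrete, here is a rough sketch of what such an exception might look like. The names MultipleModesError, strict_mode and max_shown are invented for illustration and are not part of the statistics module; the sketch only assumes that the existing StatisticsError can be subclassed, so existing "except StatisticsError" handlers would keep working.

```python
from collections import Counter
from statistics import StatisticsError


class MultipleModesError(StatisticsError):
    """Hypothetical exception: NOT part of the statistics module."""

    def __init__(self, modes, count, max_shown=5):
        self.modes = modes  # every tied value, in first-seen order
        self.count = count  # how many times each tied value occurs
        # Truncate the list in the message so huge ties stay readable.
        shown = ", ".join(repr(m) for m in modes[:max_shown])
        if len(modes) > max_shown:
            shown += ", ..."
        super().__init__(
            f"no unique mode; found {len(modes)} equally common values "
            f"(count={count}): {shown}"
        )


def strict_mode(data):
    """Illustrative stand-in for mode(): raise on ties, but say what tied."""
    counts = Counter(data)
    if not counts:
        raise StatisticsError("no mode for empty data")
    top = max(counts.values())
    modes = [value for value, n in counts.items() if n == top]
    if len(modes) > 1:
        raise MultipleModesError(modes, top)
    return modes[0]
```

A caller that cares can then catch the exception and read e.modes and e.count directly, without counting the data a second time.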
Being told that there are n equally common values, whether working interactively or in a script, is of limited value; being told what they are would be much more valuable. Currently, once I know that there are multiple "mode" values, I have to switch to collections.Counter and its most_common() method, or some such, and recalculate (keeping in mind that I might be processing millions of values). Since mode only really makes sense for discrete values such as integers, this is important information: a) the count might be 1, i.e. all the values are unique; b) if there are 2 or 3 modes and they are adjacent, the mid-point can be used as a sensible approximation; or c) if there are 2 or more distinct clusters, it helps to visualise this without having a plot.
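For reference, the workaround I have in mind today looks roughly like the following. The helper names all_modes and describe_modes are made up for this example, and the "adjacent" test assumes integer data where neighbouring values differ by exactly 1.

```python
from collections import Counter


def all_modes(data):
    """Return (sorted list of tied most-common values, their count)."""
    counts = Counter(data)
    if not counts:
        return [], 0
    top = counts.most_common(1)[0][1]  # count of the most common value
    modes = sorted(value for value, n in counts.items() if n == top)
    return modes, top


def describe_modes(modes, count):
    """Classify the result into the cases a), b) and c) described above."""
    if count == 1:
        return "all values are unique"
    if len(modes) == 1:
        return f"unique mode {modes[0]}"
    # Assumption: integer data, so "adjacent" means a gap of exactly 1.
    if all(b - a == 1 for a, b in zip(modes, modes[1:])):
        midpoint = (modes[0] + modes[-1]) / 2
        return f"adjacent modes, mid-point {midpoint}"
    return f"{len(modes)} separate modes: {modes}"
```

For example, all_modes([1, 2, 2, 3, 3]) gives ([2, 3], 2), which describe_modes reports as adjacent modes with mid-point 2.5. The point is that this is a second pass over the data that mode() has already counted once.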
I also think that this would be the change with minimal disruption to the status quo, as all existing code will still get the exception but with (optionally) more useful information.
Steve Barnes