[issue35892] Fix awkwardness of statistics.mode() for multimodal datasets

Raymond Hettinger report at bugs.python.org
Sun Feb 3 13:51:27 EST 2019


New submission from Raymond Hettinger <raymond.hettinger at gmail.com>:

The current code for mode() does a good deal of extra work to support its two error outcomes (empty input and multimodal input).  The latter case is informative, but it doesn't provide any reasonable way to find just one of those modes when any of the most popular values would suffice.  This arises in nearest neighbor algorithms, for example. I suggest adding an option to the API:

   from collections import Counter

   def mode(seq, *, first_tie=False):
       if first_tie:
           # CHOOSE FIRST x ∈ S | ∄ y ∈ S : x ≠ y ∧ count(y) > count(x)
           return Counter(seq).most_common(1)[0][0]
       ...

Use it like this:

    >>> data = 'ABBAC'
    >>> assert mode(data, first_tie=True) == 'A'

With the current API, there is no reasonable way to get to 'A' from 'ABBAC'.
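
For anyone who wants to try the idea, here is a minimal runnable sketch of the proposal. It is not the actual statistics.mode() implementation: the strict branch is only a stand-in for the existing behavior, the error messages are made up, and the first-tie result relies on Counter keeping insertion order (dicts are ordered in Python 3.7+).

```python
from collections import Counter
from statistics import StatisticsError


def mode(seq, *, first_tie=False):
    """Sketch of the proposed API; not the real statistics.mode()."""
    counts = Counter(seq)
    if not counts:
        raise StatisticsError("no mode for empty data")
    if first_tie:
        # most_common(1) yields the first-encountered value with the top count.
        return counts.most_common(1)[0][0]
    # Stand-in for the existing strict path: a tie is an error.
    top = max(counts.values())
    modes = [value for value, count in counts.items() if count == top]
    if len(modes) > 1:
        raise StatisticsError(
            f"no unique mode; found {len(modes)} equally common values"
        )
    return modes[0]


print(mode("ABBAC", first_tie=True))  # 'A' and 'B' tie; 'A' was seen first
print(mode("ABBBC"))                  # unique mode, strict path
```

The flag is keyword-only, so existing positional calls would keep their current meaning.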

Also, the new code path is much faster than the existing code path because it extracts only the 1 most common using max() rather than the n most common, which has to sort the whole items() list.  New path: O(n).  Existing path: O(n log n).
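
A rough illustration of the two selection strategies, written directly against a Counter rather than the statistics module. Here n should be read as the number of distinct values, since building the Counter is a linear pass over the data either way. The variable names are only for illustration.

```python
from collections import Counter
from operator import itemgetter

counts = Counter("ABBAC")

# Single pass over the (value, count) pairs: max() keeps the first
# pair that reaches the highest count.
value, count = max(counts.items(), key=itemgetter(1))

# Full sort of every (value, count) pair, only to look at the front.
# sorted() is stable, so tied values keep their first-seen order.
ranked = sorted(counts.items(), key=itemgetter(1), reverse=True)

print(value, count)
print(ranked)
```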

Note, the current API is somewhat awkward to use.  In general, a user can't know in advance that the data only contains a single mode.  Accordingly, every call to mode() has to be wrapped in a try-except.  And if the user just wants one of those modal values, there is no way to get to it.  See https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mode.html for comparison.
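
To make the awkwardness concrete, here is a sketch of what a call site ends up looking like. strict_mode() is a hypothetical stand-in for the behavior described above (ties raise), and mode_or_none() is an invented helper name; neither is part of the statistics module.

```python
from collections import Counter
from statistics import StatisticsError


def strict_mode(data):
    """Hypothetical stand-in: a unique mode is returned, a tie is an error."""
    ranked = Counter(data).most_common()
    if not ranked:
        raise StatisticsError("no mode for empty data")
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        raise StatisticsError("no unique mode")
    return ranked[0][0]


def mode_or_none(data):
    """The wrapper every caller needs when the data may be multimodal."""
    try:
        return strict_mode(data)
    except StatisticsError:
        # The caller learns that there was a tie (or no data),
        # but not what any of the tied values were.
        return None


print(mode_or_none("ABBBC"))
print(mode_or_none("ABBAC"))
```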

There may be better names for the flag.  "tie_goes_to_first_encountered" seemed a bit long though ;-)

----------
assignee: steven.daprano
components: Library (Lib)
messages: 334796
nosy: rhettinger, steven.daprano
priority: normal
severity: normal
status: open
title: Fix awkwardness of statistics.mode() for multimodal datasets
type: behavior
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue35892>
_______________________________________

