[Python-Dev] PEP 450 adding statistics module

Oscar Benjamin oscar.j.benjamin at gmail.com
Fri Aug 16 09:47:55 CEST 2013


On 15 August 2013 14:08, Steven D'Aprano <steve at pearwood.info> wrote:
>
> - The API doesn't really feel very Pythonic to me. For example, we write:
>
> mystring.rjust(width)
> dict.items()
>
> rather than mystring.justify(width, "right") or dict.iterate("items"). So I
> think individual methods is a better API, and one which is more familiar to
> most Python users. The only innovation (if that's what it is) is to have
> median a callable object.

Although you're talking about median() above I think that this same
reasoning applies to the mode() signature. In the reference
implementation it has the signature:

def mode(data, max_modes=1):
    ...

The behaviour is that with the default max_modes=1 it will return the
unique mode or raise an error if there isn't a unique mode:

>>> mode([1, 2, 3, 3])
3
>>> mode([])
StatisticsError: no mode
>>> mode([1, 1, 2, 3, 3])
AssertionError

You can use the max_modes parameter to specify that more than one mode
is acceptable and setting max_modes to 0 or None returns all modes no
matter how many. In these cases mode() returns a list:

>>> mode([1, 1, 2, 3, 3], max_modes=2)
[1, 3]
>>> mode([1, 1, 2, 3, 3], max_modes=None)
[1, 3]

I can't think of a situation where 1 or 2 modes are acceptable but 3
is not. The only forms I can imagine using are mode(data) to get the
unique mode if it exists and mode(data, max_modes=None) to get the set
of all modes. But for that usage it would be better to have a boolean
flag and then either way you're at the point where it would normally
become two functions.

Also I dislike changing the return type based on special numeric values:
>>> mode([1, 2, 3, 3], max_modes=0)
[3]
>>> mode([1, 2, 3, 3], max_modes=1)
3
>>> mode([1, 2, 3, 3], max_modes=2)
[3]
>>> mode([1, 2, 3, 3], max_modes=3)
[3]

My preference would be to have two functions, one called e.g. modes()
and one called mode(). modes() always returns a list of the most
frequent values no matter how many. mode() returns a unique mode if
there is one or raises an error. I think that that would be simpler to
document and easier to learn and use. If the user is for whatever
reason happy with 1 or 2 modes but not 3 then they can call modes()
and check for themselves.

Also I think that:
>>> modes([])
[]
but I expect others to disagree.


Oscar


More information about the Python-Dev mailing list