[Python-ideas] Lessons learned from an API design mistake [was New explicit methods to trim strings]

Mon Apr 1 22:03:26 EDT 2019

I think the point Chris made about statistics.mode is important enough 
to start a new subthread about API design, and the lessons learned.

On Mon, Apr 01, 2019 at 02:29:44PM +1100, Chris Angelico wrote:

> We're basically debating collision semantics here. It's on par with
> asking "how should statistics.mode() cope with multiple modes?".
> Should the introduction of statistics.mode() have been delayed pending
> a thorough review of use-cases, or is it okay to make it do what most
> people want, and then be prepared to revisit its edge-case handling?
> 
> (For those who don't know, mode() was changed in 3.8 to return the
> first mode encountered, in contrast to previous behaviour where it
> would raise an exception.)

For those who are unaware, I was responsible for choosing the semantics 
of statistics.mode. My choice was to treat mode() as it is taught in 
secondary schools here in Australia, namely that if there are two or 
more equally common values, there is no mode.

Statistically, there is no one right answer to how to treat multiple 
modes. Sometimes you treat them as true multiple modes, sometimes you 
say there is no mode, and sometimes you treat the fact that there are 
multiple modes as an artifact of the sample and pick one or another as 
the actual mode. There's no particular statistical reason to choose the 
first over the second or the third.

So following the Zen, I refused to guess, and raised an exception. (I 
toyed with returning None instead, but decided against it for reasons 
that don't matter here.)

This seemed like a good decision up front, and I don't remember there 
being any objections to that behaviour when the PEP was discussed both 
here and on Python-Dev.

But once we had a few years of real-world practice, it turns out that:

(1) Raising an exception was an annoying choice that meant that every
use of mode() outside of the interactive interpreter needed to be
wrapped in a try...except block, making it painful to use.

(2) There is at least one good use-case for returning the first mode, 
even though statistically there's no reason to prefer it over any other.

Importantly, that use-case was something that neither I, nor anyone
involved in the original PEP debate for this, had thought of. It took a
few years of actual use in the wild before anyone came up with an
important, definitive use-case -- and it turns out to be completely
unrelated to the statistical use of mode!

Raymond Hettinger persuaded me that this non-statistics use-case was
important enough for mode to pick a behaviour which has no statistical
justification.

(Also, many other statistics packages do the same thing, so even if
we're wrong, we're no worse than everyone else.)
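
To make the two behaviours concrete, here is a minimal sketch. The
names strict_mode and first_mode are hypothetical stand-ins written for
illustration only; they are not the statistics module's implementation,
just an approximation of "raise on ties" versus "return the first mode
encountered".

```python
from collections import Counter
from statistics import StatisticsError


def strict_mode(data):
    """Hypothetical stand-in for the old behaviour: raise on ties."""
    counts = Counter(data).most_common()
    if not counts:
        raise StatisticsError("no mode for empty data")
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        raise StatisticsError("no unique mode")
    return counts[0][0]


def first_mode(data):
    """Hypothetical stand-in for the new behaviour: first mode seen."""
    counts = Counter(data)  # keeps keys in first-seen order
    if not counts:
        raise StatisticsError("no mode for empty data")
    # max() returns the first key with the highest count.
    return max(counts, key=counts.get)


data = ["red", "blue", "blue", "red", "green"]

# With the strict version, every caller needs a try...except block.
try:
    result = strict_mode(data)
except StatisticsError:
    result = None  # the caller has to invent a fallback here

print(result)            # None
print(first_mode(data))  # red
```

The try...except wrapper is the cost described in point (1) above: the
strict version pushes the tie-handling decision onto every caller.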

Had I ignored the Zen and, in the face of ambiguity, *guessed* 
which mode to return, I could have guessed wrongly and returned one of 
these:

- the largest mode, or the smallest
- the last seen mode
- the mode closest to the mean, or median, or some other measure of
  central tendency
- some sort of special "multi-mode" object (perhaps a list).

I would have missed a real use-case that I never imagined existed, as 
well as a good opportunity for optimization. Raymond's new version of 
mode is faster as well as more useful.
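
As an illustration of how far apart those guesses can land, here is a
small sketch that applies several of them to the same data. It assumes
Python 3.8 or later, where statistics.multimode is available; the
candidates dict and the sample data are made up for this example.

```python
from statistics import mean, multimode

data = [2, 1, 9, 6, 2, 1, 9, 6]

# multimode() returns every mode, in the order first encountered.
modes = multimode(data)
centre = mean(data)

candidates = {
    "first": modes[0],
    "largest": max(modes),
    "smallest": min(modes),
    "closest to mean": min(modes, key=lambda m: abs(m - centre)),
    "multi-mode list": modes,
}

for name, value in candidates.items():
    print(f"{name}: {value}")
```

On this sample each rule gives a different answer, so any one of them
baked into mode() from the start would have been a guess.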

Because I *refused to guess* and raised an exception:

(1) mode was harder to use than it should have been;

(2) but we were able to change its behaviour without a lengthy and 
annoying deprecation period, or introducing a "new_mode" function.

Knowing what I know *now*, if I were designing mode() from scratch I'd 
go with Raymond's design. If it is statistically unjustified, it's 
justified by other reasons, and if it is wrong, it's not so wrong as to 
be useless, and it's wrong in a way that many other statistics libraries 
are also wrong. So we're in good company.

But I didn't know that *then*, and I never would have guessed that there 
was a non-statistical use for mode.

Lesson number one:

Just because you have thought about your function for five minutes, or 
five months, doesn't mean you have thought of all the real-world uses.

Lesson number two:

A lot of the Zen is intended as a joke, but the koan about refusing to 
guess is very good advice. When possible, be conservative, take your 
time to make a decision, and base it on real-world experience, not gut 
feelings about what is "obviously" correct.

In language design even more than personal code, You Ain't Gonna Need It 
(Yet) applies.

Lesson number three:

Sometimes, to not make a decision is itself a decision. In the case of 
mode, I had to deal with multiple modes *somehow*; I couldn't just 
ignore them. Fortunately I chose to raise an exception, which made it 
possible to change my mind later without a lengthy deprecation period. 
But that in turn made the function more annoying and difficult to use in 
practice.

But in the case of the proposed str.cut_prefix and cut_suffix methods, 
we can avoid the decision of what to do with multiple affixes by just 
not supporting them! We don't have to make a decision to raise an 
exception, or return X (for whatever semantics of X we choose). There's 
no need to choose *anything* about the multiple affix case until we have 
more real-world experience to make a judgement.
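
A minimal sketch of what "just not supporting them" could look like,
written as a plain function. The name cut_prefix comes from the
proposal, but the signature and body here are assumptions made for
illustration, not the proposed implementation. The TypeError is only
the usual rejection of a wrong argument type; it does not define any
semantics for multiple affixes.

```python
def cut_prefix(s, prefix):
    """Hypothetical single-prefix helper (illustration only)."""
    if not isinstance(prefix, str):
        # A tuple of prefixes is simply not accepted for now, which
        # leaves room to give it a meaning later.
        raise TypeError("prefix must be a str")
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


print(cut_prefix("spam_eggs", "spam_"))  # eggs
print(cut_prefix("spam_eggs", "ham"))    # spam_eggs
```

Because nothing meaningful happens for a tuple today, accepting one
later would be a pure addition rather than a change in behaviour.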

Lesson number four:

Python is nearly 30 years old, and the str.replace() method still 
refuses to guess how to deal with the case of multiple target strings. 
That doesn't make replace useless.
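
A short example of why there is something to guess at: with more than
one target string, the result depends on the order in which the
replacements are applied, so the caller has to choose. The variable
names are made up for this example.

```python
text = "abc"

# Replace "a" with "b" first, then "b" with "c".
a_then_b = text.replace("a", "b").replace("b", "c")

# Same two replacements, applied in the opposite order.
b_then_a = text.replace("b", "c").replace("a", "b")

print(a_then_b)  # ccc
print(b_then_a)  # bcc
```

Chaining single-target calls makes the order explicit in the caller's
code, which is exactly the decision str.replace() declines to make.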

-- 
Steven