[Python-ideas] Sorted lists
Steven D'Aprano
steve at pearwood.info
Mon Apr 8 06:25:30 EDT 2019
On Mon, Apr 08, 2019 at 02:55:54AM -0700, Nathaniel Smith wrote:
> Right, by "doesn't affect" I meant "cannot get any benefit, even if their
> code is modified".
Ah, sorry I misunderstood you.
> > # This only makes sense if data is a sequence (list)
> > # not an iterator.
> > quartiles = quantiles(data, n=4)
> > quintiles = quantiles(data, n=5)
> > deciles = quantiles(data, n=10)
> > percentiles = quantiles(data, n=100)
> >
>
> If only we had some kind of API that could compute multiple quantiles at
> the same time...
You mean something like this?
quartiles, quintiles, deciles, percentiles = quantiles(data, n=(4, 5, 10, 100))
Yuck :-)
I'd rather allow an already_sorted=True parameter :-)
> > I'm perfectly comfortable with allowing the caller to lie if they want.
> > Its their own foot they're shooting.
> >
>
> An already_sorted=True argument would be an explicit opt in, and consenting
> adults would apply. But your message was very explicit that __issorted__
> can be set implicitly, though. For example, this would give garbage results:
>
> # implicitly sets the sorted flag
> data.sort()
> # preserves the flag, because hey it's sorted by *some* key
> data.reverse()
> statistics.median(data)
It would certainly be putting more responsibility on the caller to
ensure the sorted flag was correct.
> You can't use this in statistics.median because it would break
> compatibility.
I would argue differently.
> Also, isn't the whole point of 'statistics' to be the
> simple, reliable module for folks who aren't that worried about speed? This
> seems like a massive footgun.
Perhaps. Perhaps not as massive as it seems.
The expected use-case would be something like this:
data = get_data()
a = median(data)
data.sort()
b = median(data) # Like a, but faster.
To be a problem, the caller would need to do something like:
data = get_data()
a = median(data)
data.sort(reversed=True) # or some weird key function
b = median(data) # Garbage result.
I can't see people shooting themselves in the foot in this way by
accident very often. But fair enough, it is a risk.
> > (I wouldn't be so blasé about this if it were a function written in C
> > that could segfault if the list wasn't sorted.)
> >
>
> Silently giving the wrong answer is way worse than a segfault.
It depends on what you're worried about, and who gets the blame for the
wrong answer.
As I understand it, it is the position of the core-devs that *any* seg
fault in the interpreter or stdlib is a serious bug that must be fixed
(possibly excepting ctypes); but if Python code returns garbage when you
give it garbage input, that may or may not be considered a bug.
In this case, passing a list with the flag set when it is not actually
sorted correctly would be a "Garbage In, Garbage Out" error, just as if
they had explicitly passed a already_sorted=True argument.
But I take your earlier point that the argument version is explicit and
opt-in, while the flag is implicit and may not be opt-in.
Given all the points already raised, I think that an explicit SortedList
might be more appropriate.
Thanks everyone for all the feedback.
--
Steven
More information about the Python-ideas
mailing list