[Python-ideas] Sorted lists
Steven D'Aprano
steve at pearwood.info
Mon Apr 8 06:01:58 EDT 2019
On Mon, Apr 08, 2019 at 10:34:19AM +0100, Paul Moore wrote:
> On Mon, 8 Apr 2019 at 10:10, Steven D'Aprano <steve at pearwood.info> wrote:
>
> > Possibly the maintainer of bisect may decide that its not worth the
> > change. But for the statistics module, I would certainly change the
> > implementation of median() to look something vaguely like this:
> >
> > # was
> > data = sorted(data) # may be expensive if data is large
> >
> > # may become
> > if not (isinstance(data, list) and data.__issorted__):
> > data = sorted(data)
>
> So just to be clear, this would be a change in behaviour - for a list
> that is currently sorted on a key that is *not* "numerically
> ascending", the code will no longer re-sort, and so will give the
> wrong answer?
Correct.
A thought comes to mind...
Currently the two variant forms of median (median_low and _high) can
work with non-numeric data:
# don't take this example too seriously
py> statistics.median_low(['low', 'high', 'high', 'low', 'very low'])
'low'
so perhaps there is a use-case for non-numeric sorts.
> I'm not saying that this is forbidden, just want to be clear if that's
> what you mean (because my difficulty with the proposed attribute is
> that it seems unreliable enough that I can't imagine a case where I'd
> feel comfortable using it myself).
Fair enough, but does it help you feel a bit better about the feature if
we called it a "sort hint", and emphasized that it should only be used
in the same sort of APIs where you might allow the caller to pass
already_sorted=True
to skip a redundant sort step?
> PS I thought timsort was highly efficient given already sorted data?
> Whether it's OK to rely on that I don't know, but did you take that
> into account?
Yes, timsort is very efficient with already sorted data, and I have :-)
On my computer, to sort 50 million integers in place takes about 13.5
seconds; to sort it the next time takes 2.7 seconds. (That's effectively
just galloping along the list, looking for out-of-place items and not
finding any.)
To anyone tempted to say "Get a better computer", if I had a better
computer I'd be using a list of 50 billion integers and it would still
take 2.7 seconds :-)
--
Steven
More information about the Python-ideas
mailing list