[Python-ideas] Sorted lists

Steven D'Aprano steve at pearwood.info
Mon Apr 8 06:01:58 EDT 2019


On Mon, Apr 08, 2019 at 10:34:19AM +0100, Paul Moore wrote:
> On Mon, 8 Apr 2019 at 10:10, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > Possibly the maintainer of bisect may decide that its not worth the
> > change. But for the statistics module, I would certainly change the
> > implementation of median() to look something vaguely like this:
> >
> >     # was
> >     data = sorted(data)  # may be expensive if data is large
> >
> >     # may become
> >     if not (isinstance(data, list) and data.__issorted__):
> >         data = sorted(data)
> 
> So just to be clear, this would be a change in behaviour - for a list
> that is currently sorted on a key that is *not* "numerically
> ascending", the code will no longer re-sort, and so will give the
> wrong answer?

Correct.

A thought comes to mind...

Currently the two variant forms of median (median_low and _high) can 
work with non-numeric data:

# don't take this example too seriously
py> statistics.median_low(['low', 'high', 'high', 'low', 'very low'])
'low'

so perhaps there is a use-case for non-numeric sorts.



> I'm not saying that this is forbidden, just want to be clear if that's
> what you mean (because my difficulty with the proposed attribute is
> that it seems unreliable enough that I can't imagine a case where I'd
> feel comfortable using it myself).

Fair enough, but does it help you feel a bit better about the feature if 
we called it a "sort hint", and emphasized that it should only be used 
in the same sort of APIs where you might allow the caller to pass

    already_sorted=True

to skip a redundant sort step?


> PS I thought timsort was highly efficient given already sorted data?
> Whether it's OK to rely on that I don't know, but did you take that
> into account?

Yes, timsort is very efficient with already sorted data, and I have :-)

On my computer, to sort 50 million integers in place takes about 13.5 
seconds; to sort it the next time takes 2.7 seconds. (That's effectively 
just galloping along the list, looking for out-of-place items and not 
finding any.)

To anyone tempted to say "Get a better computer", if I had a better
computer I'd be using a list of 50 billion integers and it would still 
take 2.7 seconds :-)



-- 
Steven


More information about the Python-ideas mailing list