[Python-ideas] Sorted lists
Steven D'Aprano
steve at pearwood.info
Sun Apr 7 22:32:05 EDT 2019
There are quite a few important algorithms which require lists to be
sorted. For example, the bisect module, and for statistics median and
other quantiles.
Sorting a list is potentially expensive: while Timsort is very
efficient, it is still ultimately an O(N log N) algorithm which limits
how efficient it can be. Checking whether a list is sorted is O(N).
What if we could check that lists were sorted in constant time?
Proposal: let's give lists a dunder flag, __issorted__, that tracks
whether the list is definitely sorted or not:
- Empty lists, or lists with a single item, are created with
__issorted__ = True; lists with two or more items are created
with the flag set to False.
- Appending or inserting items sets the flag to False.
- Deleting or popping items doesn't change the flag.
- Reversing the list doesn't change the flag.
- Sorting it sets the flag to True. (The sort method should NOT
assume the list is sorted just because the flag is set.)
Functions that require the list to be sorted can use the flag as a
quick check:
if not alist.__issorted__:
alist.sort()
...
The flag will be writable, so that functions such as those in
bisect can mark that they have kept the sorted invariant:
bisect.insort(alist, x)
assert alist.__issorted__
Being writable, the flag is advisory, not a guarantee, and "consenting
adults" applies. People can misuse the flag:
alist = [1, 4, 2, 0, 5]
alist.__issorted__ = True
but then they have nobody to blame but themselves if they shoot
themselves in the foot. That's no worse than the situation we have now,
were you might pass an unsorted list to bisect.
The flag doesn't guarantee that the list is sorted the way you want
(e.g. biggest to smallest, by some key, etc) only that it has been
sorted. Its up to the user to ensure they sort it the right way:
# Don't do this and expect it to work!
alist.sort(key=random.random)
bisect.insort(alist, 1)
If you really want to be sure about the state of the list, you have to
make a copy and sort it. But that's no different from the situation
right now. But for those willing to assume "consenting adults", you
might trust the flag and avoid sorting.
Downsides:
- Every list grows an extra attribute; however, given that lists are
already quite big data structures and are often over-allocated, I
don't think this will matter much.
- insert(), append(), extend(), __setitem__() will be a tiny bit
slower due to the need to set the flag.
Thoughts?
--
Steven
More information about the Python-ideas
mailing list