
On Thu, Nov 11, 2021 at 11:01:32AM -0800, Richard Levasseur wrote:
Should the stdlib have e.g. SortedList? Probably not, because the use cases of such data types are too niche to a one-size-fits-all implementation, and there are too many implementations with too many of their own settings. Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so, because those are somewhat fundamental data structures and implementing, testing, and verifying them is very much non-trivial. While niche, having them at your immediate disposal is very powerful.
By that reasoning, we shouldn't have dicts, for exactly the same reason: for anyone wanting an associative array, there are so many implementation variants to choose from:
- hash table with linear probing - hash table with chaining - AVL tree - red-black tree - judy array - splay tree - treap - scapegoat tree
and many more, and most of them can be tuned.
If Python didn't already have dict, your argument against SortedList would equally apply to it: they are "fundamental data structures and implementing, testing, and verifying them is very much non-trivial".
So if your argument is correct, that would imply that standardizing on one single dict implementation, one that isn't even tunable by the caller, was a mistake. We should have provided a dozen different hash tables, trees and treaps.
But... we didn't, and I don't think that Python is a worse language because we only have one associative array implementation in the stdlib.
Whenever I need an associative array, I don't lose sleep over whether I could get 2% better performance for hits, at a cost of 17% worse performance for misses, by swapping over to some other implementation. I just reach for dict, knowing that it will almost always be good enough.
Last year, for fun, after wishing there was a SortedSet in the standard lib, I ended up implementing a Red-Black Tree and BTree based sorted dictionary/set[1]. After then trying to use them for my use case[2], I found that, in order to fully and truly exploit their benefits, the basic Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that would let me, e.g. binary search to a particular spot and then iterate, or give me a range between two points, etc.
I believe that sortedcontainers.SortedSet provides that functionality via the irange() method.
http://www.grantjenks.com/docs/sortedcontainers/sortedlist.html#sortedcontai...