On Thu, Nov 11, 2021 at 4:30 AM Paul Moore <p.f.moore@gmail.com> wrote:
On Thu, 11 Nov 2021 at 11:51, Antoine Pitrou <antoine@python.org> wrote:
> On Wed, 10 Nov 2021 21:12:17 -0600
> Tim Peters <tim.peters@gmail.com> wrote:
> > [Bob Fang <boyufang@bytedance.com>]
> > > This is a modest proposal to consider having sorted containers
> > > (http://www.grantjenks.com/docs/sortedcontainers/) in standard library.
> >
> > +1 from me, but if and only if Grant Jenks (its author) wants that too.
> >
> > It's first-rate code in all respects, including that it's a fine
> > example _of_ Python programming (it's not written in C - in Python).
> Agreed with Tim.  This is a perfect example of some basic and perennial
> facility that would fit very well in the stdlib.

I agree as well. Is anyone interested enough to ask the library author
if he supports doing this? That seems to be the main unanswered
question here.

But if anyone wants to argue the "the stdlib should be shrinking, not
growing" position, I suggest they do so *before* someone reaches out
to the module author. No point in us making the suggestion and then
being forced to withdraw it.

A couple 2 cents and IMHO.

The tldr is not so much "don't put it in the standard lib", but that the stdlib would be better having particular implementations instead of a one-size-fits-all e.g. SortedList.

Should the stdlib have e.g. SortedList? Probably not, because the use cases of such data types are too niche to a one-size-fits-all implementation, and there are too many implementations with too many of their own settings.
Should it have e.g. BTree, RedBlackTree, SortedArrayList, etc? Probably so, because those are somewhat fundamental data structures and implementing, testing, and verifying them is very much non-trivial. While niche, having them at your immediate disposal is very powerful.

Last year, for fun, after wishing there was a SortedSet in the standard lib, I ended up implementing a Red-Black Tree and BTree based sorted dictionary/set[1]. After then trying to use them for my use case[2], I found that, in order to fully and truly exploit their benefits, the basic Sequence/Collection/Set/Dict APIs didn't really suffice. I needed APIs that would let me, e.g. binary search to a particular spot and then iterate, or give me a range between two points, etc. Such apis tend to be pretty specialized to exploit their under data structure (which isn't to say an abstract API can't be created, see Java's NavigableMap, but that level of abstraction deserves careful consideration).

In the end, it was a fun exercise, but in practice a dictionary and sorted() got me 90% of the way there and sufficed. Optimizing that last 10% wasn't worth the effort.

Anyways, I came to two particular, IMHO, conclusions:
1. Sorted collections are very niche. I could have used one, but probably would have spent as much time fiddling with them as using a dict/sorted().
2. If you're in a niche use case, especially for performance, then abstractions aren't really helpful. Using e.g. a BTree and knowing the particulars of the implementation are going to be more helpful than having an abstract SortedList and working out how to use its special APIs for particular actions.

This basically echos Christopher Baker's sentiment of "whats the use case" and "why doesn't a sorted() call at the end suffice" to some extent. IME, Python's sort algorithm is just really good, so it's hard to beat simply calling sorted() when you're done processing in both runtime performance and effort-spent-optimizing. I struggle to think of or remember a particular case where dropping in a sorted collection would be a clear and obvious win. Maybe a more memory constrained case where sorted() creating a new list is too much? IDK.

[1] As an aside, RB didn't perform nearly as well as sortedcontainers for the majority of cases, but the naive BTree was about on par. IIRC, sortedcollections is basically a specialized BTree.
[2] A wish that came about because I had sorted sets of exact/prefix strings of 1,000 to 10,000 elements and needed them to interact in various ways and wanted to preserve their sortedness.


re: "Is this particular one clearly the “best”[*]?"

Performance wise, it's probably about the best insofar as a BTree of depth=2 and high fanout (10,000 or something? I forget what sortedcontainers defaults to) is. In my limited experience, it's docs and claims about its performance were pretty accurate[3].

API wise, it has its share of idiosyncrasies like anything else. e.g. SortedList claims to implement MutableSequence, but actually raises errors when calling e.g. append(), which is a bit misleading / wrong to some extent (to be fair, given MutableSequence's api contract, it's not like there's particularly clear/right/good choice one can make here).

(disclaimer: i've minimal experience with sortedcontainers; when I looked at it, it was really only for comparing my own toy BTree/RBT behavior/performance with a known implementation)

[3] The exception being a claim that its (supposedly) better cache locality is a significant factor. I think the underlying BTree algorithm is the dominant factor. But someone who actually knows how to measure cache locality would be a better judge of that than me, who doesn't know how.

Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/5SURNB4C5FGJ6LSXUPVW2EFP22ERKSGB/
Code of Conduct: http://python.org/psf/codeofconduct/