On Sep 21, 2014, at 17:30, Grant Jenks <grant.jenks@gmail.com> wrote:
Long time lurker, first time poster. I think there may be multiple discussions happening here so I wanted to highlight a competing module.
I think blist.blist is an excellent data type with a lot of value. But the performance graphs can be a bit misleading if you think they apply to the sortedlist, sorteddict, and sortedset types. In those scenarios, I do not believe blist is the best choice.
The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers) provides SortedList, SortedDict, and SortedSet data types. It is implemented in pure-Python, has 100% coverage and hours of stress testing. The API implemented is very close to blist's and a lot of effort has been put into documentation (http://www.grantjenks.com/docs/sortedcontainers/). Furthermore, the data types provided are often faster than their blist counterparts.
Honestly, this is exactly why I think, despite what I've said in the past, maybe we don't need SortedDict, etc. in the stdlib. The API is something nontrivial, but arguably with a single right answer (and half the implementations I've seen on PyPI get either SortedSet or SortedList wrong), so that makes perfect sense to belong in the stdlib, in collections.abc. But the implementation, there are multiple right answers: B-tree, binary tree, skip list, array that sorts after every insert or before every lookup after an insert, same plus hash... Which one is "best" depends entirely on your data and how you intend to use them. The fact most other languages out there go with red-black trees, but it's hard to find a Python application where those actually perform best, seems like further argument against picking a best solution. Also, making people glance at the docs and see that they have to "pip install SortedCollections" seems like just enough hurdle to slow down the kind of people who have no idea why they're using it beyond "I saw some Java code that used a sorted map", without being a serious roadblock for anyone who actually does need it. So, unless Nick's stdlib++ idea is shot down, I think the best thing to do would be: Add the ABCs, then link to a few different libraries on PyPI--blist, SortedCollections, bintrees, the one I forget the name of that wraps up the sort-an-OrderedDict-on-the-fly recipe that the docs already link to, while encouraging the authors of those packages to implement the APIs and to provide wheels if they have any extension modules.