[Python-ideas] Putting `blist` into collections module
Mark Lawrence
breamoreboy at yahoo.co.uk
Tue Sep 23 02:32:31 CEST 2014
On 22/09/2014 02:32, Andrew Barnert wrote:
> On Sep 21, 2014, at 17:30, Grant Jenks
> <grant.jenks at gmail.com
> <mailto:grant.jenks at gmail.com>> wrote:
>
>> Long time lurker, first time poster. I think there may be multiple
>> discussions happening here so I wanted to highlight a competing module.
>>
>> I think blist.blist is an excellent data type with a lot of value. But
>> the performance graphs can be a bit misleading if you think they apply
>> to the sortedlist, sorteddict, and sortedset types. In those
>> scenarios, I do not believe blist is the best choice.
>>
>> The SortedContainers module
>> (https://pypi.python.org/pypi/sortedcontainers) provides SortedList,
>> SortedDict, and SortedSet data types. It is implemented in
>> pure-Python, has 100% coverage and hours of stress testing. The API
>> implemented is very close to blist's and a lot of effort has been put
>> into documentation (http://www.grantjenks.com/docs/sortedcontainers/).
>> Furthermore, the data types provided are often faster than their blist
>> counterparts.
>
> Honestly, this is exactly why I think, despite what I've said in the
> past, maybe we don't need SortedDict, etc. in the stdlib.
>
> The API is something nontrivial, but arguably with a single right answer
> (and half the implementations I've seen on PyPI get either SortedSet or
> SortedList wrong), so that makes perfect sense to belong in the stdlib,
> in collections.abc.
>
> But the implementation, there are multiple right answers: B-tree, binary
> tree, skip list, array that sorts after every insert or before every
> lookup after an insert, same plus hash... Which one is "best" depends
> entirely on your data and how you intend to use them. The fact most
> other languages out there go with red-black trees, but it's hard to find
> a Python application where those actually perform best, seems like
> further argument against picking a best solution.
>
> Also, making people glance at the docs and see that they have to "pip
> install SortedCollections" seems like just enough hurdle to slow down
> the kind of people who have no idea why they're using it beyond "I saw
> some Java code that used a sorted map", without being a serious
> roadblock for anyone who actually does need it.
>
> So, unless Nick's stdlib++ idea is shot down, I think the best thing to
> do would be: Add the ABCs, then link to a few different libraries on
> PyPI--blist, SortedCollections, bintrees, the one I forget the name of
> that wraps up the sort-an-OrderedDict-on-the-fly recipe that the docs
> already link to, while encouraging the authors of those packages to
> implement the APIs and to provide wheels if they have any extension modules.
>
There's all sorts of interesting things discussed here
http://kmike.ru/python-data-structures/ so if you didn't know about it
you do now :)
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
More information about the Python-ideas
mailing list