Long time lurker, first time poster. I think there may be multiple
discussions happening here so I wanted to highlight a competing module.
I think blist.blist is an excellent data type with a lot of value. But the
performance graphs can be a bit misleading if you think they apply to the
sortedlist, sorteddict, and sortedset types. In those scenarios, I do not
believe blist is the best choice.
The SortedContainers module (https://pypi.python.org/pypi/sortedcontainers)
provides SortedList, SortedDict, and SortedSet data types. It is
implemented in pure-Python, has 100% coverage and hours of stress testing.
The API implemented is very close to blist's and a lot of effort has been
put into documentation (http://www.grantjenks.com/docs/sortedcontainers/).
Furthermore, the data types provided are often faster than their blist
counterparts.
Extensive performance comparisons against other implementations of sorted
list, sorted dict, and sorted set types are documented (
http://www.grantjenks.com/docs/sortedcontainers/performance.html) along
with a comparison of runtimes and load-factors (similar to blist,
sortedcontainers uses a modified B-tree but with a tunable node size.) For
SortedList, an analysis of use cases on Github has been made. These
describe five use cases with performance analysis (
http://www.grantjenks.com/docs/sortedcontainers/performance-workload.html)
Disclaimer: I am the author of the Python SortedContainers project.
Feedback welcome.
Grant Jenks
On Sun, Sep 21, 2014 at 3:04 AM, Antoine Pitrou
On Sun, 21 Sep 2014 05:18:38 +0000 David Wilson
wrote: On Sun, Sep 21, 2014 at 02:37:39PM +1000, Steven D'Aprano wrote:
In 2007, this PEP was created that suggested integrating Daniel Stutzbach's blist into Python: http://legacy.python.org/dev/peps/pep-3128/
The PEP was rejected, but Raymond Hettinger made a note that "after a few months, I intend to poll comp.lang.python for BList success stories. If they exist, then I have no problem with inclusion in the collections module."
I have not used blist, but I have no objection to it becoming a collections type if the author agrees.
It seems unsettling that we'd consider adding another special use collection to the stdlib, when more widely applicable generalizations of it are missing. In blist's case, it can (mostly) be trivially reimplemented using an ordered map, which the standard library lacks.
But can it be *efficiently* reimplemented using an ordered map? blist uses chunks of 128 pointers, which makes it almost as memory efficient as a standard list. A traditional ordered map (I suppose you mean some kind of balanced tree) would generally show much more overhead, if only because it has to store the keys explicitly. And also, you now have to do costly key comparisons every time you do a lookup ("costly" because they go through the object layer and its indirections, as opposed to simple integer arithmetic).
Coming from this perspective, I'd prefer that further additions were limited to clean and far better understood structures. In this light, could we perhaps instead discuss the merits of a collections.Tree, collections.SortedDict or similar?
This sounds like a false dilemma. We could have a collections.blist *and* a collections.Tree.
Regards
Antoine.
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/