[Python-ideas] collections.sortedset proposal

Paul Colomiets paul at colomiets.name
Tue Dec 25 23:24:22 CET 2012


Hi,

I want to propose to include SortedSet data structure into collections module.

SortedSet (name borrowed from Redis) is a basically a mapping of
(unique) keys to scores, that allows fast slicing by ordinal number
and by score.

There are plenty of use cases for the sorted sets:

* Leaderboard for a game
* Priority queue (that supports task deletion)
* Timer list (e.g. can be used for tulip, supports deletion too)
* Caches with TTL-based, LFU or LRU eviction (including `functools.lru_cache`)
* Search databases with relevance scores
* Statistics (many use cases)
* Replacement for `collections.Counter` with faster `most_common()`

I have first draft of pure python implementation:

https://github.com/tailhook/sortedsets
http://pypi.python.org/pypi/sortedsets/1.0

The implementation is closely modeled on Redis. Internally it consists
of a dict for mapping between keys and scores, and a skiplist for
scores. So most operations are done with O(log n) time. The actual
performance is probably very slow for pure-python implementation, but
can be fixed by C code later. The asymptotic performance seems to be
OK.

So my questions are:

1. Do you think SortedSets are eligible for inclusion to stdlib?
2. Do I need a PEP?
3. Any comments on the implementation?


P.S.: Sorted sets in redis are not the same thing as sorted sets in
blist. So maybe a better name?

--
Paul



More information about the Python-ideas mailing list