[Python-ideas] collections.sortedset proposal
Paul Colomiets
paul at colomiets.name
Tue Dec 25 23:24:22 CET 2012
Hi,
I want to propose to include SortedSet data structure into collections module.
SortedSet (name borrowed from Redis) is a basically a mapping of
(unique) keys to scores, that allows fast slicing by ordinal number
and by score.
There are plenty of use cases for the sorted sets:
* Leaderboard for a game
* Priority queue (that supports task deletion)
* Timer list (e.g. can be used for tulip, supports deletion too)
* Caches with TTL-based, LFU or LRU eviction (including `functools.lru_cache`)
* Search databases with relevance scores
* Statistics (many use cases)
* Replacement for `collections.Counter` with faster `most_common()`
I have first draft of pure python implementation:
https://github.com/tailhook/sortedsets
http://pypi.python.org/pypi/sortedsets/1.0
The implementation is closely modeled on Redis. Internally it consists
of a dict for mapping between keys and scores, and a skiplist for
scores. So most operations are done with O(log n) time. The actual
performance is probably very slow for pure-python implementation, but
can be fixed by C code later. The asymptotic performance seems to be
OK.
So my questions are:
1. Do you think SortedSets are eligible for inclusion to stdlib?
2. Do I need a PEP?
3. Any comments on the implementation?
P.S.: Sorted sets in redis are not the same thing as sorted sets in
blist. So maybe a better name?
--
Paul
More information about the Python-ideas
mailing list