[Python-ideas] collections.sortedset proposal

Terry Reedy tjreedy at udel.edu
Wed Dec 26 08:58:00 CET 2012


On 12/25/2012 5:24 PM, Paul Colomiets wrote:
> Hi,
>
> I want to propose to include SortedSet data structure into collections module.
>
> SortedSet (name borrowed from Redis) is a basically a mapping of
> (unique) keys to scores, that allows fast slicing by ordinal number
> and by score.

Since a set, in general, is not a mapping, I do not understand what you 
mean. If you mean a mapping from sorted position to item, then I would 
call it a sortedlist.

> There are plenty of use cases for the sorted sets:
>
> * Leaderboard for a game

This looks like an auto-sorted list.

> * Priority queue (that supports task deletion)

This looks like something else.

> * Timer list (e.g. can be used for tulip, supports deletion too)
> * Caches with TTL-based, LFU or LRU eviction (including `functools.lru_cache`)

These look like sorted lists.

> * Search databases with relevance scores
> * Statistics (many use cases)

These are rather vague.

> * Replacement for `collections.Counter` with faster `most_common()`

This looks like something else.

> I have first draft of pure python implementation:
>
> https://github.com/tailhook/sortedsets
> http://pypi.python.org/pypi/sortedsets/1.0
>
> The implementation is closely modeled on Redis. Internally it consists
> of a dict for mapping between keys and scores, and a skiplist for
> scores. So most operations are done with O(log n) time. The actual
> performance is probably very slow for pure-python implementation, but
> can be fixed by C code later. The asymptotic performance seems to be
> OK.
>
> So my questions are:
>
> 1. Do you think SortedSets are eligible for inclusion to stdlib?
> 2. Do I need a PEP?
> 3. Any comments on the implementation?

The standard answer is to list on or submit to pypi and get community 
approval and adoption. Then a pep with a commitment to maintenance even 
while others interfere with your 'baby'. Long-time core committers 
sometimes get to cut the process short, but even Guido is starting his 
propose async module/package with a pep and publicly available code for 3.3.

-- 
Terry Jan Reedy




More information about the Python-ideas mailing list