Efficient Rank Ordering of Nested Lists

Sat Aug 4 14:07:59 EDT 2007

On Aug 3, 8:38 am, al... at mac.com (Alex Martelli) wrote:
> pablo.mitch... at gmail.com <pablo.mitch... at gmail.com> wrote:
> > A naive approach to rank ordering (handling ties as well) of nested
> > lists may be accomplished via:
>
> >    def rankLists(nestedList):
> >       def rankList(singleList):
> >           sortedList = list(singleList)
> >           sortedList.sort()
> >           return map(sortedList.index, singleList)
> >       return map(rankList, nestedList)
>
> >    >>> unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2,
> > 0, -1.1, 13 ] ]
> >    >>> print rankLists(unranked)
>
> >    [[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]
>
> > This works nicely when the dimensions of the nested list are small.
> > It is slow when they are big.  Can someone suggest a clever way to
> > speed it up?
>
> Each use of sortedList.index is O(N) [where N is len(singleList)], and
> you have N such uses in the map in the inner function, so this approach
> is O(N squared).  Neil's suggestion to use bisect replaces the O(N)
> .index with an O(log N) search, so the overall performance is O(N log N)
> [[and you can't do better than that, big-O wise, because the sort step
> is also O(N log N)]].
>
> "beginner"'s advice to use a dictionary is also good and may turn out to
> be faster, just because dicts are SO fast in Python -- but you need to
> try and measure both alternatives.  One way to use a dict (warning,
> untested code):
>
>   def rankList(singleList):
>       d = {}
>       for i, item in reversed(enumerate(sorted(singleList))):
>           d[item] = i
>       return [d[item] for item in singleList]
>
> If you find the for-clause too rich in functionality, you can of course
> split it up a bit; but note that you do need the 'reversed' to deal with
> the corner case of duplicate items (without it, you'd end up with 1s
> instead of 0s for the third one of the sublists in your example).  If
> speed is of the essence you might try to measure what happens if you
> replace the returned expression with map(d.__getitem__, singleList), but
> I suspect the list comprehension is faster as well as clearer.
>
> Another potential small speedup is to replace the first 3 statements
> with just one:
>
> d = dict((item,i) for i,item in reversed(enumerate(sorted(singleList))))
>
> but THIS density of functionality is a bit above my personal threshold
> of comfort ("sparse is better than dense":-).
>
> Alex

Thanks for breaking it down so thoroughly.  I try the different
recipes and see which gives the best trade offs between readability
and performance.  Agreed that dict() approach looks promising.