Efficient Rank Ordering of Nested Lists
pablo.mitchell at gmail.com
pablo.mitchell at gmail.com
Sat Aug 4 14:07:59 EDT 2007
On Aug 3, 8:38 am, al... at mac.com (Alex Martelli) wrote:
> pablo.mitch... at gmail.com <pablo.mitch... at gmail.com> wrote:
> > A naive approach to rank ordering (handling ties as well) of nested
> > lists may be accomplished via:
>
> > def rankLists(nestedList):
> > def rankList(singleList):
> > sortedList = list(singleList)
> > sortedList.sort()
> > return map(sortedList.index, singleList)
> > return map(rankList, nestedList)
>
> > >>> unranked = [ [ 1, 2, 3, 4, 5 ], [ 3, 1, 5, 2, 4 ], [ -1.1, 2.2,
> > 0, -1.1, 13 ] ]
> > >>> print rankLists(unranked)
>
> > [[0, 1, 2, 3, 4], [2, 0, 4, 1, 3], [0, 3, 2, 0, 4]]
>
> > This works nicely when the dimensions of the nested list are small.
> > It is slow when they are big. Can someone suggest a clever way to
> > speed it up?
>
> Each use of sortedList.index is O(N) [where N is len(singleList)], and
> you have N such uses in the map in the inner function, so this approach
> is O(N squared). Neil's suggestion to use bisect replaces the O(N)
> .index with an O(log N) search, so the overall performance is O(N log N)
> [[and you can't do better than that, big-O wise, because the sort step
> is also O(N log N)]].
>
> "beginner"'s advice to use a dictionary is also good and may turn out to
> be faster, just because dicts are SO fast in Python -- but you need to
> try and measure both alternatives. One way to use a dict (warning,
> untested code):
>
> def rankList(singleList):
> d = {}
> for i, item in reversed(enumerate(sorted(singleList))):
> d[item] = i
> return [d[item] for item in singleList]
>
> If you find the for-clause too rich in functionality, you can of course
> split it up a bit; but note that you do need the 'reversed' to deal with
> the corner case of duplicate items (without it, you'd end up with 1s
> instead of 0s for the third one of the sublists in your example). If
> speed is of the essence you might try to measure what happens if you
> replace the returned expression with map(d.__getitem__, singleList), but
> I suspect the list comprehension is faster as well as clearer.
>
> Another potential small speedup is to replace the first 3 statements
> with just one:
>
> d = dict((item,i) for i,item in reversed(enumerate(sorted(singleList))))
>
> but THIS density of functionality is a bit above my personal threshold
> of comfort ("sparse is better than dense":-).
>
> Alex
Thanks for breaking it down so thoroughly. I try the different
recipes and see which gives the best trade offs between readability
and performance. Agreed that dict() approach looks promising.
More information about the Python-list
mailing list