Ordering python sets
Steven D'Aprano
steve at REMOVE-THIS-cybersource.com.au
Tue Oct 28 09:40:41 EDT 2008
On Mon, 27 Oct 2008 17:03:58 -0700, Glenn Linderman wrote:
>>> A little harder question is how to create a key that corresponds to
>>> ascending string followed by descending string?
>>>
>>> To do that you can sort the data two times, relying on the stable
>>> nature of the Python sort.
>>>
>>>
> Ick. Costs twice as much.
But that's only a constant cost. For small amounts of data, it's trivial,
and for large amounts, it's dominated by the cost of sorting N items,
which is O(N log N) by memory. (Python's sort is a modified heap sort, I
believe.)
[...]
> This solution seems to be the solution I was looking for. I think "Neg"
> is a poor name, but all the good names are significantly longer... Given
> its usage, though, in a sorted call, "Rev" would be a better poor name!
> ReversedCompare might be a better long name.
>
> How can I suggest adding it to the documentation for Python 3.0 ? It is
> much preferable to the "obvious" solution of sorting twice, and doing
> that is a trap which many people might fall into. I posted here
> instead, and thank you for the solution.
Why do you call it a trap? You're preferred solution is significantly
slower for small amounts of data than the solution you call a trap:
class Neg(object):
def __init__(self, value):
self.value = value
def __cmp__(self, other):
return -cmp(self.value, other.value)
def sort_updown1(pairs):
return sorted(pairs, key=lambda (s, t): (s, Neg(t)))
import operator
def sort_updown2(pairs):
pairs = pairs[:]
pairs.sort(key=operator.itemgetter(1), reverse=True)
pairs.sort(key=operator.itemgetter(0))
return pairs
pairs = [('al', 'boo'), ('zul', 'ao'), ('al', 'den'), ('zul', 'o')]
from timeit import Timer
t1 = Timer("sort_updown1(pairs)",
"from __main__ import sort_updown1, pairs")
t2 = Timer("sort_updown2(pairs)",
"from __main__ import sort_updown2, pairs")
And the results:
>>> t1.repeat(number=1000)
[0.053807973861694336, 0.053076028823852539, 0.052966117858886719]
>>> t2.repeat(number=1000)
[0.022480964660644531, 0.022369861602783203, 0.02264404296875]
The difference is even more significant for larger amounts of data:
>>> pairs = pairs*100
>>> t1.repeat(number=1000)
[5.3583409786224365, 4.6985390186309814, 4.6953370571136475]
>>> t2.repeat(number=1000)
[1.1064291000366211, 1.0017910003662109, 0.48421788215637207]
And more significant still as we sort even more data:
>>> pairs = pairs*10
>>> t1.repeat(number=1000)
[53.255411863327026, 53.792617082595825, 53.549386024475098]
>>> t2.repeat(number=1000)
[5.8788919448852539, 5.0701780319213867, 4.8528299331665039]
Or, tabulating the results:
N = 4 400 4000
------------------------
t1 = 0.05 4.6 53.3
t2 = 0.02 0.5 4.9
As you can see, the supposed "trap" of sorting twice is significantly
faster, by an order of magnitude, than sorting it once. It would be even
faster if I didn't bother making a copy of the list before sorting.
--
Steven
More information about the Python-list
mailing list