Feature request: sorting a list slice
Heiko Wundram
me+python at modelnine.org
Thu May 18 18:51:45 EDT 2006
Am Donnerstag 18 Mai 2006 22:13 schrieb Raymond Hettinger:
> This is a false optimization. The slicing steps are O(n) and the sort
> step is O(n log n) unless the data has some internal structure that
> Timsort can use to get closer to O(n).
>
> If you implemented this and timed in it real apps, I would be surprised
> to find that the slice extraction and assignments were the performance
> bottleneck. IMO, a worthwhile performance gain is unlikely.
I personally wouldn't find this to be a false optimization, at least not
memory-wise. If you sort really large lists, and only want to sort part of
the list, the memory gains of not having to do a slice should be enormous,
and there should be some time-gains too. And, additionally, implementing this
(if I understand timsort correctly, just had a look at the sources) isn't
hard.
Rather, I'd pose the usage question: why are you only sorting a part of a
list? lists are basically meant for homogenous data. And sorting only a part
of that should be a pretty meaningless operation, mostly...
Anyway, I've just written up a patch to extend sort() with a start/stop
parameter (which behaves just like the default slice notation). Generally,
this will make sort() setup slightly slower in the "standard" case (because
there's some more pointer arithmetic involved, but this should be negligible,
and is O(1)), but for the actual use case, the following numbers can be seen:
modelnine at phoenix ~/mercurial/python-modelnine $ ./python test.py
New average time: 13.7254650593 ms per loop
Old average time: 14.839854002 ms per loop
[10198 refs]
modelnine at phoenix ~/mercurial/python-modelnine $
This is just a very, very simple test (testing the case of a list with one
million entries, where 99000 are sorted):
>>>
from random import randrange
from time import time
x = [randrange(256) for x in xrange(1000000)]
timesnew, timesold = [], []
for reps in range(1000):
y = x[:]
timesnew.append(time())
y.sort(start=1000,stop=10000)
timesnew[-1] = time() - timesnew[-1]
for reps in range(1000):
y = x[:]
timesold.append(time())
y[1000:10000] = sorted(y[1000:10000])
timesold[-1] = time() - timesold[-1]
print "New average time:", sum(timesnew), "ms per loop"
print "Old average time:", sum(timesold), "ms per loop"
>>>
I'll post the patch to the SF bugtracker tomorrow, it's too late today for me
to review it, but generally, I wouldn't find it to be a bad addition, if
there's actually a general use case to only sort parts of a list.
--- Heiko.
More information about the Python-list
mailing list