Sorting Lists
phawkins at connact.com
phawkins at connact.com
Wed Aug 1 00:32:49 EDT 2001
>>>>> "TB" == Tom Bryan <tbryan at python.net> writes:
TB> phawkins at connact.com wrote:
TB> Of course, I never claimed that it was faster. One simply avoids
TB> copying the list and sorts in place rather than returning a sorted
TB> copy of the list. :-)
TB> By the way, would you care to post the results of your performances
TB> tests. Is it faster for lists of all sizes? How does the speed difference
TB> change as the size of the list changs?
Looks pretty stable for the sizes of lists that I tested -- running it
several different times for all sizes of lists, it looks as if speed
differences are based on random data differences -- but I haven't yet
seen what it does when it thrashes. I'll come back and post when I have.
TB> you-can't-get-away-that-easily-ly yours
TB> ---Tom
Well, I would have, but I'd done it in IDLE some time ago when I first
saw discussion of decorate-sort-undecorate. But, in the interests of
truth and justice, here's a new version.
I designed my random-data-generator so as to create data fairly
similar to the original example in this thread, with the result that
the get-sick&tired-waiting-for-it limiting factor is the data
generation. So if you want to play around with it, so as to see what
happens when the system starts thrashing, you might want to simplify
the data generation.
This is Windows 98, PII 450, 256 meg ram (and with several other programs
running).
------test_decorate.py-----------
from whrandom import randint
from string import letters
from time import clock
def make_data(length):
return [[make_name(), make_name(), randint(0, 100)] for i in xrange(length)]
def make_name():
word = ""
length = randint(3,10)
while len(word) < length:
word = word + letters[randint(0,25)]
return word.capitalize()
def dec_sort_by_field(seq,index):
tmp = [ (item[index],item) for item in seq ]
tmp.sort()
return [ item for (_,item) in tmp ]
def cmp_sort_by_field( index ):
def cmpfunc( x, y, i=index ):
return cmp( x[i], y[i] )
return cmpfunc
def time_cmp(data):
start = clock()
data.sort(cmp_sort_by_field(1))
end = clock()
return end - start
def time_dec(data):
start = clock()
dec_sort_by_field(data, 1)
end = clock()
return end - start
for i in [10,100,200,300,400,500,1000,5000,10000,50000,75000,100000,150000,\
200000, 250000, 300000]:
data = make_data(i)
dec = time_dec(data)
cp = time_cmp(data)
print "data length: %d" % i
print "decorate: %f" % dec
print "cmp: %f" % cp
if dec:
print "cmp/dec: %f\n" % (cp/dec)
-------------------end------------------
ran it a few times in emacs.... never did max memory.
Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> ## working on region in file c:/WINDOWS/TEMP/python--722007F4D...
data length: 10
decorate: 0.000138
cmp: 0.000373
cmp/dec: 2.696970
data length: 100
decorate: 0.001664
cmp: 0.004707
cmp/dec: 2.829219
data length: 200
decorate: 0.002791
cmp: 0.062291
cmp/dec: 22.319520
data length: 300
decorate: 0.004592
cmp: 0.019297
cmp/dec: 4.202409
data length: 400
decorate: 0.006918
cmp: 0.028260
cmp/dec: 4.084676
data length: 500
decorate: 0.007772
cmp: 0.036634
cmp/dec: 4.713793
data length: 1000
decorate: 0.018267
cmp: 0.083722
cmp/dec: 4.583226
data length: 5000
decorate: 0.132274
cmp: 0.539986
cmp/dec: 4.082318
data length: 10000
decorate: 0.350746
cmp: 1.232022
cmp/dec: 3.512577
data length: 50000
decorate: 1.971823
cmp: 7.555448
cmp/dec: 3.831707
data length: 75000
decorate: 3.181551
cmp: 11.692751
cmp/dec: 3.675173
data length: 100000
decorate: 4.401343
cmp: 16.481489
cmp/dec: 3.744650
data length: 150000
decorate: 7.256650
cmp: 25.021690
cmp/dec: 3.448105
>>> ## working on region in file c:/WINDOWS/TEMP/python--722007SCK...
data length: 10
decorate: 0.000184
cmp: 0.000225
cmp/dec: 1.218182
data length: 100
decorate: 0.001300
cmp: 0.005864
cmp/dec: 4.511283
data length: 200
decorate: 0.002601
cmp: 0.011922
cmp/dec: 4.584273
data length: 300
decorate: 0.004932
cmp: 0.019423
cmp/dec: 3.937978
data length: 400
decorate: 0.008885
cmp: 0.027430
cmp/dec: 3.087059
data length: 500
decorate: 0.008477
cmp: 0.036760
cmp/dec: 4.336662
data length: 1000
decorate: 0.020899
cmp: 0.082618
cmp/dec: 3.953240
data length: 5000
decorate: 0.151913
cmp: 0.563729
cmp/dec: 3.710857
data length: 10000
decorate: 0.304589
cmp: 1.485802
cmp/dec: 4.878048
data length: 50000
decorate: 1.758838
cmp: 7.936150
cmp/dec: 4.512156
data length: 75000
decorate: 3.160845
cmp: 12.241860
cmp/dec: 3.872971
data length: 100000
decorate: 4.147288
cmp: 15.488422
cmp/dec: 3.734590
data length: 150000
decorate: 6.621114
cmp: 24.302384
cmp/dec: 3.670437
>>> ## working on region in file c:/WINDOWS/TEMP/python--722007fMQ...
data length: 10
decorate: 0.000186
cmp: 0.000357
cmp/dec: 1.918919
data length: 100
decorate: 0.001405
cmp: 0.004687
cmp/dec: 3.337112
data length: 200
decorate: 0.002718
cmp: 0.011958
cmp/dec: 4.399630
data length: 300
decorate: 0.005749
cmp: 0.019170
cmp/dec: 3.334743
data length: 400
decorate: 0.007392
cmp: 0.028275
cmp/dec: 3.825057
data length: 500
decorate: 0.008398
cmp: 0.038744
cmp/dec: 4.613573
data length: 1000
decorate: 0.020952
cmp: 0.083020
cmp/dec: 3.962478
data length: 5000
decorate: 0.152054
cmp: 0.588490
cmp/dec: 3.870263
data length: 10000
decorate: 0.559918
cmp: 1.350087
cmp/dec: 2.411223
data length: 50000
decorate: 2.027378
cmp: 7.347429
cmp/dec: 3.624104
data length: 75000
decorate: 3.062071
cmp: 11.395732
cmp/dec: 3.721576
data length: 100000
decorate: 4.131656
cmp: 15.764242
cmp/dec: 3.815478
data length: 150000
decorate: 7.114213
cmp: 24.034858
cmp/dec: 3.378428
data length: 200000
decorate: 9.644709
cmp: 33.191710
cmp/dec: 3.441442
data length: 250000
decorate: 17.852266
cmp: 50.232261
cmp/dec: 2.813775
data length: 300000
decorate: 20.336841
cmp: 75.334111
cmp/dec: 3.704317
>>>
--
Patricia J. Hawkins
Hawkins Internet Applications, LLC
More information about the Python-list
mailing list