Sorting Lists

phawkins at connact.com phawkins at connact.com
Wed Aug 1 00:32:49 EDT 2001


>>>>> "TB" == Tom Bryan <tbryan at python.net> writes:

TB> phawkins at connact.com wrote:

TB> Of course, I never claimed that it was faster.  One simply avoids 
TB> copying the list and sorts in place rather than returning a sorted 
TB> copy of the list. :-)

TB> By the way, would you care to post the results of your performances 
TB> tests.  Is it faster for lists of all sizes?  How does the speed difference 
TB> change as the size of the list changs?  

Looks pretty stable for the sizes of lists that I tested -- running it
several different times for all sizes of lists, it looks as if speed
differences are based on random data differences -- but I haven't yet
seen what it does when it thrashes.  I'll come back and post when I have.

TB> you-can't-get-away-that-easily-ly yours
TB> ---Tom

Well, I would have, but I'd done it in IDLE some time ago when I first
saw discussion of decorate-sort-undecorate.  But, in the interests of
truth and justice, here's a new version.

I designed my random-data-generator so as to create data fairly
similar to the original example in this thread, with the result that
the get-sick&tired-waiting-for-it limiting factor is the data
generation.  So if you want to play around with it, so as to see what
happens when the system starts thrashing, you might want to simplify
the data generation.

This is Windows 98, PII 450, 256 meg ram (and with several other programs
running). 

------test_decorate.py-----------
from whrandom import randint
from string import letters
from time import clock

def make_data(length):
    return [[make_name(), make_name(), randint(0, 100)] for i in xrange(length)]

def make_name():
    word = ""
    length = randint(3,10)
    while len(word) < length:
        word = word + letters[randint(0,25)]
    return word.capitalize()

def dec_sort_by_field(seq,index):
     tmp = [ (item[index],item) for item in seq ]
     tmp.sort()
     return [ item for (_,item) in tmp ]

def cmp_sort_by_field( index ):
    def cmpfunc( x, y, i=index ):
        return cmp( x[i], y[i] )
    return cmpfunc

def time_cmp(data):
    start = clock()
    data.sort(cmp_sort_by_field(1))
    end = clock()
    return end - start  

def time_dec(data):
    start = clock()
    dec_sort_by_field(data, 1)
    end = clock()
    return end - start
    
for i in [10,100,200,300,400,500,1000,5000,10000,50000,75000,100000,150000,\
          200000, 250000, 300000]:
    data = make_data(i)
    dec = time_dec(data)
    cp = time_cmp(data)
    print "data length: %d" % i
    print "decorate: %f" % dec
    print "cmp:      %f" % cp
    if dec:
        print "cmp/dec:  %f\n" % (cp/dec)
-------------------end------------------

ran it a few times in emacs.... never did max memory.


Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> ## working on region in file c:/WINDOWS/TEMP/python--722007F4D...
data length: 10
decorate: 0.000138
cmp:      0.000373
cmp/dec:  2.696970

data length: 100
decorate: 0.001664
cmp:      0.004707
cmp/dec:  2.829219

data length: 200
decorate: 0.002791
cmp:      0.062291
cmp/dec:  22.319520

data length: 300
decorate: 0.004592
cmp:      0.019297
cmp/dec:  4.202409

data length: 400
decorate: 0.006918
cmp:      0.028260
cmp/dec:  4.084676

data length: 500
decorate: 0.007772
cmp:      0.036634
cmp/dec:  4.713793

data length: 1000
decorate: 0.018267
cmp:      0.083722
cmp/dec:  4.583226

data length: 5000
decorate: 0.132274
cmp:      0.539986
cmp/dec:  4.082318

data length: 10000
decorate: 0.350746
cmp:      1.232022
cmp/dec:  3.512577

data length: 50000
decorate: 1.971823
cmp:      7.555448
cmp/dec:  3.831707

data length: 75000
decorate: 3.181551
cmp:      11.692751
cmp/dec:  3.675173

data length: 100000
decorate: 4.401343
cmp:      16.481489
cmp/dec:  3.744650

data length: 150000
decorate: 7.256650
cmp:      25.021690
cmp/dec:  3.448105

>>> ## working on region in file c:/WINDOWS/TEMP/python--722007SCK...
data length: 10
decorate: 0.000184
cmp:      0.000225
cmp/dec:  1.218182

data length: 100
decorate: 0.001300
cmp:      0.005864
cmp/dec:  4.511283

data length: 200
decorate: 0.002601
cmp:      0.011922
cmp/dec:  4.584273

data length: 300
decorate: 0.004932
cmp:      0.019423
cmp/dec:  3.937978

data length: 400
decorate: 0.008885
cmp:      0.027430
cmp/dec:  3.087059

data length: 500
decorate: 0.008477
cmp:      0.036760
cmp/dec:  4.336662

data length: 1000
decorate: 0.020899
cmp:      0.082618
cmp/dec:  3.953240

data length: 5000
decorate: 0.151913
cmp:      0.563729
cmp/dec:  3.710857

data length: 10000
decorate: 0.304589
cmp:      1.485802
cmp/dec:  4.878048

data length: 50000
decorate: 1.758838
cmp:      7.936150
cmp/dec:  4.512156

data length: 75000
decorate: 3.160845
cmp:      12.241860
cmp/dec:  3.872971

data length: 100000
decorate: 4.147288
cmp:      15.488422
cmp/dec:  3.734590

data length: 150000
decorate: 6.621114
cmp:      24.302384
cmp/dec:  3.670437

>>> ## working on region in file c:/WINDOWS/TEMP/python--722007fMQ...
data length: 10
decorate: 0.000186
cmp:      0.000357
cmp/dec:  1.918919

data length: 100
decorate: 0.001405
cmp:      0.004687
cmp/dec:  3.337112

data length: 200
decorate: 0.002718
cmp:      0.011958
cmp/dec:  4.399630

data length: 300
decorate: 0.005749
cmp:      0.019170
cmp/dec:  3.334743

data length: 400
decorate: 0.007392
cmp:      0.028275
cmp/dec:  3.825057

data length: 500
decorate: 0.008398
cmp:      0.038744
cmp/dec:  4.613573

data length: 1000
decorate: 0.020952
cmp:      0.083020
cmp/dec:  3.962478

data length: 5000
decorate: 0.152054
cmp:      0.588490
cmp/dec:  3.870263

data length: 10000
decorate: 0.559918
cmp:      1.350087
cmp/dec:  2.411223

data length: 50000
decorate: 2.027378
cmp:      7.347429
cmp/dec:  3.624104

data length: 75000
decorate: 3.062071
cmp:      11.395732
cmp/dec:  3.721576

data length: 100000
decorate: 4.131656
cmp:      15.764242
cmp/dec:  3.815478

data length: 150000
decorate: 7.114213
cmp:      24.034858
cmp/dec:  3.378428

data length: 200000
decorate: 9.644709
cmp:      33.191710
cmp/dec:  3.441442

data length: 250000
decorate: 17.852266
cmp:      50.232261
cmp/dec:  2.813775

data length: 300000
decorate: 20.336841
cmp:      75.334111
cmp/dec:  3.704317

>>> 

-- 
Patricia J. Hawkins
Hawkins Internet Applications, LLC





More information about the Python-list mailing list