Hpw make lists that are easy to sort.

Anton Vredegoor anton.vredegoor at gmail.com
Wed Mar 28 19:12:20 CEST 2007


Python's sorting algorithm takes advantage of preexisting order in a 
sequence:

#sort_test.py
import random
import time

def test():
     n = 1000
     k = 2**28

     L = random.sample(xrange(-k,k),n)
     R = random.sample(xrange(-k,k),n)

     t = time.time()
     LR = [(i+j) for i in L for j in R]
     print time.time()-t
     LR.sort()
     print time.time()-t

     print

     t = time.time()
     #L.sort()
     R.sort()
     presorted_LR = [(i+j) for i in L for j in R]
     print time.time()-t
     presorted_LR.sort()
     print time.time()-t

if __name__=='__main__':
     test()

On this -very slow- computer this prints:

 >d:\python25\pythonw -u "sort_test.py"
1.10000014305
8.96000003815

1.10000014305
5.49000000954
 >Exit code: 0

Presorting the second sequence gains us more than three seconds. I 
wonder if there is a way to generate the combined items in such a way 
that sorting them is even faster? Is there some other sorting algorithm 
that can specifically take advantage of this way -or another way- of 
generating this list?

The final sequence is len(L)*len(R) long but it is produced from only 
len(L)+len(R) different items, is it possible to exploit this fact? I'd 
also be interested in a more general solution that would work for 
summing the items of more than two lists in this way.

A.



More information about the Python-list mailing list