[Pythonmac-SIG] Sets and Speed

Yair Benita y.benita at wanadoo.nl
Mon May 30 16:44:14 CEST 2005


Hi All,
My research involves genomic research and the use of sets (recently
introduced in version 2.4) makes my life easier in a lot of ways. However, I
noticed that working with large set slows the script to unbearable speed.

Below I compare two simple scripts, one makes use of a list and the other
makes use of a set. The difference of 20 seconds may not be much, but let me
tell you that this difference grows exponentially. When my sets reach more
that 100,000 elements, a union command is painfully slow. True, a union
command of a set may be much more complicated than using a list but the time
difference is simply too big for me.

Any thoughts/suggestions?

I suppose that the obvious solution is to work with lists most of the time
and use the sets only when I really need them. But then, what's the point of
having those set?


Consider the following example just for making my point:
(sets were run using MacPython 2.4.1 on a PowerMac G4 dual 1.25)
############################
from time import time

x = set([])
prev_time = time()
initial_time = time()
for i in range(20000):
    x = x.union(set([i]))
    if i % 1000 == 0:
        print i, "done. seconds since last: %.3f" % (time()-prev_time)
        prev_time = time()
print "done in %.3f" %(time() - initial_time)

############################
The output of this script is:
0 done. seconds since last: 0.002
1000 done. seconds since last: 0.103
2000 done. seconds since last: 0.354
3000 done. seconds since last: 0.784
4000 done. seconds since last: 0.861
5000 done. seconds since last: 1.501
6000 done. seconds since last: 1.707
7000 done. seconds since last: 1.751
8000 done. seconds since last: 1.894
9000 done. seconds since last: 2.907
10000 done. seconds since last: 3.155
11000 done. seconds since last: 3.278
12000 done. seconds since last: 3.374
13000 done. seconds since last: 3.559
14000 done. seconds since last: 3.824
15000 done. seconds since last: 3.905
16000 done. seconds since last: 4.040
17000 done. seconds since last: 5.853
18000 done. seconds since last: 6.928
19000 done. seconds since last: 7.038
done in 64.414


Lets run the same script using a list:
############################
from time import time

x = []
prev_time = time()
initial_time = time()
for i in range(20000):
    if i not in x:
        x.append(i)
        
    if i % 1000 == 0:
        print i, "done. seconds since last: %.3f" % (time()-prev_time)
        prev_time = time()
        
print "done in %.3f" %(time() - initial_time)
############################

The output of this script is:
0 done. seconds since last: 0.002
1000 done. seconds since last: 0.102
2000 done. seconds since last: 0.323
3000 done. seconds since last: 0.506
4000 done. seconds since last: 0.676
5000 done. seconds since last: 0.871
6000 done. seconds since last: 1.051
7000 done. seconds since last: 1.242
8000 done. seconds since last: 1.365
9000 done. seconds since last: 1.635
10000 done. seconds since last: 1.760
11000 done. seconds since last: 1.929
12000 done. seconds since last: 2.133
13000 done. seconds since last: 2.294
14000 done. seconds since last: 2.659
15000 done. seconds since last: 3.015
16000 done. seconds since last: 3.182
17000 done. seconds since last: 3.332
18000 done. seconds since last: 3.552
19000 done. seconds since last: 3.690
done in 39.197





More information about the Pythonmac-SIG mailing list