[Pythonmac-SIG] Sets and Speed

nickg nickgsuperstar at gmail.com
Mon May 30 17:35:34 CEST 2005


Hello,

x = x.union(set([i]) is the problem.  You are creating  two new data  
structures (set[i], and x.union) at every iteration, and I suspect  
union is a slow operation.

Just change this to "x.add(i)" and the script will run 20x faster  
than your list version!


$ ./settest2.py
done in 0.026

x = set()
for i in range(20000):
     x.add(i)

enjoy!

--nickg


On May 30, 2005, at 10:44 AM, Yair Benita wrote:

> Hi All,
> My research involves genomic research and the use of sets (recently
> introduced in version 2.4) makes my life easier in a lot of ways.  
> However, I
> noticed that working with large set slows the script to unbearable  
> speed.
>
> Below I compare two simple scripts, one makes use of a list and the  
> other
> makes use of a set. The difference of 20 seconds may not be much,  
> but let me
> tell you that this difference grows exponentially. When my sets  
> reach more
> that 100,000 elements, a union command is painfully slow. True, a  
> union
> command of a set may be much more complicated than using a list but  
> the time
> difference is simply too big for me.
>
> Any thoughts/suggestions?
>
> I suppose that the obvious solution is to work with lists most of  
> the time
> and use the sets only when I really need them. But then, what's the  
> point of
> having those set?
>
>
> Consider the following example just for making my point:
> (sets were run using MacPython 2.4.1 on a PowerMac G4 dual 1.25)
> ############################
> from time import time
>
> x = set([])
> prev_time = time()
> initial_time = time()
> for i in range(20000):
>     x = x.union(set([i]))
>     if i % 1000 == 0:
>         print i, "done. seconds since last: %.3f" % (time()-prev_time)
>         prev_time = time()
> print "done in %.3f" %(time() - initial_time)
>
> ############################
> The output of this script is:
> 0 done. seconds since last: 0.002
> 1000 done. seconds since last: 0.103
> 2000 done. seconds since last: 0.354
> 3000 done. seconds since last: 0.784
> 4000 done. seconds since last: 0.861
> 5000 done. seconds since last: 1.501
> 6000 done. seconds since last: 1.707
> 7000 done. seconds since last: 1.751
> 8000 done. seconds since last: 1.894
> 9000 done. seconds since last: 2.907
> 10000 done. seconds since last: 3.155
> 11000 done. seconds since last: 3.278
> 12000 done. seconds since last: 3.374
> 13000 done. seconds since last: 3.559
> 14000 done. seconds since last: 3.824
> 15000 done. seconds since last: 3.905
> 16000 done. seconds since last: 4.040
> 17000 done. seconds since last: 5.853
> 18000 done. seconds since last: 6.928
> 19000 done. seconds since last: 7.038
> done in 64.414
>
>
> Lets run the same script using a list:
> ############################
> from time import time
>
> x = []
> prev_time = time()
> initial_time = time()
> for i in range(20000):
>     if i not in x:
>         x.append(i)
>
>     if i % 1000 == 0:
>         print i, "done. seconds since last: %.3f" % (time()-prev_time)
>         prev_time = time()
>
> print "done in %.3f" %(time() - initial_time)
> ############################
>
> The output of this script is:
> 0 done. seconds since last: 0.002
> 1000 done. seconds since last: 0.102
> 2000 done. seconds since last: 0.323
> 3000 done. seconds since last: 0.506
> 4000 done. seconds since last: 0.676
> 5000 done. seconds since last: 0.871
> 6000 done. seconds since last: 1.051
> 7000 done. seconds since last: 1.242
> 8000 done. seconds since last: 1.365
> 9000 done. seconds since last: 1.635
> 10000 done. seconds since last: 1.760
> 11000 done. seconds since last: 1.929
> 12000 done. seconds since last: 2.133
> 13000 done. seconds since last: 2.294
> 14000 done. seconds since last: 2.659
> 15000 done. seconds since last: 3.015
> 16000 done. seconds since last: 3.182
> 17000 done. seconds since last: 3.332
> 18000 done. seconds since last: 3.552
> 19000 done. seconds since last: 3.690
> done in 39.197
>
>
>
> _______________________________________________
> Pythonmac-SIG maillist  -  Pythonmac-SIG at python.org
> http://mail.python.org/mailman/listinfo/pythonmac-sig
>



More information about the Pythonmac-SIG mailing list