[Pythonmac-SIG] Sets and Speed
nickg
nickgsuperstar at gmail.com
Mon May 30 17:35:34 CEST 2005
Hello,
x = x.union(set([i]) is the problem. You are creating two new data
structures (set[i], and x.union) at every iteration, and I suspect
union is a slow operation.
Just change this to "x.add(i)" and the script will run 20x faster
than your list version!
$ ./settest2.py
done in 0.026
x = set()
for i in range(20000):
x.add(i)
enjoy!
--nickg
On May 30, 2005, at 10:44 AM, Yair Benita wrote:
> Hi All,
> My research involves genomic research and the use of sets (recently
> introduced in version 2.4) makes my life easier in a lot of ways.
> However, I
> noticed that working with large set slows the script to unbearable
> speed.
>
> Below I compare two simple scripts, one makes use of a list and the
> other
> makes use of a set. The difference of 20 seconds may not be much,
> but let me
> tell you that this difference grows exponentially. When my sets
> reach more
> that 100,000 elements, a union command is painfully slow. True, a
> union
> command of a set may be much more complicated than using a list but
> the time
> difference is simply too big for me.
>
> Any thoughts/suggestions?
>
> I suppose that the obvious solution is to work with lists most of
> the time
> and use the sets only when I really need them. But then, what's the
> point of
> having those set?
>
>
> Consider the following example just for making my point:
> (sets were run using MacPython 2.4.1 on a PowerMac G4 dual 1.25)
> ############################
> from time import time
>
> x = set([])
> prev_time = time()
> initial_time = time()
> for i in range(20000):
> x = x.union(set([i]))
> if i % 1000 == 0:
> print i, "done. seconds since last: %.3f" % (time()-prev_time)
> prev_time = time()
> print "done in %.3f" %(time() - initial_time)
>
> ############################
> The output of this script is:
> 0 done. seconds since last: 0.002
> 1000 done. seconds since last: 0.103
> 2000 done. seconds since last: 0.354
> 3000 done. seconds since last: 0.784
> 4000 done. seconds since last: 0.861
> 5000 done. seconds since last: 1.501
> 6000 done. seconds since last: 1.707
> 7000 done. seconds since last: 1.751
> 8000 done. seconds since last: 1.894
> 9000 done. seconds since last: 2.907
> 10000 done. seconds since last: 3.155
> 11000 done. seconds since last: 3.278
> 12000 done. seconds since last: 3.374
> 13000 done. seconds since last: 3.559
> 14000 done. seconds since last: 3.824
> 15000 done. seconds since last: 3.905
> 16000 done. seconds since last: 4.040
> 17000 done. seconds since last: 5.853
> 18000 done. seconds since last: 6.928
> 19000 done. seconds since last: 7.038
> done in 64.414
>
>
> Lets run the same script using a list:
> ############################
> from time import time
>
> x = []
> prev_time = time()
> initial_time = time()
> for i in range(20000):
> if i not in x:
> x.append(i)
>
> if i % 1000 == 0:
> print i, "done. seconds since last: %.3f" % (time()-prev_time)
> prev_time = time()
>
> print "done in %.3f" %(time() - initial_time)
> ############################
>
> The output of this script is:
> 0 done. seconds since last: 0.002
> 1000 done. seconds since last: 0.102
> 2000 done. seconds since last: 0.323
> 3000 done. seconds since last: 0.506
> 4000 done. seconds since last: 0.676
> 5000 done. seconds since last: 0.871
> 6000 done. seconds since last: 1.051
> 7000 done. seconds since last: 1.242
> 8000 done. seconds since last: 1.365
> 9000 done. seconds since last: 1.635
> 10000 done. seconds since last: 1.760
> 11000 done. seconds since last: 1.929
> 12000 done. seconds since last: 2.133
> 13000 done. seconds since last: 2.294
> 14000 done. seconds since last: 2.659
> 15000 done. seconds since last: 3.015
> 16000 done. seconds since last: 3.182
> 17000 done. seconds since last: 3.332
> 18000 done. seconds since last: 3.552
> 19000 done. seconds since last: 3.690
> done in 39.197
>
>
>
> _______________________________________________
> Pythonmac-SIG maillist - Pythonmac-SIG at python.org
> http://mail.python.org/mailman/listinfo/pythonmac-sig
>
More information about the Pythonmac-SIG
mailing list