Execution speed question
Iain King
iainking at gmail.com
Fri Jul 25 16:07:34 CEST 2008
On Jul 25, 1:46 pm, Iain King <iaink... at gmail.com> wrote:
> On Jul 25, 10:57 am, Suresh Pillai <stochash... at yahoo.ca> wrote:
>
>
>
> > I am performing simulations on networks (graphs). I have a question on
> > speed of execution (assuming very ample memory for now). I simplify the
> > details of my simulation below, as the question I ask applies more
> > generally than my specific case. I would greatly appreciate general
> > feedback in terms of computing and of course considerations specific to
> > implementation in Python.
>
> > The nodes in my network may be ON or OFF. The network starts off with
> > all nodes in the OFF state. I loop through the nodes. For each node
> > that is OFF, I consider some probability of it turning ON based on the
> > states of its neighbours. I MUST GO THROUGH ALL NODES BEFORE DECIDING
> > WHICH ONES TO TURN ON.
>
> > So my question is whether it is faster to
>
> > 1. loop through a list of ALL nodes and check for OFF nodes using ifs
>
> > or to
>
> > 2. loop through a container of OFF nodes and remove from this when they
> > turn ON
>
> or 3. build a new list every iteration instead of deleting from the old
> one:
>
> while processing:
>     new_off_list = []
>     for x in off_list:
>         if goes_on(x):
>             on_list.append(x)
>         else:
>             new_off_list.append(x)
>     off_list = new_off_list
>     generation += 1
>
> Iain
I was curious as to how much the different methods varied in time, so
I checked it out. There are three procedures: test_every, which
matches your (1); destructive, which matches your (2); and
constructive, which is (3) as I've outlined above.
On varying the size of the dataset I get this (probability a node goes
on = 50%):
Length of initial list: 100000
Test every: 1.16085492357
Destructive: 2.592310272
Constructive: 0.850312458886
Length of initial list: 200000
Test every: 2.48013843287
Destructive: 9.20894689718
Constructive: 1.73562198439
Length of initial list: 400000
Test every: 5.00652267447
Destructive: 44.9696004134
Constructive: 3.51687329373
Length of initial list: 800000
Test every: 9.67657648655
Destructive: 220.57583941
Constructive: 7.06614485537
and changing the probability that a nodes goes on (dataset size =
200000):
Probability goes on: 1/2
Test every: 2.24765364513
Destructive: 9.28801971614
Constructive: 1.62770773816
Probability goes on: 1/4
Test every: 4.77387350904
Destructive: 13.4432467571
Constructive: 3.45467140006
Probability goes on: 1/8
Test every: 11.0514899721
Destructive: 18.4026878278
Constructive: 6.86778036177
Probability goes on: 1/16
Test every: 22.5896021593
Destructive: 25.7784044083
Constructive: 13.8631404605
Probability goes on: 1/32
Test every: 49.7667941179
Destructive: 39.3652502735
Constructive: 27.2527219598
Probability goes on: 1/64
Test every: 91.0523955153
Destructive: 65.7747103963
Constructive: 54.4087322936
Code:
import random
from timeit import Timer

SIZE = 100000
MAX = 2

def goes_on(x):
    return random.randint(1, MAX) == 1

def test_every():
    print "Test every:",
    nodes = range(SIZE)
    is_on = [False for x in xrange(SIZE)]
    count = SIZE
    while count:
        for i, x in enumerate(nodes):
            if not is_on[i] and goes_on(x):
                is_on[i] = True
                count -= 1

def destructive():
    print "Destructive:",
    off_list = range(SIZE)
    on_list = []
    count = SIZE
    while count:
        for i in xrange(len(off_list) - 1, -1, -1):
            x = off_list[i]
            if goes_on(x):
                on_list.append(x)
                del off_list[i]
                count -= 1

def constructive():
    print "Constructive:",
    off_list = range(SIZE)
    on_list = []
    count = SIZE
    while count:
        new_off_list = []
        for x in off_list:
            if goes_on(x):
                on_list.append(x)
                count -= 1
            else:
                new_off_list.append(x)
        off_list = new_off_list

#SIZE = 200000
while True:
    print "Length of initial list:", SIZE
    #print "Probability goes on: 1/%d" % MAX
    print Timer("test_every()", "from __main__ import test_every").timeit(1)
    print Timer("destructive()", "from __main__ import destructive").timeit(1)
    print Timer("constructive()", "from __main__ import constructive").timeit(1)
    print
    SIZE *= 2
    #MAX *= 2
Conclusions:
On size, (2) really doesn't like bigger datasets: its time roughly
quadruples each time the list doubles (i.e. it grows quadratically),
while (1) and (3) happily increase close to linearly. (3) is faster.
On probability it's (1) who's the loser, while (2) and (3) are happy.
(3) is once again faster.
I think (2)'s poor performance comes down to how Python lists handle
deletions: removing an element from the middle of a list is O(n),
since everything after it has to be shifted down one slot. The effect
might be mitigated in other languages, or by using other data
structures in Python (like a set or dict, where removal is O(1), or a
user-made list class). If you were short on memory then (2) would
have an advantage, but as it is, (3) is the clear winner.
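For instance, here's a minimal sketch of approach (2) on top of a set
instead of a list (set_based is my name, it's not one of the benchmark
procedures above, and it takes the probability directly rather than
going through goes_on):

```python
import random

def set_based(size, p):
    # Variant of approach (2) using a set of OFF nodes: set removal is
    # O(1), so there is no quadratic shifting cost as with list deletion.
    off = set(range(size))
    on = []
    while off:
        # Decide the fate of every OFF node first...
        turned = [x for x in off if random.random() < p]
        # ...then remove the ones that went ON.
        for x in turned:
            off.discard(x)
        on.extend(turned)
    return on
```

I haven't timed it against the three above, but it sidesteps the list
deletion cost entirely.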
I'm a fan of list comprehensions, and it feels like they could be
nice here, but since we're building two lists at once I don't see how
to... anyone see how to use them (or 'map', if you want to be old
school)?
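One possible sketch (untested against the benchmark above, and with
the probability passed in directly instead of calling goes_on): since
the test is random, each node's fate has to be decided exactly once,
so precompute the decisions and then build both lists from them with
comprehensions:

```python
import random

def constructive_lc(off_list, on_list, p):
    # Decide every node's fate exactly once (the test is random, so we
    # can't evaluate it in two separate comprehensions).
    decisions = [(x, random.random() < p) for x in off_list]
    # Nodes that went ON are appended to on_list...
    on_list.extend(x for x, went_on in decisions if went_on)
    # ...and the rest become the new off_list.
    return [x for x, went_on in decisions if not went_on]
```

It does build the intermediate decisions list, so it may not beat the
plain loop in (3).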
Iain