# How can I make this piece of code even faster?

Roy Smith roy at panix.com
Sat Jul 20 23:25:52 CEST 2013

```
In article <6bf4d298-b425-4357-9c1a-192e6e6cd9f0 at googlegroups.com>,
pablobarhamalzas at gmail.com wrote:

> Ok, I'm working on a predator/prey simulation, which evolve using genetic
> algorithms. At the moment, they use a quite simple feed-forward neural
> network, which can change size over time. Each brain "tick" is performed by
> the following function (inside the Brain class):
>
>     def tick(self):
>         input_num = self.input_num
>         hidden_num = self.hidden_num
>         output_num = self.output_num
>
>         hidden = [0]*hidden_num
>         output = [0]*output_num
>
>         inputs = self.input
>         h_weight = self.h_weight
>         o_weight = self.o_weight
>
>         e = math.e
>
>         count = -1
>         for x in range(hidden_num):
>             temp = 0
>             for y in range(input_num):
>                 count += 1
>                 temp += inputs[y] * h_weight[count]
>             hidden[x] = 1/(1+e**(-temp))
>
>         count = -1
>         for x in range(output_num):
>             temp = 0
>             for y in range(hidden_num):
>                 count += 1
>                 temp += hidden[y] * o_weight[count]
>             output[x] = 1/(1+e**(-temp))
>
>         self.output = output
>
> The function is actually quite fast (~0.040 seconds per 200 calls, using 10
> input, 20 hidden and 3 output neurons), and used to be much slower until I
> fiddled about with it a bit to make it faster. However, it is still somewhat
> slow for what I need.
>
> My question to you is whether you can see any obvious (or not so obvious) way of
> making this faster. I've heard about numpy and have been reading about it,
> but I really can't see how it could be implemented here.

First thing, I would add some instrumentation to see where the most time
is being spent.  My guess is in the first set of nested loops, where the
inner loop gets executed hidden_num * input_num (i.e. 20 * 10 = 200)
times.  But timing data is better than my guess.

Assuming I'm right, though, you do compute range(input_num) 20 times.
You don't need to do that.  You might try xrange(), or you might just
factor out creating the list outside the outer loop.  But, none of that
seems like it should make much difference.

What possible values can temp take?  If it can only take certain
discrete values and you can enumerate them beforehand, you might want to
build a dict mapping temp -> 1/(1+e**(-temp)) and then all that math
becomes just a table lookup.

```
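
The advice to measure before optimizing can be sketched with the standard `timeit` module. This is only an illustration, not the poster's actual code: `layer` and `layer_hoisted` are hypothetical stand-alone versions of the nested loops in `tick()`, the weights are random placeholders, and the sizes (10 inputs, 20 hidden, 3 outputs) are the ones quoted in the question. The second function builds the inner `range` once, outside the outer loop, so the two variants can be compared directly.

```python
import math
import random
import timeit

def layer(inputs, weights, n_out):
    """One layer, written like the loops in tick() (hypothetical helper)."""
    out = [0.0] * n_out
    count = -1
    for x in range(n_out):
        temp = 0.0
        for y in range(len(inputs)):  # a new range is built on every outer pass
            count += 1
            temp += inputs[y] * weights[count]
        out[x] = 1 / (1 + math.e ** (-temp))
    return out

def layer_hoisted(inputs, weights, n_out):
    """Same computation, with the inner range created once up front."""
    out = [0.0] * n_out
    in_range = range(len(inputs))
    count = -1
    for x in range(n_out):
        temp = 0.0
        for y in in_range:
            count += 1
            temp += inputs[y] * weights[count]
        out[x] = 1 / (1 + math.e ** (-temp))
    return out

# Placeholder data; the real weights come from the genetic algorithm.
random.seed(0)
inputs = [random.uniform(-1, 1) for _ in range(10)]
h_weight = [random.uniform(-1, 1) for _ in range(10 * 20)]
o_weight = [random.uniform(-1, 1) for _ in range(20 * 3)]
hidden = layer(inputs, h_weight, 20)

# Time each stage separately, 200 calls each, as in the original measurement.
t_hidden = timeit.timeit(lambda: layer(inputs, h_weight, 20), number=200)
t_output = timeit.timeit(lambda: layer(hidden, o_weight, 3), number=200)
t_hoisted = timeit.timeit(lambda: layer_hoisted(inputs, h_weight, 20), number=200)

print("hidden layer:           %.4f s per 200 calls" % t_hidden)
print("output layer:           %.4f s per 200 calls" % t_output)
print("hidden layer (hoisted): %.4f s per 200 calls" % t_hoisted)
```

The absolute numbers depend on the machine and the Python version, so only the relative sizes are worth reading. If the hoisted variant is not measurably faster, that supports the guess that the `range` call is not where the time goes.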

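The table-lookup idea might look something like the sketch below. It rests on an assumption the thread does not confirm: that `temp` can only take a known, finite set of values. Here that set is taken, purely for illustration, to be the multiples of 0.5 between -10 and 10; `POSSIBLE_TEMPS`, `SIGMOID_TABLE` and `fast_sigmoid` are made-up names.

```python
import math

def sigmoid(t):
    """The activation used in tick(): 1/(1+e**(-t))."""
    return 1 / (1 + math.e ** (-t))

# Assumption for illustration only: temp is always a multiple of 0.5
# in [-10, 10]. Multiples of 0.5 are exact in binary floating point,
# so they are safe to use as dict keys.
POSSIBLE_TEMPS = [i * 0.5 for i in range(-20, 21)]

# Pay for the exponentials once, up front.
SIGMOID_TABLE = {t: sigmoid(t) for t in POSSIBLE_TEMPS}

def fast_sigmoid(t):
    """Look the value up instead of recomputing it (KeyError if t is not in the table)."""
    return SIGMOID_TABLE[t]

print(fast_sigmoid(0.0))
print(fast_sigmoid(1.5) == sigmoid(1.5))
```

If the weights are arbitrary floats, `temp` is effectively continuous and an exact-match dict like this will not apply as written; the inputs would have to be rounded to a grid first, which changes the network's output slightly.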