[Numpy-discussion] Quick Question about Optimization

Hoyt Koepke hoytak at gmail.com
Mon May 19 15:31:07 EDT 2008


>        for n in range(0, time_milliseconds):
>            self.u = self.expfac_m * self.prev_u + (1 - self.expfac_m) * self.aff_input[n, :]
>            self.v = self.u + self.sigma * np.random.standard_normal(size=(1, self.naff))
>            self.theta = self.expfac_theta * self.prev_theta - (1 - self.expfac_theta)
>
>            idx_spk = np.where(self.v >= self.theta)
>
>            self.S[n, idx_spk] = 1
>            self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>            self.prev_u = self.u
>            self.prev_theta = self.theta

Copying elements into arrays that already exist will generally be
faster than creating a new array with separate data on every
iteration.  In this case, though, you can avoid the copies into
prev_u and prev_theta entirely by using a flip-flopping row index to
keep track of the previous values.  Dropping the self prefixes, the
above translates into something like this (untested):


import numpy as np

curidx = 0  # the other row, 1 - curidx, holds the previous values

u = np.empty((2, naff))
v = np.empty(naff)
theta = np.empty((2, naff))
# (fill u[1 - curidx] and theta[1 - curidx] with your initial
#  prev_u / prev_theta values)

# Draw all the random numbers up front in one call -- much cheaper
# than one call per iteration.
stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

for n in xrange(time_milliseconds):
    u[curidx, :] = expfac_m * u[1 - curidx, :] + (1 - expfac_m) * aff_input[n, :]
    v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]
    theta[curidx, :] = expfac_theta * theta[1 - curidx, :] - (1 - expfac_theta)

    idx_spk = np.where(v >= theta[curidx, :])[0]
    S[n, idx_spk] = 1
    theta[curidx, idx_spk] += b

    # Flip so the row we just wrote becomes the previous one
    curidx = 1 - curidx
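
The key trick above is that slice assignment (e.g. u[curidx, :] = ...)
writes into memory that is already allocated instead of binding the
name to a brand-new array.  The same idea works for ufuncs through
their out argument; here's a minimal sketch of the principle, using
throwaway placeholder arrays rather than anything from your model:

import numpy as np

a = np.random.rand(14000)
b = np.random.rand(14000)
out = np.empty_like(a)

# out = a * b               # would allocate a new array each time
np.multiply(a, b, out=out)  # writes into the preallocated buffer
out += 1.0                  # augmented assignment is also in-place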

The flip-flopped version should give you a substantial speedup.
Also, I have to say that this is begging for weave.blitz, which
compiles such statements into templated C++ code to avoid
temporaries.  It doesn't build on all systems, but if it works on
yours, here's what your code might look like:

import numpy as np
import scipy.weave as wv

curidx = 0  # the other row, 1 - curidx, holds the previous values

u = np.empty((2, naff))
v = np.empty(naff)
theta = np.empty((2, naff))

stdnormrvs = np.random.standard_normal(size=(time_milliseconds, naff))

for n in xrange(time_milliseconds):
    wv.blitz("u[curidx, :] = expfac_m * u[1-curidx, :] + (1-expfac_m) * aff_input[n, :]")
    wv.blitz("v[:] = u[curidx, :] + sigma * stdnormrvs[n, :]")
    wv.blitz("theta[curidx, :] = expfac_theta * theta[1-curidx, :] - (1-expfac_theta)")

    idx_spk = np.where(v >= theta[curidx, :])[0]
    S[n, idx_spk] = 1
    theta[curidx, idx_spk] += b

    # Flip so the row we just wrote becomes the previous one
    curidx = 1 - curidx
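
If weave turns out not to build on your system, numexpr is another
way to evaluate a whole expression in one pass without intermediate
temporaries.  A sketch of one update step (untested; numexpr works on
plain array variables rather than indexed expressions, so I pull the
rows out first -- u_prev and aff_row are just local names I made up):

import numexpr as ne

u_prev = u[1 - curidx, :]
aff_row = aff_input[n, :]
# single pass over the data, no temporaries for the subexpressions:
u[curidx, :] = ne.evaluate("expfac_m * u_prev + (1 - expfac_m) * aff_row")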



-- 
+++++++++++++++++++++++++++++++++++
Hoyt Koepke
UBC Department of Computer Science
http://www.cs.ubc.ca/~hoytak/
hoytak at gmail.com
+++++++++++++++++++++++++++++++++++


On Mon, May 19, 2008 at 11:08 AM, James Snyder <jbsnyder at gmail.com> wrote:
> Hi -
>
> First off, I know that optimization is evil, and I should make sure
> that everything works as expected prior to bothering with squeezing
> out extra performance, but the situation is that this particular block
> of code works, but it is about half as fast with numpy as in matlab,
> and I'm wondering if there's a better approach than what I'm doing.
>
> I have a chunk of code, below, that generally iterates over 2000
> iterations, and the vectors that are being worked on at a given step
> generally have ~14000 elements in them.
>
> In matlab, doing pretty much exactly the same thing takes about 6-7
> seconds; with numpy it's always around 13-14 on the same machine.
> I've seen this on both Linux & Mac OS X.
>
> self.aff_input has the bulk of the data in it (a 2000x14000 array),
> and the various steps compute the state of some afferent neurons
> (if there's any interest, here's a paper that includes the model:
> Brandman, R. and Nelson, M.E. (2002). A simple model of long-term
> spike train regularization. Neural Computation 14, 1575-1597.)
>
> I've imported numpy as np.
>
> Is there anything in practice here that could be done to speed this
> up?  I'm looking more for general numpy usage tips that I can use
> while writing further code, not things that would be obscure or
> difficult to maintain in the future.
>
> Also, the result of this is a binary array; I'm wondering if there's
> anything more compact for storing it than using 8 bits to represent
> each single bit.  I've poked around, but I haven't come up with any
> clean and unhackish ideas :-)
>
> Thanks!
>
> I can provide the rest of the code if needed, but it's basically just
> filling some vectors with random and empty data and initializing a few
> things.
>
>        for n in range(0, time_milliseconds):
>            self.u = self.expfac_m * self.prev_u + (1 - self.expfac_m) * self.aff_input[n, :]
>            self.v = self.u + self.sigma * np.random.standard_normal(size=(1, self.naff))
>            self.theta = self.expfac_theta * self.prev_theta - (1 - self.expfac_theta)
>
>            idx_spk = np.where(self.v >= self.theta)
>
>            self.S[n, idx_spk] = 1
>            self.theta[idx_spk] = self.theta[idx_spk] + self.b
>
>            self.prev_u = self.u
>            self.prev_theta = self.theta
>
> --
> James Snyder
> Biomedical Engineering
> Northwestern University
> jbsnyder at gmail.com





