[Numpy-discussion] vectorizing
Keith Goodman
kwgoodman at gmail.com
Fri Jun 5 15:52:25 EDT 2009
On Fri, Jun 5, 2009 at 11:07 AM, Brian Blais <bblais at bryant.edu> wrote:
> Hello,
> I have a vectorizing problem that I don't see an obvious way to solve. What
> I have is a vector like:
> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
> and a matrix
> T=zeros((6,6))
> and what I want in T is a count of all of the transitions in obs, e.g.
> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the
> sequence 3-4 only happens once, etc... I can do it unvectorized like:
> for o1,o2 in zip(obs[:-1],obs[1:]):
>     T[o1,o2]+=1
>
> which gives the correct answer from above, which is:
> array([[ 0., 0., 0., 0., 0., 0.],
> [ 0., 0., 3., 0., 0., 1.],
> [ 0., 3., 0., 1., 0., 0.],
> [ 0., 0., 2., 0., 1., 0.],
> [ 0., 0., 0., 2., 0., 0.],
> [ 0., 0., 0., 0., 1., 0.]])
>
>
> but I thought there would be a better way. I tried:
> o1=obs[:-1]
> o2=obs[1:]
> T[o1,o2]+=1
> but this doesn't give a count, it just yields 1's at the transition points,
> like:
> array([[ 0., 0., 0., 0., 0., 0.],
> [ 0., 0., 1., 0., 0., 1.],
> [ 0., 1., 0., 1., 0., 0.],
> [ 0., 0., 1., 0., 1., 0.],
> [ 0., 0., 0., 1., 0., 0.],
> [ 0., 0., 0., 0., 1., 0.]])
>
> Is there a clever way to do this? I could write a quick Cython solution,
> but I wanted to keep this as an all-numpy implementation if I can.
The loop is a little faster (8.5% for me when obs is length 10000) if you
make T an integer array:
T = np.zeros((6,6), dtype=int)
But it is more than 5 times faster if you use lists for T and obs. You're
just storing information here, so there is no reason to pay for the
overhead of arrays.
import random
import numpy as np

T = [[0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0], [0,0,0,0,0,0],
     [0,0,0,0,0,0], [0,0,0,0,0,0]]
obs = [random.randint(0, 5) for z in range(10000)]

def test(obs, T):
    for o1,o2 in zip(obs[:-1],obs[1:]):
        T[o1][o2] += 1
    return T
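If an all-numpy version is still wanted, here is a minimal sketch of two possible approaches. It assumes a NumPy recent enough to provide np.add.at (1.8 or later) and the minlength argument of np.bincount. The function names transition_counts and transition_counts_bincount are made up for illustration, and no timings are claimed for either variant. The first does an unbuffered in-place add, so repeated (o1, o2) pairs are each counted, which is what T[o1,o2]+=1 fails to do. The second encodes each pair as a single flat index and counts occurrences.

```python
import numpy as np

def transition_counts(obs, n_states):
    """Count transitions obs[i] -> obs[i+1] with an unbuffered add."""
    obs = np.asarray(obs)
    o1, o2 = obs[:-1], obs[1:]
    T = np.zeros((n_states, n_states), dtype=int)
    # Unlike T[o1, o2] += 1, np.add.at applies the increment once per
    # occurrence of each (o1, o2) pair, so duplicates accumulate.
    np.add.at(T, (o1, o2), 1)
    return T

def transition_counts_bincount(obs, n_states):
    """Same counts, via a flat index o1 * n_states + o2 and np.bincount."""
    obs = np.asarray(obs)
    flat = obs[:-1] * n_states + obs[1:]
    # Assumes every value in obs lies in range(n_states).
    counts = np.bincount(flat, minlength=n_states * n_states)
    return counts.reshape(n_states, n_states)

# The example vector from the original question.
obs = np.array([1, 2, 3, 4, 3, 2, 1, 2, 1, 2, 1, 5, 4, 3, 2])
T = transition_counts(obs, 6)
T_alt = transition_counts_bincount(obs, 6)
```

Both T and T_alt should match the matrix produced by the loop in the original question, e.g. T[1,2] is 3 and T[3,4] is 1.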