[Numpy-discussion] vectorizing
Brent Pedersen
bpederse at gmail.com
Fri Jun 5 16:22:14 EDT 2009
On Fri, Jun 5, 2009 at 1:05 PM, Keith Goodman<kwgoodman at gmail.com> wrote:
> On Fri, Jun 5, 2009 at 1:01 PM, Keith Goodman <kwgoodman at gmail.com> wrote:
>> On Fri, Jun 5, 2009 at 12:53 PM, <josef.pktd at gmail.com> wrote:
>>> On Fri, Jun 5, 2009 at 2:07 PM, Brian Blais <bblais at bryant.edu> wrote:
>>>> Hello,
>>>> I have a vectorizing problem that I don't see an obvious way to solve. What
>>>> I have is a vector like:
>>>> obs=array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>> and a matrix
>>>> T=zeros((6,6))
>>>> and what I want in T is a count of all of the transitions in obs, e.g.
>>>> T[1,2]=3 because the sequence 1-2 happens 3 times, T[3,4]=1 because the
>>>> sequence 3-4 only happens once, etc... I can do it unvectorized like:
>>>> for o1,o2 in zip(obs[:-1],obs[1:]):
>>>> T[o1,o2]+=1
>>>>
>>>> which gives the correct answer from above, which is:
>>>> array([[ 0., 0., 0., 0., 0., 0.],
>>>> [ 0., 0., 3., 0., 0., 1.],
>>>> [ 0., 3., 0., 1., 0., 0.],
>>>> [ 0., 0., 2., 0., 1., 0.],
>>>> [ 0., 0., 0., 2., 0., 0.],
>>>> [ 0., 0., 0., 0., 1., 0.]])
>>>>
>>>>
>>>> but I thought there would be a better way. I tried:
>>>> o1=obs[:-1]
>>>> o2=obs[1:]
>>>> T[o1,o2]+=1
>>>> but this doesn't give a count, it just yields 1's at the transition points,
>>>> like:
>>>> array([[ 0., 0., 0., 0., 0., 0.],
>>>> [ 0., 0., 1., 0., 0., 1.],
>>>> [ 0., 1., 0., 1., 0., 0.],
>>>> [ 0., 0., 1., 0., 1., 0.],
>>>> [ 0., 0., 0., 1., 0., 0.],
>>>> [ 0., 0., 0., 0., 1., 0.]])
>>>>
>>>> Is there a clever way to do this? I could write a quick Cython solution,
>>>> but I wanted to keep this as an all-numpy implementation if I can.
>>>>
>>>
>>> histogram2d or its imitation, there was a discussion on histogram2d a
>>> short time ago
>>>
>>>>>> obs=np.array([1,2,3,4,3,2,1,2,1,2,1,5,4,3,2])
>>>>>> obs2 = obs - 1
>>>>>> trans = np.hstack((0,np.bincount(obs2[:-1]*6+6+obs2[1:]),0)).reshape(6,6)
>>>>>> re = np.array([[ 0., 0., 0., 0., 0., 0.],
>>> ... [ 0., 0., 3., 0., 0., 1.],
>>> ... [ 0., 3., 0., 1., 0., 0.],
>>> ... [ 0., 0., 2., 0., 1., 0.],
>>> ... [ 0., 0., 0., 2., 0., 0.],
>>> ... [ 0., 0., 0., 0., 1., 0.]])
>>>>>> np.all(re == trans)
>>> True
>>>
>>>>>> trans
>>> array([[0, 0, 0, 0, 0, 0],
>>> [0, 0, 3, 0, 0, 1],
>>> [0, 3, 0, 1, 0, 0],
>>> [0, 0, 2, 0, 1, 0],
>>> [0, 0, 0, 2, 0, 0],
>>> [0, 0, 0, 0, 1, 0]])
>>>
>>>
>>> or
>>>
>>>>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>>>>> re
>>> array([[ 0., 0., 0., 0., 0., 0.],
>>> [ 0., 0., 3., 0., 0., 1.],
>>> [ 0., 3., 0., 1., 0., 0.],
>>> [ 0., 0., 2., 0., 1., 0.],
>>> [ 0., 0., 0., 2., 0., 0.],
>>> [ 0., 0., 0., 0., 1., 0.]])
>>>>>> h
>>> array([[ 0., 0., 0., 0., 0., 0.],
>>> [ 0., 0., 3., 0., 0., 1.],
>>> [ 0., 3., 0., 1., 0., 0.],
>>> [ 0., 0., 2., 0., 1., 0.],
>>> [ 0., 0., 0., 2., 0., 0.],
>>> [ 0., 0., 0., 0., 1., 0.]])
>>>
>>>>>> np.all(re == h)
>>> True
>>
>> There's no way my list method can beat that. But by adding
>>
>> import psyco
>> psyco.full()
>>
>> I get a total speed up of a factor of 15 when obs is length 10000.
>
> Actually, it is faster:
>
> histogram:
>
>>> h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
>>> timeit h, e1, e2 = np.histogram2d(obs[:-1], obs[1:], bins=6, range=[[0,5],[0,5]])
> 100 loops, best of 3: 4.14 ms per loop
>
> lists:
>
>>> timeit test(obs3, T3)
> 1000 loops, best of 3: 1.32 ms per loop
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
here's a go:
import numpy as np
import random
from itertools import groupby
def test1(obs, T):
for o1,o2 in zip(obs[:-1],obs[1:]):
T[o1][o2] += 1
return T
def test2(obs, T):
s = zip(obs[:-1], obs[1:])
for idx, g in groupby(sorted(s)):
T[idx] = len(list(g))
return T
obs = [random.randint(0, 5) for z in range(10000)]
print test2(obs, np.zeros((6, 6)))
print test1(obs, np.zeros((6, 6)))
##############
In [10]: timeit test1(obs, np.zeros((6, 6)))
100 loops, best of 3: 18.8 ms per loop
In [11]: timeit test2(obs, np.zeros((6, 6)))
100 loops, best of 3: 6.91 ms per loop
More information about the NumPy-Discussion
mailing list