[Numpy-discussion] Funky vectorisation question

Wed Apr 29 17:06:39 EDT 2009

Hi all,

Here's the problem I want to write vectorised code for. I start with an 
array of indices, say I=array([0,1,0,2,0,1,4]), and I want to come up 
with an array C that counts how many times each index has been seen so 
far if you were counting through the array from the beginning to the 
end, e.g. for that I, C=array([0,0,1,0,2,1,0]). This is easy enough to 
do with a C loop, but trickier to do just with numpy. Any ideas?

I came up with a solution, but it's not particularly elegant. I bet it 
can be done better. But here's my solution FWIW:

First, sort I (we can always undo this at the end so no loss of 
generality there). Now we have I=[0,0,0,1,1,2,4]. Next we construct the 
array J=I[1:]!=I[:-1], i.e. marking the points where the index changes, 
and the array K=-I. Write A, B for the cumsum of J and K (but with an 
extra 0 on the beginning of each), so:

A=[0,0,0,1,1,2,3]
B=[0,1,2,2,3,3,3]

Now A is a sort of index of the indices, and B is sort of close to the C 
we're looking for - it counts upwards at the right times but doesn't 
reset to 0 when the index changes. So we fix this by subtracting the 
value at the point where it should reset. The array of these values that 
need to be subtracted is B[J], although this misses the first one, so we 
write BJ for the array B[J] with 0 prepended. Now A indexes the indices, 
so BJ[A] is an array of the same length as the initial array with value 
the initial value for each segment:

BJ=[0,2,3,3]
BJ[A]=[0,0,0,2,2,3,3]

Finally, C is then B-BJ[A]:

C=[0,1,2,0,1,0,0]

Complete code:

J = I[1:]!=I[:-1]
K = I[1:]==I[:-1]
A = hstack((0, cumsum(array(J, dtype=int))))
B = hstack((0, cumsum(array(K, dtype=int))))
BJ = hstack((0, B[J]))
C = B-BJ[A]

OK, so that's my version - can you improve on it? I don't mind so much 
if the code is unreadable as long as it's fast! :-)

Dan