Hi all,
Here's the problem I want to write vectorised code for. I start with an
array of indices, say I=array([0,1,0,2,0,1,4]), and I want to come up
with an array C that counts how many times each index has been seen so
far if you were counting through the array from the beginning to the
end, e.g. for that I, C=array([0,0,1,0,2,1,0]). This is easy enough to
do with a C loop, but trickier to do just with numpy. Any ideas?
I came up with a solution, but it's not particularly elegant. I bet it
can be done better. But here's my solution FWIW:
First, sort I (we can always undo this at the end so no loss of
generality there). Now we have I=[0,0,0,1,1,2,4]. Next we construct the
array J=I[1:]!=I[:-1], i.e. marking the points where the index changes,
and the array K=-I. Write A, B for the cumsum of J and K (but with an
extra 0 on the beginning of each), so:
A=[0,0,0,1,1,2,3]
B=[0,1,2,2,3,3,3]
Now A is a sort of index of the indices, and B is sort of close to the C
we're looking for - it counts upwards at the right times but doesn't
reset to 0 when the index changes. So we fix this by subtracting the
value at the point where it should reset. The array of these values that
need to be subtracted is B[J], although this misses the first one, so we
write BJ for the array B[J] with 0 prepended. Now A indexes the indices,
so BJ[A] is an array of the same length as the initial array with value
the initial value for each segment:
BJ=[0,2,3,3]
BJ[A]=[0,0,0,2,2,3,3]
Finally, C is then B-BJ[A]:
C=[0,1,2,0,1,0,0]
Complete code:
J = I[1:]!=I[:-1]
K = I[1:]==I[:-1]
A = hstack((0, cumsum(array(J, dtype=int))))
B = hstack((0, cumsum(array(K, dtype=int))))
BJ = hstack((0, B[J]))
C = B-BJ[A]
OK, so that's my version - can you improve on it? I don't mind so much
if the code is unreadable as long as it's fast! :-)
Dan