[Numpy-discussion] sampling based on running sums
John Hunter
jdh2358 at gmail.com
Fri Jun 27 16:06:24 EDT 2008
I would like to find the sample points where the running sum of some
vector exceeds some threshold. At those points I want to collect all
the data in the vector since the last time the criterion was met
and compute some stats on it. For example, in Python:
tot = 0.
xs = []
ys = []
samples1 = []
for thisx, thisy in zip(x, y):
    tot += thisx
    xs.append(thisx)
    ys.append(thisy)
    if tot >= threshold:
        samples1.append(func(xs, ys))
        tot = 0.
        xs = []
        ys = []
The following comes close in NumPy:
sx = np.cumsum(x)
n = (sx/threshold).astype(int)
ind = np.nonzero(np.diff(n)>0)[0]+1
lasti = 0
samples2 = []
for i in ind:
    xs = x[lasti:i+1]
    ys = y[lasti:i+1]
    samples2.append(func(xs, ys))
    lasti = i + 1  # start after the sample point so x[i] is not reused
But the sample points in ind do not guarantee that the data between
consecutive sample points sums to at least threshold, due to
truncation error.
What is a good numpy way to do this?
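
One possible direction, offered as a sketch rather than a definitive
answer: the Python loop resets the running total to zero at every sample
point, so the next sample point is the first index where the cumulative
sum reaches the cumulative sum at the previous sample point plus
threshold. np.searchsorted can find that index directly, which leaves a
loop over sample points instead of over every element. This assumes x is
non-negative (so the cumulative sum is non-decreasing) and threshold is
positive; the name threshold_segments is made up for illustration.

```python
import numpy as np

def threshold_segments(x, threshold):
    """Return (start, stop) pairs; x[start:stop] is one segment.

    Assumes x >= 0 everywhere and threshold > 0, so that the
    cumulative sum is sorted and np.searchsorted is applicable.
    """
    sx = np.cumsum(x)
    bounds = []
    start = 0
    base = 0  # cumulative sum just before the current segment
    while start < len(sx):
        # first index where the sum since `start` reaches threshold
        i = int(np.searchsorted(sx, base + threshold, side="left"))
        if i >= len(sx):
            break  # trailing data never reaches the threshold
        bounds.append((start, i + 1))
        base = sx[i]   # mimic resetting the running total to zero
        start = i + 1
    return bounds
```

With these bounds the samples could be collected as
[func(x[a:b], y[a:b]) for a, b in threshold_segments(x, threshold)].
For floating-point x the comparison against the cumulative sum may round
differently from the reset-to-zero loop in borderline cases.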
Thanks,
JDH