[SciPy-User] How to average different pieces of an array?

josef.pktd at gmail.com
Mon Aug 10 10:59:08 EDT 2009


On Sat, Aug 8, 2009 at 5:48 AM, Emmanuelle
Gouillart <emmanuelle.gouillart at normalesup.org> wrote:
> [..]
>
>> this might blow up because of the size of the intermediate arrays if
>> the number of bins is large (len of array by number of bins ?)
>
>> I think Gilles' answer might be the fastest if the bins are given by
>> the indices.
>
>> If the bins are given as labels
>> e.g. [0,0,1,1,2,2,2,2,2,2]
>> then np.bincount or scipy.ndimage can be used to calculate the means,
>> which are much faster for a large number of bins and large arrays.
>
> Sure, the only advantage of my solution is that it satisfies the
> constraint "without any for loop" :D ... which is often a very silly
> constraint, but sometimes it's just fun looking for a solution without
> loops!
>
> Emmanuelle

Here's a version without a Python loop, in which the intermediate
arrays have the same shape as x. In some rough timings, this version
is 2 to 50 times faster than Boris's Python loop (using xrange).

Josef


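import numpy as np

x = np.array([1,3,2,6,7,4,5,4,9,4])
y = np.array([0,2,4,10])   # segment boundaries, including 0 and len(x)

# construct label index: mark the start of each new segment, then cumsum
ind2 = np.zeros(x.shape, int)
ind2[y[1:-1]] = 1   # assumes boundary indices are included in y
ind = ind2.cumsum()   # -> [0 0 1 1 2 2 2 2 2 2]

# per-segment sum divided by per-segment count, expanded back to x's shape
means = np.bincount(ind, weights=x) / np.bincount(ind)
meanarr = means[ind]
print(meanarr)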
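
For anyone who wants to check the result, here is a minimal sketch that
wraps the same bincount idea in a function and compares it with a plain
Python loop over the segments. The function names
(labels_from_boundaries, mean_by_label, piecewise_mean_loop) are made up
for this illustration, and it assumes y is strictly increasing and
includes both 0 and len(x).

```python
import numpy as np

def labels_from_boundaries(y, n):
    """Turn boundaries like [0, 2, 4, 10] into labels [0,0,1,1,2,...]."""
    marks = np.zeros(n, dtype=int)
    marks[np.asarray(y)[1:-1]] = 1      # 1 at the start of each new segment
    return marks.cumsum()

def mean_by_label(x, labels):
    """Replace each element of x by the mean of all elements sharing its label."""
    sums = np.bincount(labels, weights=x)   # per-label sums (float)
    counts = np.bincount(labels)            # per-label element counts
    return (sums / counts)[labels]

def piecewise_mean_loop(x, y):
    """Reference version with an explicit loop over the segments."""
    out = np.empty(len(x), dtype=float)
    for start, stop in zip(y[:-1], y[1:]):
        out[start:stop] = np.mean(x[start:stop])
    return out

x = np.array([1, 3, 2, 6, 7, 4, 5, 4, 9, 4])
y = np.array([0, 2, 4, 10])

labels = labels_from_boundaries(y, len(x))
meanarr = mean_by_label(x, labels)
loop_result = piecewise_mean_loop(x, y)
print(np.allclose(meanarr, loop_result))
```

With the example data the three segment means are 2, 4 and 5.5. If the
bins are already given as labels (as in the [0,0,1,1,2,2,2,2,2,2]
example quoted above), mean_by_label can be called on them directly.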


