Need a good idea: calculate the mean of many vectors

Hi,

here is something I have been thinking about for some time, and I wonder whether there is a better solution within numpy.

The task is: I have an array (300000+ entries) of length-3 arrays that is initially empty, like this:

    n = 100  # for test, otherwise ~300000
    a1 = reshape(zeros(3*n).astype(float), (n, 3))

(Speaking literally, this is a field of displacements in a finite-element mesh.) Now I have a lot of triangles whose corners are the nodes, each with an index between 0 and n-1, and I would like to add a unique displacement for all three nodes to a1, like this:

    a2 = zeros(n).astype(int)
    for indices, data in [...]:
        # data = array((1., 2., 3.))
        # indices = (1, 5, 60)
        for index in indices:
            a1[index] += data
            a2[index] += 1

After filling a1 and a2 over and over (for a lot of triangles), I can finally calculate the averaged displacement at all points by:

    meand = a1/reshape(a2, (n, 1))

I am doing this with the old Numeric package (and can do it in numpy as well). I wonder whether there is a better way to do it in numpy. In this particular case numpy is 20 times slower than Numeric, and I know that this is because I only change 3 values of the big array in every loop iteration.

The basic problem is: how can I add very few values to a big array with good performance? I thought about this:

    m = zeros(n)
    put(m, indices, 1.)  # only 3 ones in a long list of zeros!
    a1 += outer(m, data)
    a2 += m

which is in fact very slow due to the function outer.

Any help appreciated,
Thomas
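For reference, here is a self-contained, runnable version of the loop approach described in the post above; the synthetic triangle list, the value of m, and the variable name `triangles` are made up for illustration and are not from the original message:

    import numpy as np

    n = 100    # number of nodes (~300000 in the real case)
    m = 1000   # number of triangles; value assumed for the demo
    # synthetic (node indices, displacement vector) pairs
    triangles = [(np.random.randint(0, n, size=3), np.random.rand(3))
                 for _ in range(m)]

    a1 = np.zeros((n, 3), float)  # summed displacements per node
    a2 = np.zeros(n, int)         # number of contributions per node
    for indices, data in triangles:
        for index in indices:
            a1[index] += data     # add the triangle's displacement vector
            a2[index] += 1        # count the contribution

    # averaged displacement; assumes every node occurs in some triangle
    meand = a1 / a2.reshape(n, 1)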

On Tue, Feb 8, 2011 at 09:24, EMMEL Thomas <Thomas.EMMEL@3ds.com> wrote:
Hi,
here is something I have been thinking about for some time, and I wonder whether there is a better solution within numpy.

The task is: I have an array (300000+ entries) of length-3 arrays that is initially empty, like this:

    n = 100  # for test, otherwise ~300000
    a1 = reshape(zeros(3*n).astype(float), (n, 3))

(Speaking literally, this is a field of displacements in a finite-element mesh.) Now I have a lot of triangles whose corners are the nodes, each with an index between 0 and n-1, and I would like to add a unique displacement for all three nodes to a1, like this:

    a2 = zeros(n).astype(int)
    for indices, data in [...]:
        # data = array((1., 2., 3.))
        # indices = (1, 5, 60)
        for index in indices:
            a1[index] += data
            a2[index] += 1

After filling a1 and a2 over and over (for a lot of triangles), I can finally calculate the averaged displacement at all points by:

    meand = a1/reshape(a2, (n, 1))
Use np.bincount() on each coordinate.

[~]
|10> xyz = np.random.random_sample([10, 3])

[~]
|11> ind = np.array([1,3,4,3,4,5,0,0,1,2])

[~]
|12> sum_xyz = np.column_stack([np.bincount(ind, xyz[:,i]) for i in range(xyz.shape[1])])

[~]
|13> sum_xyz
array([[ 1.56632447,  0.88193741,  0.20253585],
       [ 1.36661663,  0.98698521,  0.79892009],
       [ 0.26787528,  0.12850502,  0.76042557],
       [ 1.08489219,  0.70099349,  1.56665748],
       [ 0.56662843,  1.3502819 ,  0.15531993],
       [ 0.34900915,  0.34282216,  0.48250042]])

[~]
|14> counts = np.bincount(ind)

[~]
|15> counts
array([2, 2, 1, 2, 2, 1])

[~]
|17> mean_xyz = sum_xyz / counts[:,np.newaxis]

[~]
|18> mean_xyz
array([[ 0.78316224,  0.4409687 ,  0.10126793],
       [ 0.68330832,  0.4934926 ,  0.39946005],
       [ 0.26787528,  0.12850502,  0.76042557],
       [ 0.54244609,  0.35049674,  0.78332874],
       [ 0.28331422,  0.67514095,  0.07765996],
       [ 0.34900915,  0.34282216,  0.48250042]])

Watch out for divisions by 0 when you calculate the mean.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
   -- Umberto Eco
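One way to guard against that, as a sketch: assume that indices which never occur in `ind` should simply end up with a zero mean (that convention is an assumption, not something stated in the thread):

    import numpy as np

    xyz = np.random.random_sample([10, 3])
    ind = np.array([1, 3, 4, 3, 4, 5, 0, 0, 1, 2])
    sum_xyz = np.column_stack([np.bincount(ind, xyz[:, i])
                               for i in range(xyz.shape[1])])
    counts = np.bincount(ind)

    # rows with count 0 also have sum 0, so dividing by max(count, 1)
    # leaves them as zeros instead of producing 0/0 warnings
    mean_xyz = sum_xyz / np.maximum(counts, 1)[:, np.newaxis]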

Tue, 08 Feb 2011 15:24:10 +0000, EMMEL Thomas wrote: [clip]
n = 100  # for test, otherwise ~300000
a1 = reshape(zeros(3*n).astype(float), (n, 3))
a2 = zeros(n).astype(int)
for indices, data in [...]:
    # data = array((1., 2., 3.))
    # indices = (1, 5, 60)
    for index in indices:
        a1[index] += data
        a2[index] += 1
You can (mis-)use `bincount` to vectorize summations. Bincount does not support broadcasting and takes only 1-D inputs, so some manual shape manipulation is necessary.

Something probably could/should be done to optimize Numpy's performance for small arrays.

----
import numpy as np

n = 100   # 100 nodes
m = 1000  # 1000 triangles

# synthetic data
tri_nodes = np.random.randint(0, n, size=(m, 3))
tri_data = np.random.rand(m, 3)

# vectorize sums via bincount; minlength=n keeps the results at
# length n even if some nodes appear in no triangle (numpy >= 1.6)
a1 = np.zeros((n, 3), float)
a2 = np.bincount(tri_nodes.ravel(), minlength=n)
for j in range(3):
    # repeat(..., 3) -> the same (3,) data vector is added to each
    # node in the triangle
    a1[:, j] = np.bincount(tri_nodes.ravel(),
                           np.repeat(tri_data[:, j], 3),
                           minlength=n)
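As a sanity check, continuing from the snippet above, the bincount sums can be compared against the plain double loop from the original post, and the final division guarded as Robert suggested; treating nodes that occur in no triangle as having zero mean is an assumption:

    # naive reference implementation (the original double loop)
    b1 = np.zeros((n, 3), float)
    b2 = np.zeros(n, int)
    for indices, data in zip(tri_nodes, tri_data):
        for index in indices:
            b1[index] += data
            b2[index] += 1

    assert np.allclose(a1, b1)
    assert np.array_equal(a2, b2)

    # mean displacement; np.maximum avoids 0/0 for unused nodes
    meand = a1 / np.maximum(a2, 1)[:, np.newaxis]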