[Numpy-discussion] Efficient way of binning points and applying functions to these groups

Eric Emsellem eric.emsellem at eso.org
Wed Dec 26 04:09:42 EST 2012


I am looking for an efficient way of doing some simple binning of points 
and then applying some functions to points within each bin.

I have tried several ways, including crude looping over the indices, or 
using digitize (see below) but I cannot manage to get it as efficient as 
I need it to be. I have a comparison with a similar (although complex) 
code in idl, and thought I would ask the forum. In idl there is a way to 
"invert" an histogram and get a reverse set of indices (via the 
histogram function) which seems to do the trick (or maybe it is faster 
for another reason).

Below I provide a (dummy) example of what I wish to achieve. Any hint on 
how to do this EFFICIENTLY using numpy is most welcome. I need to speed 
things up quite a bit (at the moment, the version I have, see below, is 
10 times slower than the more complex idl routine..., I must be doing 
something wrong!).

# I have a random set of data points in 2D with coordinates x,y :
import numpy as np
x = np.random.random(1000)
y = np.random.random(1000)

# I have now a 2D grid given by let's say 10x10 grid points:
nx = 11
ny = 21
lx = linspace(0,1,nx)
ly = linspace(0,1,ny)
gx, gy = np.meshgrid(lx, ly)

# So my set of 2D bins are (not needed in the solution I present but 
just for clarity)
bins = np.dstack((gx.ravel(), gy.ravel()))[0]

# Now I want to have the list of points in each bin and
# if the number of points in that bin is larger than 10, apply (dummy) 
function func1 (see below)
# If less than 10, apply (dummy) function func2 so (dum?)
# if 0, do nothing
# for two dummy functions like (for example):
def func1(x) : return x.mean()

def func2(x) : return x.std()

# One solution would be to use digitize in 1D and histogram in 2D (don't 
need gx, gy for this one):

h = histogram2d(x, y, bins=[lx, ly])[0]

digitX = np.digitize(x, lx)
digitY = np.digitize(y, ly)

# create the output array, with -999 values to make sure I see which 
ones are not filled in
result = np.zeros_like(h) - 999

for i in range(nx-1) :
     for j in range(ny-1) :
         selectionofpoints = (digitX == i+1) & (digitY == j+1)
         if h[i,j] > 10 : result[i,j] = func1(x[selectionofpoints])
         elif h[i,j] > 0 : result[i,j] = func2(x[selectionofpoints])

More information about the NumPy-Discussion mailing list