# [Numpy-discussion] effectively computing variograms with numpy

Hanno Klemm klemm at phys.ethz.ch
Fri Jun 22 11:18:12 EDT 2007

Tim,

this is the best I could come up with until now:

import numpy as N

def naive_variogram(data, binsize=100., stepsize=5.):
"""calculates variograms along the rows and columns of the given
array which is supposed to contain equally spaced data with
stepsize stepsize"""

# how many elements do fit in one bin?

binlength = int(binsize/stepsize)

#bins in x- and y- direction (+1 for the possible
#elements larger than int(binsize/stepsize):
x_bins = (data.shape[1])/binlength+1
y_bins = (data.shape[0])/binlength+1

#arrays to store the reuslts in
x_result = N.zeros(x_bins, dtype = float)
y_result = N.zeros(y_bins, dtype = float)

#arrays to get teh denominators right
x_denominators = N.zeros(x_bins, dtype=float)
y_denominators = N.zeros(x_bins, dtype=float)

#what is the last index?
xlast = data.shape[1]
ylast = data.shape[0]
for i in range(data.shape[0]):
datasquared = (data - data[:,i])**2
#number of bins to fill until the end of the array:
numbins = 1 + (xlast - i)/binlength
for j in range(numbins):
x_result[j]+=\
datasquared[:,i+1+j*binlength:i+1+(j+1)*binlength].sum()
x_denominators[j] +=\
datasquared[:,i+1+j*binlength:i+1+(j+1)*binlength].size
try:
#Is there a rest?
x_result[numbins] +=
datasquared[:,i+1+numbins*binlength:].sum()
x_denominators[numbins] +=
datasquared[:,i+1+numbins*binlength:].size
except IndexError:
pass

x_result /= x_denominators

for i in range(data.shape[1]):
datasquared = (data - data[i])**2
#number of bins to fill until the end of the array:
numbins = 1 + (ylast - i)/binlength
#Fill the bins
for j in range(numbins):

y_result[j]+=datasquared[i+1+j*binlength:i+1+(j+1)*binlength].sum()
y_denominators[j] +=
datasquared[i+1+j*binlength:i+1+(j+1)*binlength].size
try:
#Is there a rest?
y_result[numbins] +=
datasquared[:,i+1+numbins*binlength:].sum()
y_denominators[numbins] +=
datasquared[:,i+1+numbins*binlength:].size
except IndexError:
pass

y_result /= y_denominators

return x_result, y_result

Thanks,
Hanno

Timothy Hochberg <tim.hochberg at ieee.org> said:

> ------=_Part_157389_1558912.1182523880067
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
>
> On 6/22/07, Hanno Klemm <klemm at phys.ethz.ch> wrote:
> >
> >
> > Hi,
> >
> > I have an array which represents regularly spaced spatial data. I now
> > would like to compute the (semi-)variogram, i.e.
> >
> > gamma(h) = 1/N(h) \sum_{i,j\in N(h)} (z_i - z_j)**2,
> >
> > where h is the (approximate) spatial difference between the
> > measurements z_i, and z_j, and N(h) is the number of measurements with
> > distance h.
> >
> > However, I only want to calculate the thing along the rows and
> > columns. The naive approach involves two for loops and a lot of
> > searching, which becomes painfully slow on large data sets. Are there
> > better implementations around in numpy/scipy or does anyone have a
> > good idea of how to do that more efficient? I looked around a bit but
> > couldn't find anything.
>
>
> Can you send the naive code as well. Its often easier to see what's
going on
> with code in addition to the equations.
>
> Regards.
>
> -tim
>
>
>
> --
> .  __
> .   |-\
> .
> .  tim.hochberg at ieee.org
>
> ------=_Part_157389_1558912.1182523880067
> Content-Type: text/html; charset=ISO-8859-1
> Content-Transfer-Encoding: 7bit
> Content-Disposition: inline
>
> <br><br><div><span class="gmail_quote">On 6/22/07, <b
class="gmail_sendername">Hanno Klemm</b> <<a
href="mailto:klemm at phys.ethz.ch">klemm at phys.ethz.ch</a>>
wrote:</span><blockquote class="gmail_quote" style="border-left: 1px
solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> <br>Hi,<br><br>I have an array which represents regularly spaced
spatial data. I now<br>would like to compute the (semi-)variogram,
i.e.<br><br>gamma(h) = 1/N(h) \sum_{i,j\in N(h)} (z_i -
z_j)**2,<br><br>where h is the (approximate) spatial difference
between the
> <br>measurements z_i, and z_j, and N(h) is the number of
measurements with<br>distance h.<br><br>However, I only want to
calculate the thing along the rows and<br>columns. The naive approach
involves two for loops and a lot of
> <br>searching, which becomes painfully slow on large data sets. Are
there<br>better implementations around in numpy/scipy or does anyone
have a<br>good idea of how to do that more efficient? I looked around
a bit but<br>couldn't find anything.
> </blockquote><div><br>Can you send the naive code as well. Its often
easier to see what's going on with code in addition to the
equations.<br><br>Regards.<br><br>-tim<br><br></div></div><br
clear="all"><br>-- <br>.  __
> <br>.   |-\<br>.<br>.  <a
href="mailto:tim.hochberg at ieee.org">tim.hochberg at ieee.org</a>
>
> ------=_Part_157389_1558912.1182523880067--
>

--
Hanno Klemm
klemm at phys.ethz.ch