2010/3/30 Andrea Gavana <andrea.gavana@gmail.com>:
On 29 March 2010 23:44, Friedrich Romstedt wrote:
If you get nice results using 40 Rbfs, one for each time instant, this procedure means that the values for one time instant are not influenced by adjacent-year data. I.e., you would probably get the same result from a single Rbf whose norm blows up the time coordinate enormously. To make it clear in code, when time is your first coordinate and you have three other coordinates, the *norm* would be:
import numpy

def norm(x1, x2):
    # Rbf hands the norm coordinate arrays of shape (ndim, N, M), time first,
    # so the weights go on the first axis and the sum runs over that axis.
    weights = numpy.reshape([1e3, 1, 1], (-1, 1, 1))
    return numpy.sqrt((((x1 - x2) * weights) ** 2).sum(axis=0))
In this case, epsilon should be fixed explicitly, to keep the blown-up distances from influencing the automatic epsilon determination inside Rbf, which would otherwise spoil the whole thing.
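For concreteness, this is how the scaled norm and a fixed epsilon might be passed in (a sketch assuming the scipy 0.7 Rbf interface, where *norm* is a callable and *epsilon* can be given explicitly; the data are made up just for illustration):

import numpy
from scipy.interpolate import Rbf

# Made-up data: time t as the first coordinate, plus two other coordinates.
t = numpy.repeat(numpy.arange(4.0), 10)
x = numpy.random.uniform(size=40)
y = numpy.random.uniform(size=40)
values = numpy.sin(x) + numpy.cos(y) + 0.1 * t

# *norm* is the scaled norm defined above; epsilon is fixed by hand instead
# of being estimated by Rbf from the (now blown-up) node distances.
rbf = Rbf(t, x, y, values, function='multiquadric', norm=norm, epsilon=1.0)
print(rbf(numpy.array([1.0]), numpy.array([0.5]), numpy.array([0.5])))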
Of course, there are two and not three "other variables" here.
I have an idea how to tune your model: take, say, half or three thirds of your simulation data as the interpolation database, and try to reproduce the remaining part. I have some ideas about how to use this for tuning in practice.
Here, of course, it is three quarters and not three thirds :-)
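As a rough sketch of that check (the arrays and the three-quarter split are hypothetical, and the points are split at random from the pooled data):

import numpy
from scipy.interpolate import Rbf

# Hypothetical pooled simulation data: coordinates t, x, y and the outcomes.
t = numpy.random.uniform(0.0, 10.0, size=200)
x = numpy.random.uniform(size=200)
y = numpy.random.uniform(size=200)
values = numpy.sin(x) + numpy.cos(y) + 0.1 * t

# Three quarters as the interpolation database, one quarter held back.
idx = numpy.random.permutation(len(values))
train, test = idx[:150], idx[150:]

rbf = Rbf(t[train], x[train], y[train], values[train])
reproduced = rbf(t[test], x[test], y[test])

# The relative error on the held-back quarter shows whether the
# interpolation is good enough or rubbish.
rel_error = numpy.abs(reproduced - values[test]) / numpy.abs(values[test])
print(rel_error.mean(), rel_error.max())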
This is a very good idea indeed: I am actually running out of test cases (it takes a while to run a simulation, and I need to do it every time I try a new combination of parameters to check if the interpolation is good enough or rubbish). I'll give it a go tomorrow at work and I'll report back (even if I get very bad results :-D ).
I refined the idea a bit. Select one simulation, and use the complete rest as the interpolation base. Then repeat this for each simulation. Calculate some joint value for all the results; the simplest would maybe be to calculate:

def joint_ln_density(simulation_results, interpolation_results):
    # Summing the per-point terms gives the log of the product of the Gaussians.
    return -(((interpolation_results - simulation_results) ** 2)
             / simulation_results ** 2).sum()

In fact, this calculates the logarithm of the Gaussians centered at *simulation_results* and taken at the "observations" *interpolation_results*; it is the logarithm of the product of these Gaussians. The standard deviation of each Gaussian is assumed to be the value of the corresponding *simulation_results* entry, which means I assume that low-valued outcomes are much more precise in absolute numbers than high-valued outcomes, but /relative/ to their nominal value they are all equally precise. (NB: A scaling of the stddevs wouldn't make a significant difference /for you/; the same holds for the neglected coefficients of the Gaussians.)

I don't know which method you like the most. Robert's and Kevin's proposals are hard to compete with ...

You could optimise (maximise) the joint_ln_density outcome as a function of *epsilon* and the different scalings. afaik, scipy comes with some optimisation algorithms included. I checked it: http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#general-purpos... .

Friedrich
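Putting this together, the leave-one-out tuning might be sketched like this (the container *sims*, the log10 parametrisation, and the choice of scipy.optimize.fmin are assumptions for illustration; fmin is one of the general-purpose routines on the page above):

import numpy
from scipy.interpolate import Rbf
from scipy.optimize import fmin

# sims: hypothetical list of simulations, each a (t, x, y, values) tuple.

def joint_ln_density(simulation_results, interpolation_results):
    # Log of the product of the Gaussians, as above.
    return -(((interpolation_results - simulation_results) ** 2)
             / simulation_results ** 2).sum()

def negative_score(params, sims):
    # params holds log10(epsilon) and log10(time scaling).
    epsilon, time_scale = 10.0 ** numpy.asarray(params)

    def norm(x1, x2):
        weights = numpy.reshape([time_scale, 1, 1], (-1, 1, 1))
        return numpy.sqrt((((x1 - x2) * weights) ** 2).sum(axis=0))

    total = 0.0
    for i in range(len(sims)):
        # Leave simulation i out and interpolate it from all the others.
        t, x, y, v = sims[i]
        rest = [sims[j] for j in range(len(sims)) if j != i]
        tr, xr, yr, vr = [numpy.concatenate(c) for c in zip(*rest)]
        rbf = Rbf(tr, xr, yr, vr, norm=norm, epsilon=epsilon)
        total += joint_ln_density(v, rbf(t, x, y))
    return -total   # fmin minimises, so return the negative joint log-density

# best = fmin(negative_score, x0=[0.0, 3.0], args=(sims,))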