[Numpy-discussion] Interpolation question

Friedrich Romstedt friedrichromstedt at gmail.com
Tue Mar 30 16:48:25 EDT 2010


2010/3/30 Andrea Gavana <andrea.gavana at gmail.com>:
> On 29 March 2010 23:44, Friedrich Romstedt wrote:
>> When you have nice results using 40 Rbfs for each time instant, this
>> procedure means that the values for one time instant will not be
>> influenced by adjacent-year data.  I.e., you would probably get the
>> same result using a norm extraordinary blowing up the time coordinate.
>>  To make it clear in code, when the time is your first coordinate, and
>> you have three other coordinates, the *norm* would be:
>>
>> def norm(x1, x2):
>>    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())
>>
>> In this case, the epsilon should be fixed, to avoid the influence of
>> the changing distances on the epsilon determination inside of Rbf,
>> which would spoil the whole thing.

Of course, there are two and not three "other variables" here.
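
To make it concrete, here is a minimal sketch with made-up data,
assuming scipy.interpolate.Rbf: instead of passing a custom norm, one
can equivalently blow up the time coordinate by hand before building
the Rbf and fix *epsilon* explicitly.

import numpy
from scipy.interpolate import Rbf

# Made-up sample points: one value per (time, x, y) point.
time = numpy.array([1.0, 1.0, 2.0, 2.0])
x = numpy.array([0.1, 0.9, 0.1, 0.9])
y = numpy.array([0.2, 0.8, 0.2, 0.8])
values = numpy.array([10.0, 12.0, 11.0, 13.0])

# Scaling the time coordinate is equivalent to the scaled norm above:
# points from different time instants become effectively "far apart".
scale = 1e3
rbf = Rbf(time * scale, x, y, values, epsilon=1.0)  # epsilon fixed

# Evaluate at a new point (scale the time coordinate here as well).
print(rbf(1.0 * scale, 0.5, 0.5))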

>> I have an idea how to tune your model:  Take, say, the half or three
>> thirds of your simulation data as interpolation database, and try to
>> reproduce the remaining part.  I have some ideas how to tune using
>> this in practice.

Here, of course, it is three quarters and not three thirds :-)

> This is a very good idea indeed: I am actually running out of test
> cases (it takes a while to run a simulation, and I need to do it every
> time I try a new combination of parameters to check if the
> interpolation is good enough or rubbish). I'll give it a go tomorrow
> at work and I'll report back (even if I get very bad results :-D ).

I refined the idea a bit.  Select one simulation, and use the
complete rest as the interpolation base.  Then repeat this for each
simulation.  Calculate some joint value for all the results; the
simplest would maybe be to calculate:

def joint_ln_density(simulation_results, interpolation_results):
    # Sum of the log-Gaussians centered at the simulation results,
    # evaluated at the interpolated results, with stddev equal to the
    # simulation result itself (normalisation coefficients neglected).
    return (-((interpolation_results - simulation_results) ** 2) /
            (simulation_results ** 2)).sum()

In fact, this calculates the logarithm of the Gaussians centered at
*simulation_results* and taken at the "observations"
*interpolation_results*; i.e., the logarithm of the product of these
Gaussians.  The standard deviation of each Gaussian is assumed to be
the value of the corresponding *simulation_results* entry, which means
that I assume that low-valued outcomes are much more precise in
absolute numbers than high-valued outcomes, but /relative/ to their
nominal value they are all equally precise.  (NB: A scaling of the
stddevs wouldn't make a significant difference /for you/.  The same
goes for the neglected normalisation coefficients of the Gaussians.)
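
A rough sketch of the leave-one-out loop I have in mind, assuming the
simulations are available as a list *sims* of (coords, values) pairs
(coords being a tuple of coordinate arrays); the names *sims*,
*build_rbf* and *loo_joint_ln_density* are mine, and *build_rbf*
stands for whatever Rbf construction you settle on:

import numpy
from scipy.interpolate import Rbf

def build_rbf(coords, values, epsilon):
    # coords: sequence of coordinate arrays (e.g. scaled time, x, y).
    args = list(coords) + [values]
    return Rbf(*args, epsilon=epsilon)

def loo_joint_ln_density(sims, epsilon):
    total = 0.0
    for i, (coords, values) in enumerate(sims):
        # Interpolation base: all simulations except the i-th one.
        rest = [sims[j] for j in range(len(sims)) if j != i]
        base_coords = [numpy.concatenate([c[k] for (c, v) in rest])
                       for k in range(len(coords))]
        base_values = numpy.concatenate([v for (c, v) in rest])
        rbf = build_rbf(base_coords, base_values, epsilon)
        # Reproduce the left-out simulation and score the agreement.
        total += joint_ln_density(values, rbf(*coords))
    return total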

I don't know which method you like the most.  Robert's and Kevin's
proposals are hard to compete with ...

You could optimise (maximise) the joint_ln_density outcome as a
function of *epsilon* and the different scalings.  As far as I know,
scipy comes with some optimisation algorithms included.  I checked it:
http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#general-purpose
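
For example, something along these lines with scipy.optimize.fmin,
using the *loo_joint_ln_density* sketch from above; the parameter
vector packs the time scaling and *epsilon* together (again, these
names are mine, nothing fixed):

from scipy.optimize import fmin

def negative_objective(params, sims):
    # fmin minimises, so return the negative joint log-density.
    time_scale, epsilon = params
    # Rescale the time coordinate (assumed to be the first one) of
    # every simulation before building the interpolation bases.
    scaled = [([coords[0] * time_scale] + list(coords[1:]), values)
              for (coords, values) in sims]
    return -loo_joint_ln_density(scaled, epsilon)

# Hypothetical starting guess: time scaling 1e3, epsilon 1.0.
# best_params = fmin(negative_objective, [1e3, 1.0], args=(sims,))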

Friedrich


