2010/3/30 Andrea Gavana <andrea.gavana@gmail.com>:
On 29 March 2010 23:44, Friedrich Romstedt wrote:
If you get nice results using 40 Rbfs, one for each time instant, this procedure means that the values for one time instant are not influenced by adjacent-year data. I.e., you would probably get the same result from a single Rbf whose norm blows up the time coordinate enormously. To make it clear in code, when time is your first coordinate and you have three other coordinates, the *norm* would be:
import numpy

def norm(x1, x2):
    # Rbf hands the norm coordinate arrays of shape (ndim, N, M), time first,
    # so the weights go on the first axis and the sum runs over that axis.
    weights = numpy.reshape([1e3, 1, 1], (-1, 1, 1))
    return numpy.sqrt((((x1 - x2) * weights) ** 2).sum(axis=0))
In this case, epsilon should be fixed explicitly, to keep the blown-up distances from influencing the automatic epsilon determination inside Rbf, which would otherwise spoil the whole thing.
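For concreteness, this is how the scaled norm and a fixed epsilon might be passed in (a sketch assuming the scipy 0.7 Rbf interface, where *norm* is a callable and *epsilon* can be given explicitly; the data are made up just for illustration):

import numpy
from scipy.interpolate import Rbf

# Made-up data: time t as the first coordinate, plus two other coordinates.
t = numpy.repeat(numpy.arange(4.0), 10)
x = numpy.random.uniform(size=40)
y = numpy.random.uniform(size=40)
values = numpy.sin(x) + numpy.cos(y) + 0.1 * t

# *norm* is the scaled norm defined above; epsilon is fixed by hand instead
# of being estimated by Rbf from the (now blown-up) node distances.
rbf = Rbf(t, x, y, values, function='multiquadric', norm=norm, epsilon=1.0)
print(rbf(numpy.array([1.0]), numpy.array([0.5]), numpy.array([0.5])))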
Of course, there are two and not three "other variables" here.
I have an idea how to tune your model: take, say, half or three thirds of your simulation data as the interpolation database, and try to reproduce the remaining part. I have some ideas about how to use this for tuning in practice.
Here, of course, it is three quarters and not three thirds :-)
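As a rough sketch of that check (the arrays and the three-quarter split are hypothetical, and the points are split at random from the pooled data):

import numpy
from scipy.interpolate import Rbf

# Hypothetical pooled simulation data: coordinates t, x, y and the outcomes.
t = numpy.random.uniform(0.0, 10.0, size=200)
x = numpy.random.uniform(size=200)
y = numpy.random.uniform(size=200)
values = numpy.sin(x) + numpy.cos(y) + 0.1 * t

# Three quarters as the interpolation database, one quarter held back.
idx = numpy.random.permutation(len(values))
train, test = idx[:150], idx[150:]

rbf = Rbf(t[train], x[train], y[train], values[train])
reproduced = rbf(t[test], x[test], y[test])

# The relative error on the held-back quarter shows whether the
# interpolation is good enough or rubbish.
rel_error = numpy.abs(reproduced - values[test]) / numpy.abs(values[test])
print(rel_error.mean(), rel_error.max())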
This is a very good idea indeed: I am actually running out of test cases (it takes a while to run a simulation, and I need to do it every time I try a new combination of parameters to check if the interpolation is good enough or rubbish). I'll give it a go tomorrow at work and I'll report back (even if I get very bad results :-D ).
I refined the idea a bit. Select one simulation, and use the complete rest as the interpolation base. Then repeat this for each simulation. Calculate some joint value for all the results; the simplest would maybe be to calculate:

def joint_ln_density(simulation_results, interpolation_results):
    # Summing the per-point terms gives the log of the product of the Gaussians.
    return -(((interpolation_results - simulation_results) ** 2)
             / simulation_results ** 2).sum()

In fact, this calculates the logarithm of the Gaussians centered at *simulation_results* and taken at the "observations" *interpolation_results*; it is the logarithm of the product of these Gaussians. The standard deviation of each Gaussian is assumed to be the value of the corresponding *simulation_results* entry, which means I assume that low-valued outcomes are much more precise in absolute numbers than high-valued outcomes, but /relative/ to their nominal value they are all equally precise. (NB: A scaling of the stddevs wouldn't make a significant difference /for you/; the same holds for the neglected coefficients of the Gaussians.)

I don't know which method you like the most. Robert's and Kevin's proposals are hard to compete with ...

You could optimise (maximise) the joint_ln_density outcome as a function of *epsilon* and the different scalings. afaik, scipy comes with some optimisation algorithms included. I checked it: http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#general-purpos... .

Friedrich
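Putting this together, the leave-one-out tuning might be sketched like this (the container *sims*, the log10 parametrisation, and the choice of scipy.optimize.fmin are assumptions for illustration; fmin is one of the general-purpose routines on the page above):

import numpy
from scipy.interpolate import Rbf
from scipy.optimize import fmin

# sims: hypothetical list of simulations, each a (t, x, y, values) tuple.

def joint_ln_density(simulation_results, interpolation_results):
    # Log of the product of the Gaussians, as above.
    return -(((interpolation_results - simulation_results) ** 2)
             / simulation_results ** 2).sum()

def negative_score(params, sims):
    # params holds log10(epsilon) and log10(time scaling).
    epsilon, time_scale = 10.0 ** numpy.asarray(params)

    def norm(x1, x2):
        weights = numpy.reshape([time_scale, 1, 1], (-1, 1, 1))
        return numpy.sqrt((((x1 - x2) * weights) ** 2).sum(axis=0))

    total = 0.0
    for i in range(len(sims)):
        # Leave simulation i out and interpolate it from all the others.
        t, x, y, v = sims[i]
        rest = [sims[j] for j in range(len(sims)) if j != i]
        tr, xr, yr, vr = [numpy.concatenate(c) for c in zip(*rest)]
        rbf = Rbf(tr, xr, yr, vr, norm=norm, epsilon=epsilon)
        total += joint_ln_density(v, rbf(t, x, y))
    return -total   # fmin minimises, so return the negative joint log-density

# best = fmin(negative_score, x0=[0.0, 3.0], args=(sims,))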