[Numpy-discussion] NumPy-Discussion Digest, Vol 42, Issue 85

Sun Mar 28 20:12:57 EDT 2010

> Date: Sun, 28 Mar 2010 00:24:01 +0000
> From: Andrea Gavana <andrea.gavana at gmail.com>
> Subject: [Numpy-discussion] Interpolation question
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>        <d5ff27201003271724o6c82ec75v225d819c84140b46 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi All,
>
>    I have an interpolation problem and I am having some difficulties
> in tackling it. I hope I can explain myself clearly enough.
>
> Basically, I have a whole bunch of 3D fluid flow simulations (close to
> 1000), and they are a result of different combinations of parameters.
> I was planning to use the Radial Basis Functions in scipy, but for the
> moment let's assume, to simplify things, that I am dealing only with
> one parameter (x). In 1000 simulations, this parameter x has 1000
> values, obviously. The problem is, the outcome of every single
> simulation is a vector of oil production over time (let's say 40
> values per simulation, one per year), and I would like to be able to
> interpolate my x parameter (1000 values) against all the simulations
> (1000x40) and get an approximating function that, given another x
> parameter (of size 1x1) will give me back an interpolated production
> profile (of size 1x40).

Andrea, may I suggest a different approach from RBFs?

Realize that the 40 values in each row of y are not independent of
each other (they will be correlated).  First perform a principal
component analysis (PCA) on this 1000 x 40 matrix and reduce it to a
1000 x A matrix, called your scores matrix, where A is the number of
independent components.  A is selected so that it adequately
summarizes Y without over-fitting, and you will find A << 40, maybe 2
or 3.  There are tools, such as cross-validation, that select A well
enough.
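
A minimal sketch of the PCA step, using NumPy's SVD. The data here is
synthetic (an assumption standing in for the real 1000 x 40 production
matrix), and the 99% explained-variance rule is only a crude
placeholder for choosing A; cross-validation would be the more careful
choice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (an assumption): 1000 profiles
# of 40 yearly values, driven by 2 hidden factors plus small noise.
n_sims, n_years, n_factors = 1000, 40, 2
hidden = rng.normal(size=(n_sims, n_factors))
shapes = rng.normal(size=(n_factors, n_years))
Y = hidden @ shapes + 0.01 * rng.normal(size=(n_sims, n_years))

# Mean-centre each column, then do PCA through the SVD.
y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)

# Fraction of the total variance captured by each component.
explained = s**2 / np.sum(s**2)

# Crude rule (placeholder for cross-validation): the smallest A whose
# components together explain at least 99% of the variance.
A = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)

scores = U[:, :A] * s[:A]   # 1000 x A scores matrix
loadings = Vt[:A].T         # 40 x A loadings matrix

# Rank-A approximation of Y rebuilt from scores and loadings.
Y_approx = scores @ loadings.T + y_mean
print(A, scores.shape, loadings.shape)
```

The columns of `scores` are mutually orthogonal by construction, which
is what allows each one to be modelled separately afterwards.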

Then you can relate your single column of X to these independent
columns in the scores matrix using a tool such as least squares: one
least squares model per column in the scores matrix.  This works
because each column in the scores matrix is independent of the others
(it contains totally orthogonal information).  But I would be
surprised if this works well enough, unless A = 1.
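
A rough sketch of this two-step idea: PCA on Y, then a straight-line
least squares fit of each score column against x, then mapping a
predicted score back to a 40-value profile. The data, the decline
shape and the helper `predict_profile` are all hypothetical and chosen
so that A = 1 is adequate; real simulations may well need a nonlinear
model between x and the scores.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in data (an assumption): one parameter x scales a
# fixed, hypothetical decline-curve shape over 40 years.
n_sims, n_years = 1000, 40
x = rng.uniform(0.0, 1.0, size=(n_sims, 1))
years = np.arange(1, n_years + 1)
decline = np.exp(-years / 10.0)
Y = (1.0 + 2.0 * x) * decline + 0.01 * rng.normal(size=(n_sims, n_years))

# Step 1: PCA of the mean-centred Y, keeping A components.
A = 1
y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
scores = U[:, :A] * s[:A]   # 1000 x A
loadings = Vt[:A].T         # 40 x A

# Step 2: least squares of the scores on [1, x]. lstsq solves all
# score columns at once, which is the same as one model per column.
design = np.hstack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, scores, rcond=None)   # 2 x A

def predict_profile(x_new):
    """Predict production profiles (m x 40) for new x values."""
    x_new = np.asarray(x_new, dtype=float).reshape(-1, 1)
    d = np.hstack([np.ones_like(x_new), x_new])
    return d @ coef @ loadings.T + y_mean

fi = predict_profile(0.5)
print(fi.shape)
```

Here `fi` plays the role of the interpolated profile in the original
question: one row of 40 values for one new x.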

But it sounds like you don't have just a single column in your
X-variables (you hinted that the single column was just for
simplification).  In that case, I would build a projection to latent
structures (PLS) model: a single latent-variable model that
simultaneously models the X-matrix and the Y-matrix while maximizing
the covariance between these two matrices.
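
For illustration only, a compact PLS sketch in plain NumPy. It takes
each weight vector from the SVD of the cross-product X'Y and then
deflates both matrices, which is one common way to compute PLS and not
necessarily what a given chemometrics package does. The functions
`pls_fit` and `pls_predict`, the six-column X and the two hidden
factors are all assumptions made up for the example.

```python
import numpy as np

def pls_fit(X, Y, n_components):
    """Fit a PLS model; return (x_mean, y_mean, B) for prediction."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean
    W, P, C = [], [], []
    for _ in range(n_components):
        # Direction in X-space with maximal covariance with Y.
        u_svd, _, _ = np.linalg.svd(Xr.T @ Yr, full_matrices=False)
        w = u_svd[:, 0]
        t = Xr @ w                 # latent-variable scores
        tt = t @ t
        p = Xr.T @ t / tt          # X loadings
        c = Yr.T @ t / tt          # Y loadings
        Xr = Xr - np.outer(t, p)   # deflate X
        Yr = Yr - np.outer(t, c)   # deflate Y
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = (np.column_stack(m) for m in (W, P, C))
    # Regression coefficients expressed in the original X variables.
    B = W @ np.linalg.solve(P.T @ W, C.T)
    return x_mean, y_mean, B

def pls_predict(model, X_new):
    x_mean, y_mean, B = model
    return (np.atleast_2d(X_new) - x_mean) @ B + y_mean

# Synthetic stand-in data (an assumption): 6 correlated parameters and
# 40-year profiles that both depend on the same 2 hidden factors.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
Lx = rng.normal(size=(2, 6))
Ly = rng.normal(size=(2, 40))
X = latent @ Lx + 0.01 * rng.normal(size=(1000, 6))
Y = latent @ Ly + 0.01 * rng.normal(size=(1000, 40))

model = pls_fit(X, Y, n_components=2)

# Predict the profile for one new parameter combination.
latent_new = np.array([[0.3, -1.2]])
fi = pls_predict(model, latent_new @ Lx)
print(fi.shape)
```

As with PCA, the number of components (2 here) would normally be
chosen by cross-validation and not fixed in advance.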

> Something along these lines:
>
> import numpy as np
> from scipy.interpolate import Rbf
>
> # x.shape = (1000, 1)
> # y.shape = (1000, 40)
>
> rbf = Rbf(x, y)
>
> # New result with xi.shape = (1, 1) --> fi.shape = (1, 40)
> fi = rbf(xi)
>
>
> Does anyone have a suggestion on how I could implement this? Sorry if
> it sounds confused... Please feel free to correct any wrong
> assumptions I have made, or to propose other approaches if you think
> RBFs are not suitable for this kind of problems.
>
> Thank you in advance for your suggestions.
>
> Andrea.
>
> "Imagination Is The Only Weapon In The War Against Reality."
> http://xoomer.alice.it/infinity77/
>
> ==> Never *EVER* use RemovalGroup for your house removal. You'll
> regret it forever.
> http://thedoomedcity.blogspot.com/2010/03/removal-group-nightmare.html <==