[Numpy-discussion] Interpolation question

Kevin Dunn kgdunn at gmail.com
Sun Mar 28 20:38:52 EDT 2010


> Message: 5
> Date: Sun, 28 Mar 2010 00:24:01 +0000
> From: Andrea Gavana <andrea.gavana at gmail.com>
> Subject: [Numpy-discussion] Interpolation question
> To: Discussion of Numerical Python <numpy-discussion at scipy.org>
> Message-ID:
>        <d5ff27201003271724o6c82ec75v225d819c84140b46 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi All,
>
>    I have an interpolation problem and I am having some difficulties
> in tackling it. I hope I can explain myself clearly enough.
>
> Basically, I have a whole bunch of 3D fluid flow simulations (close to
> 1000), and they are a result of different combinations of parameters.
> I was planning to use the Radial Basis Functions in scipy, but for the
> moment let's assume, to simplify things, that I am dealing only with
> one parameter (x). In 1000 simulations, this parameter x has 1000
> values, obviously. The problem is, the outcome of every single
> simulation is a vector of oil production over time (let's say 40
> values per simulation, one per year), and I would like to be able to
> interpolate my x parameter (1000 values) against all the simulations
> (1000x40) and get an approximating function that, given another x
> parameter (of size 1x1) will give me back an interpolated production
> profile (of size 1x40).

[I posted the following earlier but forgot to change the subject - it
appears as a new thread called "NumPy-Discussion Digest, Vol 42, Issue
85" - please ignore that thread]

Andrea, may I suggest a different approach from RBFs?

Realize that the 40 values in each row of Y are not independent of
each other (they will be correlated).  First build a principal
component analysis (PCA) model on this 1000 x 40 matrix and reduce it
down to a 1000 x A matrix, called your scores matrix, where A is the
number of independent components. A is selected so that it adequately
summarizes Y without over-fitting, and you will likely find A << 40,
maybe A = 2 or 3. There are tools, such as cross-validation, that
will help select a reasonable value of A.
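
A minimal sketch of the PCA step, using only NumPy's SVD. The data
here are synthetic stand-ins for the 1000 x 40 production matrix, and
the 99% explained-variance cutoff is a simple placeholder assumption,
not the cross-validation mentioned above.

```python
import numpy as np

# Synthetic stand-in for the simulation results (an assumption, not
# real data): two smooth underlying profiles plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(1000, 1))
t = np.linspace(0.0, 1.0, 40)
Y = x * np.exp(-3.0 * t) + x**2 * np.sin(np.pi * t)
Y += 0.01 * rng.standard_normal(Y.shape)

# Center each column of Y before PCA.
y_mean = Y.mean(axis=0)
Yc = Y - y_mean

# PCA via the singular value decomposition: Yc = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)

# Fraction of the total variance captured by each component.
explained = s**2 / np.sum(s**2)

# Placeholder rule: keep the fewest components that reach 99% of the
# variance. Cross-validation would be the more careful choice.
A = int(np.searchsorted(np.cumsum(explained), 0.99) + 1)

scores = U[:, :A] * s[:A]   # 1000 x A scores matrix
loadings = Vt[:A].T         # 40 x A loadings matrix

print(A, scores.shape, loadings.shape)
```

The product scores @ loadings.T + y_mean gives the rank-A
approximation of Y.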

Then you can relate your single column of X to the A columns of the
scores matrix using a tool such as least squares: one least squares
model per column in the scores matrix.  This works because each column
in the scores matrix is orthogonal to (uncorrelated with) the others.
But I would be surprised if this works well enough, unless A = 1.
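
A rough sketch of that regression step, assuming a straight-line
relationship between x and each score column. The function names
fit_scores_regression and predict_profile are hypothetical, and the
data are synthetic and built so that A = 1 is enough.

```python
import numpy as np

def fit_scores_regression(x, Y, A):
    """PCA on Y, then a least-squares fit of each score column on x."""
    y_mean = Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Y - y_mean, full_matrices=False)
    scores = U[:, :A] * s[:A]     # n x A
    loadings = Vt[:A].T           # 40 x A
    # Design matrix with an intercept; the straight line in x is an
    # assumption made for this sketch.
    X = np.column_stack([np.ones(len(x)), np.ravel(x)])
    # lstsq handles all A score columns at once, which is the same as
    # fitting one least-squares model per column.
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)   # 2 x A
    return y_mean, loadings, coef

def predict_profile(model, xi):
    """Predict scores for a new x value and map them back to 40 values."""
    y_mean, loadings, coef = model
    t_new = np.array([1.0, np.ravel(xi)[0]]) @ coef     # length A
    return y_mean + t_new @ loadings.T                  # length 40

# Synthetic example (an assumption): Y is linear in x plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=(1000, 1))
t = np.linspace(0.0, 1.0, 40)
Y = 2.0 * np.exp(-3.0 * t) + x * np.sin(np.pi * t)
Y += 0.01 * rng.standard_normal(Y.shape)

model = fit_scores_regression(x, Y, A=1)
profile = predict_profile(model, 0.5)
print(profile.shape)
```

If the scores depend on x in a curved way, the intercept-plus-slope
design matrix would need extra terms, or a different regression tool.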

But it sounds like you don't just have a single column in your
X-variables (you hinted that the single column was just for
simplification).  In that case, I would build a projection to latent
structures (PLS) model: a single latent-variable model that
simultaneously models the X-matrix and the Y-matrix while maximizing
the covariance between these two matrices.
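
For illustration only, here is a compact NumPy sketch of PLS using the
NIPALS iteration. It is one common way to fit such a model, not
necessarily the exact variant I would use in practice. The function
name pls_nipals, the three X columns, and the synthetic data are all
assumptions.

```python
import numpy as np

def pls_nipals(X, Y, n_components, tol=1e-10, max_iter=500):
    """Fit a PLS model by NIPALS; returns the means and coefficients B."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xr, Yr = X - x_mean, Y - y_mean   # residuals, deflated per component
    W, P, C = [], [], []
    for _ in range(n_components):
        # Start from the Y column with the largest remaining variance.
        u = Yr[:, np.argmax(Yr.var(axis=0))]
        t_old = np.zeros(X.shape[0])
        for _it in range(max_iter):
            w = Xr.T @ u
            w /= np.linalg.norm(w)     # unit-length X weights
            t = Xr @ w                 # X scores
            c = Yr.T @ t / (t @ t)     # Y loadings
            u = Yr @ c / (c @ c)       # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = Xr.T @ t / (t @ t)         # X loadings
        Xr = Xr - np.outer(t, p)       # deflate X
        Yr = Yr - np.outer(t, c)       # deflate Y
        W.append(w)
        P.append(p)
        C.append(c)
    W, P, C = np.array(W).T, np.array(P).T, np.array(C).T
    # Coefficients acting on centered X: B = W (P'W)^-1 C'.
    B = W @ np.linalg.solve(P.T @ W, C.T)
    return x_mean, y_mean, B

# Synthetic example (an assumption): 3 parameters, 40-value profiles.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(200, 3))
t_axis = np.linspace(0.0, 1.0, 40)
M = np.vstack([np.exp(-3.0 * t_axis), np.sin(np.pi * t_axis), t_axis])
Y = X @ M + 0.01 * rng.standard_normal((200, 40))

x_mean, y_mean, B = pls_nipals(X, Y, n_components=3)
x_new = np.array([[0.2, 0.5, 0.8]])
profile = y_mean + (x_new - x_mean) @ B   # shape (1, 40)
print(profile.shape)
```

With as many components as X has columns, as here, the fit coincides
with ordinary least squares. The benefit of PLS shows up when fewer
components are kept, with the number chosen by cross-validation.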

If you need some references and an outline of code, then I can readily
provide these.

This is a standard problem with data from spectroscopic instruments
and with batch processes.  They produce hundreds, sometimes thousands,
of values per row. PCA and PLS are very effective at summarizing these
down to a much smaller number of independent columns, very often just
a handful, and relating them (i.e. building a predictive model) to
other data matrices.

I also just saw the suggestions of others to center the data by
subtracting the mean from each column in Y and to scale it (by
dividing through by the standard deviation).  This is a standard data
preprocessing step, called autoscaling, and makes sense for any data
analysis, as you already discovered.
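
Autoscaling can be sketched in a few lines of NumPy. The helper names
autoscale and undo_autoscale are hypothetical, and the data are
random placeholders.

```python
import numpy as np

def autoscale(Y):
    """Center each column to zero mean and scale it to unit variance."""
    mean = Y.mean(axis=0)
    std = Y.std(axis=0, ddof=1)
    # Leave constant columns unscaled to avoid dividing by zero.
    std = np.where(std == 0, 1.0, std)
    return (Y - mean) / std, mean, std

def undo_autoscale(Z, mean, std):
    """Map autoscaled values back to the original units."""
    return Z * std + mean

# Placeholder data standing in for the 1000 x 40 matrix.
rng = np.random.default_rng(3)
Y = rng.normal(loc=5.0, scale=3.0, size=(1000, 40))

Z, mean, std = autoscale(Y)
Y_back = undo_autoscale(Z, mean, std)
```

Keep the mean and standard deviation from the training data so that
predictions made in scaled units can be converted back.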

Hope that helps,
Kevin

> Something along these lines:
>
> import numpy as np
> from scipy.interpolate import Rbf
>
> # x.shape = (1000, 1)
> # y.shape = (1000, 40)
>
> rbf = Rbf(x, y)
>
> # New result with xi.shape = (1, 1) --> fi.shape = (1, 40)
> fi = rbf(xi)
>
>
> Does anyone have a suggestion on how I could implement this? Sorry if
> it sounds confused... Please feel free to correct any wrong
> assumptions I have made, or to propose other approaches if you think
> RBFs are not suitable for this kind of problem.
>
> Thank you in advance for your suggestions.
>
> Andrea.


