[Numpy-discussion] Interpolation question

Andrea Gavana andrea.gavana at gmail.com
Tue Mar 30 17:09:55 EDT 2010


Hi Friedrich & All,

On 30 March 2010 21:48, Friedrich Romstedt wrote:
> 2010/3/30 Andrea Gavana <andrea.gavana at gmail.com>:
>> On 29 March 2010 23:44, Friedrich Romstedt wrote:
>>> When you have nice results using 40 Rbfs for each time instant, this
>>> procedure means that the values for one time instant will not be
>>> influenced by adjacent-year data.  I.e., you would probably get the
>>> same result using a norm that extraordinarily blows up the time coordinate.
>>>  To make it clear in code, when the time is your first coordinate, and
>>> you have three other coordinates, the *norm* would be:
>>>
>>> def norm(x1, x2):
>>>    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())
>>>
>>> In this case, the epsilon should be fixed, to avoid the influence of
>>> the changing distances on the epsilon determination inside of Rbf,
>>> which would spoil the whole thing.
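
To make that trick concrete, here is a minimal sketch (assuming the
2010-era scipy.interpolate.Rbf, whose *norm* callable receives the
coordinate arrays stacked along their first axis; the sample data and
the fixed epsilon=1.0 are made up for illustration):

import numpy as np
from scipy.interpolate import Rbf

# Hypothetical sample points: time plus two other coordinates, and the
# simulated values at those points.
t = np.array([0.0, 0.0, 1.0, 1.0])
x = np.array([0.0, 1.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 1.0, 0.0])
v = np.array([10.0, 12.0, 11.0, 13.0])

# Blow up the time coordinate so that data from other time instants has
# essentially no influence on the interpolated value.
scales = np.array([1e3, 1.0, 1.0])

def scaled_norm(x1, x2):
    # The coordinates arrive stacked along axis 0 (as in scipy's default
    # Euclidean norm), so the scales broadcast over the remaining axes.
    shape = (-1,) + (1,) * (x1.ndim - 1)
    return np.sqrt((((x1 - x2) * scales.reshape(shape)) ** 2).sum(axis=0))

# epsilon is fixed by hand so the inflated distances do not spoil the
# automatic epsilon estimate inside Rbf.
rbf = Rbf(t, x, y, v, norm=scaled_norm, epsilon=1.0)
print(rbf(0.5, 0.5, 0.5))
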
>
> Of course, there are two and not three "other variables" here.
>
>>> I have an idea how to tune your model:  Take, say, the half or three
>>> thirds of your simulation data as interpolation database, and try to
>>> reproduce the remaining part.  I have some ideas how to tune using
>>> this in practice.
>
> Here, of course, it is three quarters and not three thirds :-)
>
>> This is a very good idea indeed: I am actually running out of test
>> cases (it takes a while to run a simulation, and I need to do it every
>> time I try a new combination of parameters to check if the
>> interpolation is good enough or rubbish). I'll give it a go tomorrow
>> at work and I'll report back (even if I get very bad results :-D ).
>
> I refined the idea a bit.  Select one simulation, and use the complete
> rest as the interpolation base.  Then repeat this for each
> simulation.  Calculate some joint value for all the results; the
> simplest would maybe be to calculate:
>
> def joint_ln_density(simulation_results, interpolation_results):
>     # Sum the per-value terms so a single joint value comes back.
>     return (-((interpolation_results - simulation_results) ** 2)
>             / (simulation_results ** 2)).sum()
>
> In fact, this calculates the logarithm of the Gaussians centered at
> *simulation_results* and taken at the "observations"
> *interpolation_results*.  It is the logarithm of the product of these
> Gaussians.  The standard deviation of each Gaussian is assumed to be
> the value of the corresponding *simulation_results*, which means that
> I assume that low-valued outcomes are much more precise in absolute
> numbers than high-valued outcomes, but /relative/ to their nominal
> value they are all equally precise.  (NB: A scaling of the stddevs
> wouldn't make a significant difference /for you/.  The same goes for
> the neglected coefficients of the Gaussians.)
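
A rough sketch of that leave-one-out loop, with made-up names (*params*
holding the input parameters of each simulation, one row per run, and
*results* the corresponding outcomes), could look like this:

import numpy as np
from scipy.interpolate import Rbf

def joint_ln_density(simulation_results, interpolation_results):
    # Sum of the per-value log-Gaussian terms, i.e. the log of their product.
    return (-((interpolation_results - simulation_results) ** 2)
            / (simulation_results ** 2)).sum()

def leave_one_out_score(params, results, epsilon=1.0):
    # params: (n_simulations, n_parameters), results: (n_simulations,)
    # -- a hypothetical layout of the simulation database.
    n = len(results)
    score = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        # Interpolation base: every simulation except the i-th ...
        args = list(params[keep].T) + [results[keep]]
        rbf = Rbf(*args, epsilon=epsilon)
        # ... which then has to reproduce the run that was left out.
        predicted = rbf(*params[i])
        score += joint_ln_density(results[i], predicted)
    return score
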
>
> I don't know which method you like the most.  Robert's and Kevin's
> proposals are hard to compete with ...
>
> You could optimise (maximise) the joint_ln_density outcome as a
> function of *epsilon* and the different scalings.  AFAIK, scipy comes
> with some optimisation algorithms included.  I checked it:
> http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#general-purpose
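
For that last step, a minimal sketch building on the leave_one_out_score
above (scipy's optimisers minimise, so they get the negated score; the
log-parametrisation only keeps epsilon and the scalings positive):

import numpy as np
from scipy.optimize import fmin

def negative_score(x, params, results):
    # x packs log(epsilon) followed by the log of one scaling per coordinate.
    epsilon = np.exp(x[0])
    scales = np.exp(x[1:])
    # Rescaling the input coordinates plays the same role as a scaled norm.
    return -leave_one_out_score(params * scales, results, epsilon=epsilon)

# Starting from epsilon = 1 and all scalings = 1 (params/results as above):
# x0 = np.zeros(1 + params.shape[1])
# best = fmin(negative_score, x0, args=(params, results))
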

I had planned to show some of the results based on the suggestion you
gave me yesterday: I took two thirds ( :-D ) of the simulation
database to use as the interpolation base and tried to reproduce the
rest using the interpolation. Unfortunately, it seems like my computer
at work has blown up (maybe a BSOD, I was doing waaaay too many heavy
things at once) and I can't access it from home at the moment. I can't
show the real field profiles, but at least I can show you how good or
bad the interpolation performs (in terms of relative errors), and I
was planning to post a matplotlib composite graph to do just that. I
am still hopeful my PC will resurrect at some point :-D

However, from the first 100 or so interpolated simulations, I could
gather these findings:

1) Interpolations on *cumulative* productions of oil and gas are
extremely good, with a maximum range of relative error of -3% / +2%:
most of them (95% more or less) show errors < 1%, but for a few of
them I get the aforementioned range of errors in the interpolations;
2) For interpolations on oil and gas *rates* (time dependent), I
usually get a -5% / +3% error per timestep, which is very good for my
purposes. I still need to check these values, but the first set of
results was very promising;
3) Interpolations on gas injection (cumulative and rate) are a bit
shakier (15% error more or less), but this is essentially due to a
particularly complex behaviour of the reservoir simulator when it
needs to decide (based on user input) whether the gas is going to be
re-injected, sold, used as excess gas, or a few other options; I am
not that worried about this issue for the moment.
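
(The relative errors quoted above are signed and taken per quantity and
per report timestep, roughly along these lines; *simulated* and
*interpolated* stand for arrays of the same quantity over the steps:)

import numpy as np

def relative_error_percent(simulated, interpolated):
    # Signed relative error in percent, per report timestep.
    return 100.0 * (interpolated - simulated) / simulated

# errors = relative_error_percent(simulated, interpolated)
# errors.min(), errors.max()   # e.g. the -3% / +2% band quoted above
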

I'll be off for a week for Easter (I'm leaving for Greece in a few
hours), but I am really looking forward to trying the suggestion Kevin
posted and to investigating Robert's idea: I know about kriging, but I
initially thought it wouldn't do a good job in this case. I'll
reconsider for sure. And I'll post a screenshot of the results as soon
as my PC gets out of the Emergency Room.

Thank you again!

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

==> Never *EVER* use RemovalGroup for your house removal. You'll
regret it forever.
http://thedoomedcity.blogspot.com/2010/03/removal-group-nightmare.html <==


