Hi All,

I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough.

Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are the result of different combinations of parameters. I was planning to use the Radial Basis Functions in scipy, but for the moment, to simplify things, let's assume that I am dealing with only one parameter (x). In 1000 simulations, this parameter x obviously has 1000 values. The problem is, the outcome of every single simulation is a vector of oil production over time (say 40 values per simulation, one per year), and I would like to interpolate my x parameter (1000 values) against all the simulations (1000x40) and get an approximating function that, given another x value (of size 1x1), will give me back an interpolated production profile (of size 1x40).

Something along these lines:

import numpy as np
from scipy.interpolate import Rbf

# x.shape = (1000, 1)
# y.shape = (1000, 40)
rbf = Rbf(x, y)

# New result with xi.shape = (1, 1) --> fi.shape = (1, 40)
fi = rbf(xi)

Does anyone have a suggestion on how I could implement this? Sorry if it sounds confused... Please feel free to correct any wrong assumptions I have made, or to propose other approaches if you think RBFs are not suitable for this kind of problem.

Thank you in advance for your suggestions.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/

==> Never *EVER* use RemovalGroup for your house removal. You'll regret it forever.
http://thedoomedcity.blogspot.com/2010/03/removalgroupnightmare.html <==
On Sat, Mar 27, 2010 at 8:24 PM, Andrea Gavana <andrea.gavana@gmail.com> wrote:
> Does anyone have a suggestion on how I could implement this? Please feel free to correct any wrong assumptions I have made, or to propose other approaches if you think RBFs are not suitable for this kind of problem.
If I understand correctly, then you have a function (oil production y) over time, observed at 40 time periods (t), and this function is parameterized by a single parameter x:

y = f(t, x)
t = np.arange(40)
len(x) == 1000

Blowing up t and x, you would have a grid of 1000*40, with 40000 observations in t, x and y:

rbf = Rbf(t, x, y)

This might theoretically work; however, I think the arrays are too large: the full distance matrix would be 40000*40000. I don't know if bivariate splines would be able to handle this size.

How much noise do you have in the simulations? If there is not much noise, and you are mainly interested in interpolating for x, then I would interpolate pointwise for each of the 40 t, i.e. fit y = f(x) for each t, which has 1000 observations. This might still be too much for RBF, but some of the other univariate interpolators should be able to handle it. I would try this first, because it is the easiest to do. This could be extended in a second stage to interpolating over t if necessary.

If there is a lot of noise, some interpolation local in t (and maybe x) might be able to handle it and combine information across t. An alternative would be to model y as a function of t semi-parametrically, e.g. with polynomials, and interpolate the coefficients of the polynomial across x; or try some of the interpolators that are designed for images, which might have a better chance with this array size, but there I don't know anything.

Just some thoughts,

Josef
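Josef's pointwise suggestion (one univariate fit per time step) can be sketched with plain NumPy. The data below is made up for illustration, and `np.interp` (piecewise-linear) stands in for whichever univariate scipy interpolator is actually chosen:

```python
import numpy as np

# Made-up stand-ins for the real data: 1000 parameter values and a
# (1000, 40) array of yearly production profiles.
x = np.linspace(0.0, 1.0, 1000)
y = np.outer(np.sin(2.0 * np.pi * x), np.arange(1.0, 41.0))

def interp_profile(xi, x, y):
    """Interpolate the whole profile at a new parameter value xi by
    fitting y = f(x) independently for each of the 40 time steps."""
    order = np.argsort(x)            # np.interp wants increasing x
    xs, ys = x[order], y[order]
    return np.array([np.interp(xi, xs, ys[:, t]) for t in range(y.shape[1])])

fi = interp_profile(0.123, x, y)
print(fi.shape)   # (40,)
```

Because each time step is fitted on its own, a pathological fit in one year cannot corrupt the others, which makes debugging easier than with one big bivariate fit.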
_______________________________________________
NumPy-Discussion mailing list
NumPyDiscussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpydiscussion
On 27 March 2010 20:24, Andrea Gavana <andrea.gavana@gmail.com> wrote:
If I understand your problem correctly, you have a function taking one value as input (or one 3D vector) and returning a vector of length 40. You want to know whether there are tools in scipy to support this.

I'll say first that it's not strictly necessary for there to be: you could always just build 40 different interpolators, one for each component of the output. After all, there's no interaction in the calculations between the output coordinates. This is of course awkward, in that you'd like to just call F(x) and get back a vector of length 40, but that can be remedied by writing a short wrapper function that simply calls all 40 interpolators.

A problem that may be more serious is that the Python loop over the 40 interpolators can be slow, while a C implementation would give vector-valued results rapidly. To make this work, you're going to have to find a compiled-code interpolator that returns vector values. This is not complicated in principle, since it just needs to run the same interpolator code on 40 sets of coefficients, but I don't think many of scipy's interpolators do this. The only ones I'm sure are able to are the polynomial interpolators I wrote, which work only on univariate inputs (and provide no kind of smoothing). If you're going to use these, I recommend using scipy's spline functions to construct smoothing splines, which you'd then convert to piecewise cubics.

But I'd say, give the 40 interpolators a try. If they're slow, try interpolating many points at once: rather than giving just one x value, call the interpolators with a thousand (or however many you need) at a time. This will speed up scipy's interpolators, and it will make the overhead of a forty-element loop negligible.

Anne
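A minimal sketch of Anne's wrapper idea, on made-up data; plain linear interpolation stands in here for whichever per-component scipy interpolator (interp1d, splines, ...) is actually used:

```python
import numpy as np

class VectorInterpolator(object):
    """One univariate interpolator per output component, hidden behind
    a single F(x) call."""

    def __init__(self, x, y):
        order = np.argsort(x)
        self._x = np.asarray(x)[order]
        self._y = np.asarray(y)[order]      # shape (n_samples, n_out)

    def __call__(self, xi):
        xi = np.atleast_1d(xi)
        # Looping over the 40 components is cheap when each call below
        # evaluates many xi values at once (batch evaluation).
        cols = [np.interp(xi, self._x, self._y[:, j])
                for j in range(self._y.shape[1])]
        return np.column_stack(cols)        # shape (len(xi), n_out)

x = np.linspace(0.0, 1.0, 1000)
y = np.random.rand(1000, 40)                # fake (1000, 40) simulation results
F = VectorInterpolator(x, y)
print(F(0.5).shape)   # (1, 40)
```

The batching point from the message above applies directly: calling `F(np.linspace(0, 1, 1000))` does one vectorized pass per component, so the Python-level loop cost is amortized over all query points.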
On 28 March 2010 08:26, Anne Archibald wrote:
Thank you Anne and Josef, my explanation was very bad but your suggestions opened up my mind :D . I believe I am going to give the 40 interpolators a try, although you mentioned that RBFs are going to have some problems if the vectors' sizes are too big... I planned for a multidimensional interpolation of about 10 parameters (each of these has 1000 elements), but at this point I am afraid it will not work. If any of you is aware of another methodology/library I could use (Fortran is also fine, as long as I can wrap it with f2py) for this problem, please feel free to put me on the right track.

Thank you again for your suggestions.

Andrea.
On Sun, Mar 28, 2010 at 03:26, Anne Archibald <peridot.faceted@gmail.com> wrote:
> A problem that may be more serious is that the Python loop over the 40 interpolators can be slow, while a C implementation would give vector-valued results rapidly.
40 is not a bad number to loop over. The thing I would be worried about is the repeated calculation of the (1000, 1000) radial function evaluation. I think that a small modification of Rbf could be made to handle the vector-valued case. I leave that as an exercise to Andrea, of course. :)

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
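Robert's point, that the radial matrix need only be built and solved once for all 40 outputs, can be illustrated with a tiny NumPy-only sketch. This is a hand-rolled toy with a Gaussian basis and invented test data, not scipy's actual Rbf (which defaults to a multiquadric); the key line is that `np.linalg.solve` accepts a matrix right-hand side, so one factorization serves every output column:

```python
import numpy as np

def fit_rbf_multi(x, y, eps=0.05):
    """Vector-valued RBF fit: the (n, n) radial matrix is built and
    solved once against all output columns simultaneously."""
    x = np.asarray(x, dtype=float)
    r = np.abs(x[:, None] - x[None, :])     # pairwise distances, built once
    A = np.exp(-(r / eps) ** 2)             # Gaussian radial basis
    w = np.linalg.solve(A, y)               # weights, shape (n, n_out)

    def evaluate(xi):
        ri = np.abs(np.atleast_1d(xi)[:, None] - x[None, :])
        return np.exp(-(ri / eps) ** 2).dot(w)   # (len(xi), n_out)

    return evaluate

x = np.linspace(0.0, 1.0, 20)
y = np.column_stack([np.sin(3 * x), np.cos(3 * x)])   # two outputs at once
f = fit_rbf_multi(x, y)
print(f([0.5]).shape)   # (1, 2)
```

The same idea applied inside scipy's Rbf would amount to keeping its internal distance matrix and solving for a (1000, 40) weight array instead of a (1000,) one.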
Hi All,

On 28 March 2010 19:22, Robert Kern wrote:
> 40 is not a bad number to loop over. The thing I would be worried about is the repeated calculation of the (1000, 1000) radial function evaluation. I think that a small modification of Rbf could be made to handle the vector-valued case. I leave that as an exercise to Andrea, of course. :)
It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counter-intuitive and meaningless answers. The issues I am seeing are basically these:

# Interpolate the cumulative oil production
rbf = Rbf(x1, x2, x3, x4, x5, x6, final_oil_recovery)

# Try to use an existing combination of parameters to get back
# the original result (more or less)
possible_recovery = rbf(x1[10], x2[10], x3[10], x4[10], x5[10], x6[10])

1) If I attempt to use the resulting interpolation function ("rbf"), inputting a single value for each of x1, x2, ..., x6 that is *already present* in the original x1, x2, ..., x6 vectors, I get meaningless results (i.e., negative oil productions, 1000% error, and so on) in all cases, with some (rare) exceptions when using the "thin-plate" interpolation option;

2) Using a combination of parameters x1, x2, ..., x6 that is *not* in the original set, I get non-physical (i.e., unrealistic) results: it appears that RBFs think that if I increase the number of production wells I get less oil recovered (!), sometimes by a factor of 10.

I am pretty sure I am missing something in this whole interpolation thing (I am no expert at all), but honestly it is the first time I get such counter-intuitive results in my entire Python numerical life ;)

Andrea.
On Mar 28, 2010, at 4:47 PM, Andrea Gavana wrote:
> It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counter-intuitive and meaningless answers.
Which is hardly surprising: you're working with a physical process, and you must have some constraints on your parameters (whether dependence between parameters, bounds on the estimates...) that are not taken into account by the interpolation scheme you're using. So, it's back to the drawing board.

What are you actually trying to achieve? Find the best estimates of your 10 parameters to match an observed production timeline? Find a space for your 10 parameters that gives some realistic production?

Assuming that your 10 parameters are actually independent, did you run 1000**10 simulations to test all the possible combinations? Probably not, so you could try using a coarser interval between the min and max values for each parameter (say, 10 levels?) and check the combos... Or you could try to decrease the number of parameters by finding the ones that have the most influence on the final outcome and dropping the others. A different problem altogether...

My point is: don't be discouraged by the weird results you're getting; it's probably because you're not using the right approach yet.
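Pierre's coarse-grid idea can be sketched with `itertools.product`; the parameter names and bounds below are hypothetical, chosen only to show the shape of the enumeration:

```python
import numpy as np
from itertools import product

# Hypothetical bounds for three of the parameters; each axis gets a
# coarse grid of 10 levels instead of the raw observed values.
bounds = {
    "oilPlateau":    (40000.0, 60000.0),
    "gasInjPlateau": (5.0e7, 1.2e8),
    "producers":     (10.0, 50.0),
}

levels = {name: np.linspace(lo, hi, 10) for name, (lo, hi) in bounds.items()}

# Every combination of the coarse levels: 10**3 candidate settings,
# a tractable design rather than a full cross of the raw data.
combos = list(product(*levels.values()))
print(len(combos))   # 1000
```

With 6 parameters the same scheme gives 10**6 combinations; one would evaluate the cheap surrogate (not the 7-to-12-hour simulator) on these to probe where the interpolant behaves unphysically.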
Hi All,

On 28 March 2010 22:14, Pierre GM wrote:
The curious thing is, when using the rbf interpolated function to find a new approximation, I am not giving RBFs input values that are outside the bounds of the existing parameters. Either they are exactly the same as the input ones (for a single simulation), or they are slightly different but always inside the bounds. I always thought that, at least for the same input
Hi All,

On 28 March 2010 22:14, Pierre GM wrote:
> Which is hardly surprising: you're working with a physical process, you must have some constraints on your parameters (whether dependence between parameters, bounds on the estimates...) that are not taken into account by the interpolation scheme you're using. So, it's back to the drawing board.
Sorry, this laptop has gone mad and sent the message before I was finished...

The curious thing is, when using the RBF-interpolated function to find a new approximation, I am not giving RBFs input values that are outside the bounds of the existing parameters. Either they are exactly the same as the input ones (for a single simulation), or they are slightly different but always inside the bounds. I always thought that, at least for the same input values, it should give (approximately) the same answer as the real one.
> What are you actually trying to achieve? Find the best estimates of your 10 parameters to match an observed production timeline? Find a space for your 10 parameters that gives some realistic production?
I have this bunch of simulations of possible future oil productions (forecasts), run over the past 4 years, which have different combinations of the number of oil producing wells, gas injection wells, oil plateau values, gas injection plateau values and gas production plateau values (6 parameters in total, not 10). Now, the combinations themselves are pretty nicely spread, meaning that I seem to have a solid base to build an interpolator that will give me a reasonable answer when I ask it for parameters inside the bounds (i.e., no extrapolation involved).
> Assuming that your 10 parameters are actually independent, did you run 1000**10 simulations to test all the possible combinations?
I hope you're kidding :D . Each of these simulations takes from 7 to 12 hours, not counting the huge amount of time needed to build their data files by hand...

Let's see a couple of practical examples (I can share the data if someone is interested).

Example 1:

# o2 and o3 are the number of production wells, split into 2
# different categories
# inj is the number of injection wells
# fomts is the final oil recovery

rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)

op  = [50380]
gp  = [103014000]
gi  = [53151000]
o2w = [45]
o3w = [20]
inw = [15]

fi = rbf(op, gp, gi, o2w, o3w, inw)
# I => KNOW <= the answer to be close to +3.5e8
print fi

[ 1.00663296e+08]

(yeah, right...)

Example 2:

Changing o2w from 45 to 25 (again, the answer should be close to 3e8; fewer wells => less production):

fi = rbf(op, gp, gi, o2w, o3w, inw)
print fi

[ 1.30023424e+08]

And keep in mind that nowhere do I have such low values of oil recovery in my data... the lowest ones are close to 2.8e8...

Andrea.
Andrea Gavana wrote:
> Let's see a couple of practical examples (I can share the data if someone is interested).
Definitely interested in helping solve this one, so feel free to email the data (obviously not 1,000 Eclipse smspec and unsmry files, although I can handle that without any problems).
I take it all your inputs are float arrays? i.e., you just consider the number of production wells as a floating point value. I suppose strictly speaking o2 and o3 are not independent of each other, but I would guess that for the purposes of using Rbf it wouldn't be an issue.

I'm just having a look at the cookbook example (http://www.scipy.org/Cookbook/RadialBasisFunctions), adding a couple more variables, bumping up the number of points to 1000 and setting the z values to be up around 1.0e+09. So far it still seems to work, although if you increase the sample points to 2000+ it falls over with a memory error.

I've got a smaller bunch of Eclipse simulations I could try Rbf out on: 19 runs with 9 variables (it's a Tornado algorithm with -1, 0, +1 values for each variable). FOPTs will be in a similar range to yours.

Brennan
Hi Brennan & All,

On 28 March 2010 23:36, Brennan Williams wrote:
> Definitely interested in helping solve this one so feel free to email the data (obviously not 1,000 Eclipse smspec and unsmry files, although I can handle that without any problems).
It's 1 TB of data, and my script took almost 4 hours running in parallel on 4 processors over a network to reduce that huge amount of data to a tiny 406x7 matrix... but anyway, I'll see if I have permission to post the data and I'll let you know. I could really use some suggestions... or maybe a good kick in my a** telling me I am doing a bunch of stupid things.
> I take it all your inputs are float arrays? i.e., you just consider the number of production wells as a floating point value.
They are floats, but it doesn't really matter as RBFs should treat them all as floats. My results are the same whether I use "45" or "45.0".
I suppose strictly speaking o2 and o3 are not independent of each other, but I would guess that for the purposes of using Rbf it wouldn't be an issue.
They are independent: they are 2 completely different kinds of wells; o2 will give you much more gas while o3 much more oil.

It turns out that all interpolation methods *except* "linear" fail miserably, with huge errors even when I use exactly the same input points as the original data points. If I use the "linear" interpolator, I get errors in the range of:

1.16305401154e-005   1.29984131475e-006

in percentage, which is very good :D. Unfortunately, when I input even a single parameter value that is not in the original set, I still get these non-physical results of oil production increasing when I decrease the number of wells... Doh!

Andrea.

"Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/
2010/3/28 Andrea Gavana <andrea.gavana@gmail.com>:
Example 1
# o2 and o3 are the number of production wells, split into
# 2 different categories
# inj is the number of injection wells
# fomts is the final oil recovery
rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)
op = [50380]
gp = [103014000]
gi = [53151000]
o2w = [45]
o3w = [20]
inw = [15]
fi = rbf(op, gp, gi, o2w, o3w, inw)
# I => KNOW <= the answer to be close to +3.5e8
print fi
[ 1.00663296e+08]
(yeah right...)
Example 2
Changing o2w from 45 to 25 (again, the answer should be close to 3e8, less wells => less production)
fi = rbf(op, gp, gi, o2w, o3w, inw)
print fi
[ 1.30023424e+08]
And keep in mind, that nowhere I have such low values of oil recovery in my data... the lowest one are close to 2.8e8...
I want to put my 2 cents in, fwiw...

What I see from http://docs.scipy.org/doc/scipy0.7.x/reference/generated/scipy.interpolate.... are three things:

1. Rbf uses some weighting based on the radial functions.
2. Rbf results go through the nodal points without *smooth* set to some value != 0.
3. Rbf is isotropic.

(3.) is most important. I see from your email that the values you pass in to Rbf are of very different orders of magnitude. But the default norm used in Rbf is for sure isotropic, i.e., it will result in strange and useless "mean distances" in R^N where there are N parameters. You have to either pass in a *norm* which weights the coords according to their extent, or scale the data such that the aspect ratios of the hypercube's edges are sensible.

You can imagine it as the following ascii plot:

* * * * * * *

where the x dimension is, say, the gas plateau dimension (~1e10) and the y dimension is the year dimension (~1e1). In an isotropic plot, using the data lims and aspect = 1, they may be well homogeneous, but on this scaling, as used by Rbf, they are lumped. I don't know if it's clear what I mean from my description?

Are the corresponding parameter values spread completely randomly, or is there always only one axis varied?

If random, you could try a fractal analysis to find out the optimal scaling of the axes for N-d coverage of N-d space. In the case above, it is something between points and lines, I guess. When scaling it properly, it becomes a plane-like coverage of the xy plane. When scaling further, it will become points again (lumped together). So you can find an optimum. There are quantitative methods to obtain the fractal dimension, and they are quite simple.

Friedrich
Hi Friedrich & All, On 28 March 2010 23:51, Friedrich Romstedt wrote:
I believe I need a technical dictionary to properly understand all that... :D. Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitudes in the matrix is a very interesting one, although I have absolutely no idea how to rescale them properly to avoid RBFs going crazy.

As for your question, the parameters are not spread completely randomly: this is a collection of simulations done over the years, trying different scenarios manually, without having in mind a proper experimental design or any other technique. Nor do the parameter values vary only along one axis in each simulation (few of them are like that).

Andrea.
On Sun, Mar 28, 2010 at 18:30, Andrea Gavana <andrea.gavana@gmail.com> wrote:
I believe I need a technical dictionary to properly understand all that... :D . Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitude of the matrix is a very interesting one. Although I have absolutely no idea on how to rescale them properly to avoid RBFs going crazy.
Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
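As a minimal sketch of this advice (the helper names are mine, not from the thread): scale each axis by its standard deviation before building the Rbf, and apply the same scaling to query points.

```python
import numpy as np
from scipy.interpolate import Rbf

def build_scaled_rbf(coords, values):
    """coords: list of 1-D arrays, one per parameter; values: 1-D array.

    Each axis is divided by its standard deviation so no single
    large-magnitude parameter dominates Rbf's isotropic norm.
    """
    stds = [c.std() for c in coords]
    scaled = [c / s for c, s in zip(coords, stds)]
    rbf = Rbf(*scaled, values)
    def interpolate(*point):
        # apply the identical scaling to the query point
        return rbf(*[p / s for p, s in zip(point, stds)])
    return interpolate

# Robert's 0-to-1 alternative would instead use:
#   scaled = [(c - c.min()) / (c.max() - c.min()) for c in coords]
```

Usage would be along the lines of `f = build_scaled_rbf([oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj], fomts)` followed by `f(50380, 103014000, 53151000, 45, 20, 15)`, reusing Andrea's variable names.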
On 29 March 2010 00:34, Robert Kern wrote:
Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.
Ah, magnifico! Thank you Robert and Friedrich, it seems to be working now... I get reasonable values for various combinations of parameters by scaling the input data using the standard deviation of each of them. It seems also that the other interpolation schemes are much less erratic now, and in fact (using input values equal to the original data) I get these ranges of errors for the various schemes:

inverse multiquadric   -15.6098482614         15.7194674906
linear                 -1.76157336073e-010    1.24949181055e-010
cubic                  -0.000709860285963     0.018385394661
gaussian               -293.930336611         282.058111404
quintic                -0.176381494531        5.37780806549
multiquadric           -30.9515933446         58.3786105046
thin_plate             -7.06755391536e-006    8.71407169821e-005

in percentage. Some of them are still off the mark, but you should have seen them before ;).

I'll do some more analysis tomorrow, and if it works I am going to try the bigger profile-over-time interpolation. Thank you so much guys for your suggestions.

Andrea.
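A hedged reconstruction of the per-scheme check Andrea describes (his data is not public, so the inputs below are synthetic and the printed numbers will not match his table): rebuild the interpolant with each kernel scipy offers and report the percentage error back at the original data points.

```python
import numpy as np
from scipy.interpolate import Rbf

# Synthetic stand-in: two scaled parameters, FOMT-sized response.
rng = np.random.RandomState(42)
x = rng.uniform(0.0, 1.0, 100)
y = rng.uniform(0.0, 1.0, 100)
z = 2.8e8 + 1.0e8 * (x + y)

for func in ("multiquadric", "inverse", "gaussian", "linear",
             "cubic", "quintic", "thin_plate"):
    try:
        rbf = Rbf(x, y, z, function=func)
    except np.linalg.LinAlgError:
        # gaussian in particular can produce a near-singular system
        print(func, "singular system")
        continue
    err = 100.0 * (rbf(x, y) - z) / z   # percentage error at the nodes
    print(func, err.min(), err.max())
```

With smooth=0 (the default) every kernel should in principle pass through the nodes, so large node errors such as Andrea's gaussian row indicate an ill-conditioned solve rather than a modelling choice.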
Andrea Gavana wrote:
That's great news. Going to give it a try myself. Interesting that the linear scheme gives the lowest error range.
Hi All, On 29 March 2010 00:59, Andrea Gavana wrote:
I'll do some more analysis tomorrow, and if it works I am going to try the bigger profile-over-time interpolation. Thank you so much guys for your suggestions.
If anyone is interested in a follow up, I have tried a time-based interpolation of my oil profile (and gas and gas injection profiles) using those 40 interpolators (and even more, up to 400, one for every month of fluid flow simulation time step). I wasn't expecting too much out of it, but when the interpolated profiles came out (for different combinations of input parameters) I felt like being on the wrong side of the La-La River in the valley of Are-You-Kidding. The results are striking. I get an impressive agreement between this interpolated proxy model and the real simulations, whether I use existing combinations of parameters or new ones (i.e., I create the interpolation and then run the real fluid flow simulation, comparing the outcomes).

As an aside, I got my reservoir engineer colleagues playfully complaining that it's time for them to pack their stuff and go home, as this interpolator is doing all the job for us; obviously, this misses the point that it took 4 years to build such a comprehensive bunch of simulations, which now allows us to somewhat "predict" a possible production profile in advance.

I wrapped everything up in a wxPython GUI with some Matplotlib graphs, and everyone seems happy. The only small complaint I have is that I wasn't able to come up with a vector implementation of RBFs, so it can be pretty slow to build and interpolate 400 RBFs for each property (3 of them).

Thanks to everyone for your valuable suggestions!

Andrea.
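On the "vector implementation" point: since every time step shares the same input points, the dense RBF matrix is identical for all of them, so a single solve with a multi-column right-hand side can stand in for the 400 separate Rbf objects. A sketch (my code, not Andrea's, with a hand-fixed multiquadric epsilon):

```python
import numpy as np

def fit_rbf_profiles(X, Y, eps=1.0):
    """X: (n, p) scaled parameters; Y: (n, t) production profiles.

    Fits one multiquadric RBF per output column, but with a single
    factorization: np.linalg.solve handles all t right-hand sides at once.
    """
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    phi = np.sqrt((d / eps) ** 2 + 1.0)      # multiquadric basis matrix
    W = np.linalg.solve(phi, Y)              # (n, t) weights, one solve

    def predict(x):
        dx = np.sqrt(((X - x) ** 2).sum(axis=-1))
        return np.sqrt((dx / eps) ** 2 + 1.0) @ W   # whole (t,) profile
    return predict
```

Here `eps` plays the role of scipy's epsilon and is fixed by hand; the cost of building 400 interpolants collapses to one n x n factorization per property.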
Andrea Gavana wrote:

Scaling each axis by its standard deviation is a typical first start. Shifting and scaling the values such that they each go from 0 to 1 is another useful thing to try.

Ah, magnifico! Thank you Robert and Friedrich, it seems to be working now...
One other thought: core to much engineering is dimensional analysis; you know how we like those non-dimensional numbers!

I think this situation is less critical, as you are interpolating, not optimizing or something, but many interpolation methods are built on the idea of some data points being closer than others to your point of interest. Who is to say if a point that is 2 hours away is closer or farther than one 2 meters away? This is essentially what you are doing. Scaling everything to the same range is a start, but then you've still given them an implicit weighting.

An alternative is to figure out a way to non-dimensionalize your parameters; that *may* give you a more physically based scaling. And you might invent the "Gavana Number" in the process ;)

Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959 voice
7600 Sand Point Way NE   (206) 526-6329 fax
Seattle, WA 98115        (206) 526-6317 main reception

Chris.Barker@noaa.gov
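Chris's idea can be illustrated with a toy helper. The reference scales below are invented for the sketch, not numbers from Andrea's field; the point is only that each input becomes a dimensionless ratio before it enters the interpolation norm.

```python
import numpy as np

# Hypothetical characteristic scales, one per parameter kind.
REFERENCE = {
    "oil_plateau": 5.0e4,       # a nominal plateau oil rate
    "gas_plateau": 1.0e8,       # a nominal plateau gas rate
    "gas_inj_plateau": 5.0e7,   # a nominal injection rate
    "wells": 50.0,              # a nominal well count
}

def nondimensionalize(value, kind):
    """Divide a raw input by its characteristic physical scale."""
    return np.asarray(value, dtype=float) / REFERENCE[kind]
```

Unlike scaling by the sample standard deviation, this weighting does not change when new simulation runs are added, which is the "more physically based" property Chris is pointing at.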
Hi Chris and All, On 29 March 2010 22:35, Christopher Barker wrote:
And you might invent the "Gavana Number" in the process ;)
Might be :D. At the moment I am pretty content with what I have got; it seems to be working fairly well, although I didn't examine all the possible cases and it is very likely that my little tool will break disastrously for some combinations of parameters. I am not sure I am allowed to post an image comparing the "real" simulation with the prediction of the interpolated proxy model, but if you could see it, you would surely agree that it is a very reasonable approach. It seems too good to be true :D.

Again, this is mainly due to the fact that we have a very extensive set of simulations which covers a wide range of combinations of parameters, so the interpolation itself is only doing its job correctly. I don't think the technique can be applied blindly to whatever oil/gas/condensate/whatever reservoir, as non-linearities in fluid flow simulations appear where you least expect them, but it seems to be working ok (up to now) for our field (which is, by the way, one of the most complex and least understood condensate reservoirs out there).

Andrea.
Andrea Gavana wrote:
And of course that proxy simulator only deals with the input variables that you decided on 1,000+ simulations ago. All you need is for someone to suggest something else like "how about gas injection?" and you're back to having to do more real simulation runs (which is where a good experimental design comes in).

It would be interesting to know how well your proxy simulator compares to the real simulator for a combination of input variable values that is a good distance outside your original parameter space.

Brennan
On 29 March 2010 23:13, Brennan Williams wrote:
And of course that proxy simulator only deals with the input variables that you decided on 1,000+ simulations ago.
(-1000 < x < 0 = now). Correct: the next phase of the development for the field has some strict rules we are not allowed to break. The parameters we chose for optimizing this development phase (4 years ago up to now) are the same as the ones I am using for the interpolation. There is no more room for other options.
All you need is for someone to suggest something else like "how about gas injection?"
This is already accounted for.
and you're back to having to do more real simulation runs (which is where a good experimental design comes in).
Possibly, but as I said we can't even think of doing something like ED now. It's too computationally expensive for such a field. This is why I had the idea of using the existing set of simulations, which are a good ED by themselves (even if we didn't plan for it in advance).
It would be interesting to know how well your proxy simulator compares to the real simulator for a combination of input variable values that is a good distance outside your original parameter space.
This is extremely unlikely ever to happen. As I said, we explored a wide range of numbers/types of possible producers and injectors, plus a fairly extended range of production/injection profiles. As we are dealing with reality here, there are physical limits on what facilities you can install on your fields to try and ramp up production as much as you can (i.e., upper bounds for the number of producers/injectors/plateaus), and political limits on how low you can go and still be economical (i.e., lower bounds for the number of producers/injectors/plateaus). It seems to me we're covering everything except the most extreme cases.

Anyway, I am not here to convince anyone of the validity of the approach: I am a practical person, this thing works reasonably well and I am more than happy. Other than that, we are already going waaaay off topic for the NumPy list. Sorry about that. If you (or anyone else) wishes to continue the discussion on reservoir simulation, please feel free to contact me directly. For all other interpolation suggestions, I am always open to new ideas. Thank you again to the list for your help.

Andrea.
2010/3/29 Andrea Gavana <andrea.gavana@gmail.com>:
If anyone is interested in a follow up, I have tried a time-based interpolation of my oil profile (and gas and gas injection profiles) using those 40 interpolators (and even more, up to 400, one for every month of fluid flow simulation time step).
I wasn't expecting too much out of it, but when the interpolated profiles came out (for different combinations of input parameters) I felt like being on the wrong side of the La-La River in the valley of Are-You-Kidding. The results are striking. I get an impressive agreement between this interpolated proxy model and the real simulations, whether I use existing combinations of parameters or new ones (i.e., I create the interpolation and then run the real fluid flow simulation, comparing the outcomes).
I'm reasoning about the implications of this observation for our understanding of your interpolation. As Christopher pointed out, it's very important to know how many gas injection wells are to be weighted the same as one year.

When you get nice results using 40 Rbfs, one for each time instant, this procedure means that the values for one time instant are not influenced by adjacent-year data. I.e., you would probably get the same result using a norm that blows up the time coordinate extraordinarily. To make it clear in code, when time is your first coordinate and you have three other coordinates, the *norm* would be:

def norm(x1, x2):
    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1, 1]) ** 2).sum())

In this case, the epsilon should be fixed, to avoid the influence of the changing distances on the epsilon determination inside of Rbf, which would spoil the whole thing.

I have an idea how to tune your model: take, say, half or two thirds of your simulation data as the interpolation database, and try to reproduce the remaining part. I have some ideas how to do this tuning in practice.
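Friedrich's hold-out tuning idea, sketched on synthetic data (the real simulation runs are not public, so the function below stands in for them): fit an Rbf on a random half of the points and score how well it reproduces the held-out half.

```python
import numpy as np
from scipy.interpolate import Rbf

# Synthetic stand-in for one response surface over two scaled parameters.
rng = np.random.RandomState(7)
x = rng.uniform(0.0, 1.0, 200)
y = rng.uniform(0.0, 1.0, 200)
z = np.sin(3.0 * x) + y

# Random half for fitting, the rest for validation.
train = rng.rand(200) < 0.5
rbf = Rbf(x[train], y[train], z[train])
pred = rbf(x[~train], y[~train])
rms = np.sqrt(((pred - z[~train]) ** 2).mean())
print("hold-out RMS error:", rms)
```

Repeating this while varying the kernel, the epsilon, or the per-axis scaling (or Friedrich's custom norm) gives a quantitative way to tune those choices rather than judging only by node errors.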
As an aside, I got my colleagues reservoir engineers playfully complaining that it's time for them to pack their stuff and go home, as this interpolator is doing all the work for us; obviously, this misses the point that it took 4 years to build such a comprehensive bunch of simulations, which now allows us to somewhat "predict" a possible production profile in advance.
:) :)
I wrapped everything up in a wxPython GUI with some Matplotlib graphs, and everyone seems happy. Not only your colleagues! The only small complaint I have is that I wasn't able to come up with a vector implementation of RBFs, so it can be pretty slow to build and interpolate 400 RBFs for each property (3 of them).
Haven't you spoken about 40 Rbfs for the time alone?? Something completely different: Are you going to do more simulations? Friedrich
Hi Friedrich & All, On 29 March 2010 23:44, Friedrich Romstedt wrote:
2010/3/29 Andrea Gavana <andrea.gavana@gmail.com>:
If anyone is interested in a follow-up, I have tried a time-based interpolation of my oil profile (and gas and gas injection profiles) using those 40 interpolators (and even more, up to 400, one every month of fluid flow simulation time step).
I wasn't expecting too much out of it, but when the interpolated profiles came out (for different combinations of input parameters) I felt like being on the wrong side of the La-la River in the valley of Are-you-kidding. The results are striking. I get an impressive agreement between this interpolated proxy model and the real simulations, whether I use existing combinations of parameters or new ones (i.e., I create the interpolation and then run the real fluid flow simulation, comparing the outcomes).
I'm thinking about the implications of this observation for our understanding of your interpolation. As Christopher pointed out, it's very important to know how many gas injection wells are to be weighted the same as one year.
When you have nice results using 40 Rbfs, one for each time instant, this procedure means that the values for one time instant will not be influenced by adjacent-year data. I.e., you would probably get the same result using a norm extraordinarily blowing up the time coordinate. To make it clear in code, when the time is your first coordinate, and you have three other coordinates, the *norm* would be:
def norm(x1, x2):
    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())
In this case, the epsilon should be fixed, to avoid the influence of the changing distances on the epsilon determination inside of Rbf, which would spoil the whole thing.
I have an idea how to tune your model: Take, say, the half or three thirds of your simulation data as interpolation database, and try to reproduce the remaining part. I have some ideas how to tune using this in practice.
This is a very good idea indeed: I am actually running out of test cases (it takes a while to run a simulation, and I need to do it every time I try a new combination of parameters to check if the interpolation is good enough or rubbish). I'll give it a go tomorrow at work and I'll report back (even if I get very bad results :D ).
As an aside, I got my colleagues reservoir engineers playfully complaining that it's time for them to pack their stuff and go home, as this interpolator is doing all the work for us; obviously, this misses the point that it took 4 years to build such a comprehensive bunch of simulations, which now allows us to somewhat "predict" a possible production profile in advance.
:) :)
I wrapped everything up in a wxPython GUI with some Matplotlib graphs, and everyone seems happy. Not only your colleagues! The only small complaint I have is that I wasn't able to come up with a vector implementation of RBFs, so it can be pretty slow to build and interpolate 400 RBFs for each property (3 of them).
Haven't you spoken about 40 Rbfs for the time alone??
Yes, sorry about the confusion: depending on which "timestep" I choose to compare the interpolation with the real simulation, I can have 40 RBFs (one every year of simulation) or more than 400 (one every month of simulation; not all the monthly data are available for all the simulations I have).
Something completely different: Are you going to do more simulations?
110% surely undeniably yes. The little interpolation tool I have is just a proof-of-concept and a little helper for us to have an initial grasp of what the production profiles might look like before actually running the real simulation. Something like a toy to play with (if you can call "play" actually working on a reservoir simulation...). There is no possible substitute for the reservoir simulator itself. Andrea.
On Mon, Mar 29, 2010 at 17:57, Andrea Gavana <andrea.gavana@gmail.com> wrote:
HI Friedrich & All,
On 29 March 2010 23:44, Friedrich Romstedt wrote:
Something completely different: Are you going to do more simulations?
110% surely undeniably yes. The little interpolation tool I have is just a proof-of-concept and a little helper for us to have an initial grasp of what the production profiles might look like before actually running the real simulation. Something like a toy to play with (if you can call "play" actually working on a reservoir simulation...). There is no possible substitute for the reservoir simulator itself.
One thing you might want to do is to investigate using Gaussian processes instead of RBFs. They are closely related (and I even think the 'gaussian' RBF corresponds to what you get from a particularly-constructed GP), but you can include uncertainty estimates of your data and get an estimate of the uncertainty of the interpolant. GPs are also very closely related to kriging, which you may also be familiar with. If you are going to be running more simulations, GPs can tell you what new inputs you should simulate to reduce your uncertainty the most. PyMC has some GP code: http://pymc.googlecode.com/files/GPUserGuide.pdf -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
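To illustrate the point about uncertainty, here is a minimal Gaussian-process regression sketch in plain numpy (squared-exponential kernel, standard Cholesky formulation). This is not the PyMC API; the data, kernel length scale, and jitter value are all made-up placeholders:

```python
import numpy as np

# Squared-exponential (RBF/"gaussian") covariance between two 1-D point sets.
def sq_exp(xa, xb, length=1.0):
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2 / length ** 2)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 30))
y = np.sin(x)                          # stand-in for a scalar response
jitter = 1e-6                          # tiny noise term for numerical stability

K = sq_exp(x, x) + jitter * np.eye(x.size)
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

xs = np.linspace(1.0, 9.0, 5)          # query points inside the data range
Ks = sq_exp(xs, x)
mean = Ks @ alpha                      # predictive mean
v = np.linalg.solve(L, Ks.T)
var = 1.0 - (v ** 2).sum(axis=0)       # predictive variance (unit prior amplitude)
```

The predictive variance is what an RBF interpolant does not give you: it grows in regions far from existing simulations, which is exactly the signal one could use to decide which simulation to run next.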
2010/3/30 Andrea Gavana <andrea.gavana@gmail.com>:
On 29 March 2010 23:44, Friedrich Romstedt wrote:
When you have nice results using 40 Rbfs, one for each time instant, this procedure means that the values for one time instant will not be influenced by adjacent-year data. I.e., you would probably get the same result using a norm extraordinarily blowing up the time coordinate. To make it clear in code, when the time is your first coordinate, and you have three other coordinates, the *norm* would be:
def norm(x1, x2):
    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())
In this case, the epsilon should be fixed, to avoid the influence of the changing distances on the epsilon determination inside of Rbf, which would spoil the whole thing.
Of course, it should be two and not three "other variables" here.
I have an idea how to tune your model: Take, say, the half or three thirds of your simulation data as interpolation database, and try to reproduce the remaining part. I have some ideas how to tune using this in practice.
Here, of course, it should be three quarters and not three thirds :)
This is a very good idea indeed: I am actually running out of test cases (it takes a while to run a simulation, and I need to do it every time I try a new combination of parameters to check if the interpolation is good enough or rubbish). I'll give it a go tomorrow at work and I'll report back (even if I get very bad results :D ).
I refined the idea a bit. Select one simulation, and use the complete rest as the interpolation base. Then repeat this for each simulation. Calculate some joint value for all the results; the simplest would maybe be to calculate:

def joint_ln_density(simulation_results, interpolation_results):
    return -((interpolation_results - simulation_results) ** 2) / (simulation_results ** 2)

In fact, this calculates the logarithm of the Gaussians centered at *simulation_results* and taken at the "observations" *interpolation_results*. It is the logarithm of the product of these Gaussians. The standard deviation of the Gaussians is assumed to be the value of the *simulation_results*, which means that I assume that low-valued outcomes are much more precise in absolute numbers than high-outcome values, but /relative/ to their nominal value they are all equally precise. (NB: A scaling of the stddevs wouldn't make a significant difference /for you/. Same for the neglected coefficients of the Gaussians.) I don't know which method you like the most. Robert's and Kevin's proposals are hard to compete with ... You could optimise (maximise) the joint_ln_density outcome as a function of *epsilon* and the different scalings. afaic, scipy comes with some optimisation algorithms included. I checked it: http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#generalpurpos... . Friedrich
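A hedged sketch of this leave-one-out procedure: drop one simulation at a time, rebuild the interpolator from the rest, and accumulate a summed version of the joint log-density as a quality score for a given epsilon. The data and the candidate epsilon values below are invented for illustration:

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 10.0, 40)   # stand-in input parameter (kept away from 0)
y = x ** 2                       # stand-in scalar response

def loo_joint_ln_density(epsilon):
    # Leave each point out in turn, rebuild the Rbf from the rest,
    # and sum the per-point log-density terms from the post.
    total = 0.0
    for i in range(x.size):
        keep = np.arange(x.size) != i
        rbf = Rbf(x[keep], y[keep], epsilon=epsilon)
        pred = float(rbf(x[i]))
        total += -((pred - y[i]) ** 2) / (y[i] ** 2)
    return total                 # closer to zero is better

scores = {eps: loo_joint_ln_density(eps) for eps in (0.5, 1.0, 2.0)}
best_eps = max(scores, key=scores.get)
```

In practice one would hand this score to an optimiser (e.g. something from scipy.optimize) over epsilon and the per-coordinate scalings, rather than scanning a hand-picked grid as above.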
Hi Friedrich & All, On 30 March 2010 21:48, Friedrich Romstedt wrote:
2010/3/30 Andrea Gavana <andrea.gavana@gmail.com>:
On 29 March 2010 23:44, Friedrich Romstedt wrote:
When you have nice results using 40 Rbfs, one for each time instant, this procedure means that the values for one time instant will not be influenced by adjacent-year data. I.e., you would probably get the same result using a norm extraordinarily blowing up the time coordinate. To make it clear in code, when the time is your first coordinate, and you have three other coordinates, the *norm* would be:
def norm(x1, x2):
    return numpy.sqrt((((x1 - x2) * [1e3, 1, 1]) ** 2).sum())
In this case, the epsilon should be fixed, to avoid the influence of the changing distances on the epsilon determination inside of Rbf, which would spoil the whole thing.
Of course, it should be two and not three "other variables" here.
I have an idea how to tune your model: Take, say, the half or three thirds of your simulation data as interpolation database, and try to reproduce the remaining part. I have some ideas how to tune using this in practice.
Here, of course, it should be three quarters and not three thirds :)
This is a very good idea indeed: I am actually running out of test cases (it takes a while to run a simulation, and I need to do it every time I try a new combination of parameters to check if the interpolation is good enough or rubbish). I'll give it a go tomorrow at work and I'll report back (even if I get very bad results :D ).
I refined the idea a bit. Select one simulation, and use the complete rest as the interpolation base. Then repeat this for each simulation. Calculate some joint value for all the results; the simplest would maybe be to calculate:
def joint_ln_density(simulation_results, interpolation_results):
    return -((interpolation_results - simulation_results) ** 2) / (simulation_results ** 2)
In fact, this calculates the logarithm of the Gaussians centered at *simulation_results* and taken at the "observations" *interpolation_results*. It is the logarithm of the product of these Gaussians. The standard deviation of the Gaussians is assumed to be the value of the *simulation_results*, which means that I assume that low-valued outcomes are much more precise in absolute numbers than high-outcome values, but /relative/ to their nominal value they are all equally precise. (NB: A scaling of the stddevs wouldn't make a significant difference /for you/. Same for the neglected coefficients of the Gaussians.)
I don't know which method you like the most. Robert's and Kevin's proposals are hard to compete with ...
You could optimise (maximise) the joint_ln_density outcome as a function of *epsilon* and the different scalings. afaic, scipy comes with some optimisation algorithms included. I checked it: http://docs.scipy.org/doc/scipy-0.7.x/reference/optimize.html#generalpurpos...
I had planned to show some of the results based on the suggestion you gave me yesterday: I took two thirds ( :D ) of the simulations database to use them as the interpolation base and tried to reproduce the rest using the interpolation. Unfortunately it seems like my computer at work has blown up (maybe a BSOD, I was doing waaaay too many heavy things at once) and I can't access it from home at the moment. I can't show the real field profiles, but at least I can show you how good or bad the interpolation performs (in terms of relative errors), and I was planning to post a matplotlib composite graph to do just that. I am still hopeful my PC will resurrect at some point :D

However, from the first 100 or so interpolated simulations, I could gather these findings:

1) Interpolations on *cumulative* productions of oil and gas are extremely good, with a maximum range of relative error of -3% / +2%: most of them (95% more or less) show errors < 1%, but for a few of them I get the aforementioned range of errors in the interpolations;

2) For interpolations on oil and gas *rates* (time dependent), I usually get a -5% / +3% error per timestep, which is very good for my purposes. I still need to check these values but the first set of results were very promising;

3) Interpolations on gas injection (cumulative and rate) are a bit more shaky (15% error more or less), but this is essentially due to a particularly complex behaviour of the reservoir simulator when it needs to decide (based on user input) if the gas is going to be reinjected, sold, used as excess gas and a few more; I am not that worried about this issue for the moment.

I'll be off for a week for Easter (I'll leave for Greece in a few hours), but I am really looking forward to trying the suggestion Kevin posted and to investigating Robert's idea: I know about kriging, but I initially thought it wouldn't do a good job in this case. I'll reconsider for sure.
And I'll post a screenshot of the results as soon as my PC gets out of the Emergency Room. Thank you again! Andrea.
2010/3/30 Andrea Gavana <andrea.gavana@gmail.com>:
However, from the first 100 or so interpolated simulations, I could gather these findings:
1) Interpolations on *cumulative* productions of oil and gas are extremely good, with a maximum range of relative error of -3% / +2%: most of them (95% more or less) show errors < 1%, but for a few of them I get the aforementioned range of errors in the interpolations;

2) For interpolations on oil and gas *rates* (time dependent), I usually get a -5% / +3% error per timestep, which is very good for my purposes. I still need to check these values but the first set of results were very promising;

3) Interpolations on gas injection (cumulative and rate) are a bit more shaky (15% error more or less), but this is essentially due to a particularly complex behaviour of the reservoir simulator when it needs to decide (based on user input) if the gas is going to be reinjected, sold, used as excess gas and a few more; I am not that worried about this issue for the moment.
Have a nice time in Greece, and what you write makes me laugh. :) When you are back, you should maybe elaborate a bit on what gas injection, wells, reinjected gas and so on are; I don't know about them. Friedrich
(Resending as numpy-discussion has a 40 Kb message limit) On 30 March 2010 22:44, Friedrich Romstedt <friedrichromstedt@gmail.com> wrote:
2010/3/30 Andrea Gavana <andrea.gavana@gmail.com>:
However, from the first 100 or so interpolated simulations, I could gather these findings:
1) Interpolations on *cumulative* productions of oil and gas are extremely good, with a maximum range of relative error of -3% / +2%: most of them (95% more or less) show errors < 1%, but for a few of them I get the aforementioned range of errors in the interpolations;

2) For interpolations on oil and gas *rates* (time dependent), I usually get a -5% / +3% error per timestep, which is very good for my purposes. I still need to check these values but the first set of results were very promising;

3) Interpolations on gas injection (cumulative and rate) are a bit more shaky (15% error more or less), but this is essentially due to a particularly complex behaviour of the reservoir simulator when it needs to decide (based on user input) if the gas is going to be reinjected, sold, used as excess gas and a few more; I am not that worried about this issue for the moment.
Have a nice time in Greece, and what you write makes me laugh. :) When you are back, you should maybe elaborate a bit on what gas injection, wells, reinjected gas and so on are; I don't know about them.
Just as a little follow-up, I managed to obtain a more extended set of interpolated simulations yesterday. As a little recap, I took two thirds of the simulations database to use them as the interpolation base and tried to reproduce the rest using the interpolation. I have a couple of plots to demonstrate how good/bad the interpolation was.

1) The first one (http://img404.yfrog.com/img404/9885/doterrors.png) has 6 subplots: the first 3 at the top show relative errors (in percentage) for cumulative productions for oil and gas and cumulative gas injection, with:

relative_error = 100 * (real - interpolated) / real

The second set of 3 subplots at the bottom shows median errors on oil/gas production rates and gas injection rates: these are series of time-dependent values (in contrast to the cumulatives, which are a single floating point value), so I took the median error over all the simulated timesteps. As you can see, about 95-96% of the interpolated results have a relative error between -5% / +5%, with some outliers (which don't bother me that much anyway :D ).

2) The second figure (http://img338.imageshack.us/img338/7527/boxploterrors.png) has been liberally adapted from a matplotlib demo (boxplot_demo2), and shows a series of error box plots for all the variables I have used in the interpolation.

Overall, I can say I am pretty satisfied with the results :D . If I get a chance to test Kevin's and Robert's ideas before getting sent to Kazakhstan I'll post another follow-up, unless you got bored by now and you tell me to shut up. Thank you again to the list for the wonderful help. Andrea.
Andrea Gavana wrote:
Hi Friedrich & All,
On 28 March 2010 23:51, Friedrich Romstedt wrote:
2010/3/28 Andrea Gavana <andrea.gavana@gmail.com>:
Example 1
# o2 and o3 are the number of production wells, split into 2
# different categories
# inj is the number of injection wells
# fomts is the final oil recovery
rbf = Rbf(oilPlateau, gasPlateau, gasInjPlateau, o2, o3, inj, fomts)
op = [50380]
gp = [103014000]
gi = [53151000]
o2w = [45]
o3w = [20]
inw = [15]
fi = rbf(op, gp, gi, o2w, o3w, inw)
# I => KNOW <= the answer to be close to +3.5e8
print fi
[ 1.00663296e+08]
(yeah right...)
Example 2
Changing o2w from 45 to 25 (again, the answer should be close to 3e8; fewer wells => less production)
fi = rbf(op, gp, gi, o2w, o3w, inw)
print fi
[ 1.30023424e+08]
And keep in mind that nowhere do I have such low values of oil recovery in my data... the lowest ones are close to 2.8e8...
I want to put my 2 cents in, fwiw ...
What I see from http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.interpolate.... are three things:
1. Rbf uses some weighting based on the radial functions.
2. Rbf results go through the nodal points without *smooth* set to some value != 0.
3. Rbf is isotropic.
(3.) is most important. I see from your email that the values you pass in to Rbf are of very different orders of magnitude. But the default norm used in Rbf is for sure isotropic, i.e., it will result in strange and useless "mean distances" in R^N where there are N parameters. You have to either pass in a *norm* which weights the coords according to their extent, or scale the data such that the aspect ratios of the hypercube's edges are sensible.
You can imagine it as the following ascii plot:
* * * * * * *
the x dimension is, say, the gas plateau dimension (~1e10), the y dimension is the year dimension (~1e1). In an isotropic plot, using the data lims and aspect = 1, they may well be homogeneous, but on this scaling, as used by Rbf, they are lumped. I don't know if it's clear what I mean from my description?
Are the corresponding parameter values spread completely randomly, or is only one axis varied at a time?
If random, you could try a fractal analysis to find out the optimal scaling of the axes for N-d coverage of N-d space. In the case above, it is something between points and lines, I guess. When scaling it properly, it becomes a plane-like coverage of the xy plane. When scaling further, it will become points again (lumped together). So you can find an optimum. There are quantitative methods to obtain the fractal dimension, and they are quite simple.
I believe I need a technical dictionary to properly understand all that... :D . Sorry, I am no expert at all, really, just an amateur with some imagination, but your suggestion about the different magnitude of the matrix is a very interesting one. Although I have absolutely no idea on how to rescale them properly to avoid RBFs going crazy.
As for your question, the parameters are not spread completely randomly, as this is a collection of simulations done over the years, trying manually different scenarios, without having in mind a proper experimental design or any other technique. Nor do the parameter values vary only along one axis in each simulation (few of them are like that).
I assume that there is a default "norm" that calculates the distance between points irrespective of the order of the input coordinates? So if that isn't working, leading to the spurious results, the next step is to normalise all the inputs so they are in the same range, e.g. max-min=1.0.

On a related note, what approach would be best if one of the input parameters wasn't continuous? e.g. I have three quite different geological distributions, called say A, B and C. So some of my simulations use distribution A, some use B and some use C. I could assign them the numbers 1, 2, 3 but a value of 1.5 is meaningless.

Andrea, if you have 1TB of data for 1,000 simulation runs, then, if I assume you only mean the smspec/unsmry files, that means each of your summary files is 1GB in size? Are those o2w, o3w and inw figures the number of new wells only or existing+new? It's fun dealing with this amount of data, isn't it? Brennan
Andrea.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Hi Brennan & All, On 29 March 2010 00:46, Brennan Williams wrote:
Andrea Gavana wrote:
As for your question, the parameters are not spread completely randomly, as this is a collection of simulations done over the years, trying manually different scenarios, without having in mind a proper experimental design or any other technique. Nor do the parameter values vary only along one axis in each simulation (few of them are like that).
I assume that there is a default "norm" that calculates the distance between points irrespective of the order of the input coordinates?
So if that isn't working, leading to the spurious results, the next step is to normalise all the inputs so they are in the same range, e.g. max-min=1.0.
Scaling the input data using their standard deviation worked very well for my case.
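A minimal sketch of that standard-deviation scaling, with made-up variable names and magnitudes standing in for the real simulation inputs (the key points are dividing each coordinate by its std before building the Rbf, and scaling query points the same way):

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(3)
n = 80
X = np.column_stack([
    rng.uniform(0.0, 5e4, n),     # e.g. an oil plateau, O(1e4)
    rng.uniform(0.0, 1e8, n),     # e.g. a gas plateau, O(1e8)
    rng.uniform(0.0, 150.0, n),   # e.g. a number of wells, O(1e2)
])
y = X[:, 0] / 5e4 + X[:, 1] / 1e8 + X[:, 2] / 150.0   # toy "recovery" response

scale = X.std(axis=0)
Xs = X / scale                    # every coordinate is now O(1)

rbf = Rbf(Xs[:, 0], Xs[:, 1], Xs[:, 2], y)

query = np.array([2.5e4, 5e7, 75.0]) / scale           # scale queries too!
estimate = float(rbf(*query))
```

Without the scaling, the O(1e8) coordinate dominates every distance and the other inputs are effectively invisible to the interpolator, which matches the isotropy problem Friedrich described.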
On a related note, what approach would be best if one of the input parameters wasn't continuous? e.g. I have three quite different geological distributions called say A,B and C. SO some of my simulations use distribution A, some use B and some use C. I could assign them the numbers 1,2,3 but a value of 1.5 is meaningless.
Not sure about this: I do have integer numbers too (the number of wells cannot be a fractional one, obviously), but I don't care about it, as it is an input parameter (i.e., the user chooses how many o2/o3/injector wells he/she wants, and I get interpolated production profiles). Are you saying that the geological realization is one of your output variables?
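One hedged option for Brennan's categorical case: give each geological distribution its own 0/1 indicator coordinate instead of coding A/B/C as 1/2/3, so no spurious ordering (or meaningless "1.5") is implied. Everything below is a made-up illustration, not the real data:

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(4)
n = 90
x = rng.uniform(0.0, 1.0, n)              # a continuous input
cat = rng.integers(0, 3, n)               # 0 = A, 1 = B, 2 = C
onehot = np.eye(3)[cat]                   # (n, 3) indicator columns
y = x + np.array([0.0, 0.5, 1.0])[cat]    # toy response, shifted per category

rbf = Rbf(x, onehot[:, 0], onehot[:, 1], onehot[:, 2], y)

# Query points must sit exactly on a category's corner:
est_b = float(rbf(0.3, 0.0, 1.0, 0.0))    # x = 0.3 with distribution B
est_c = float(rbf(0.3, 0.0, 0.0, 1.0))    # x = 0.3 with distribution C
```

Note this only removes the artificial ordering; the interpolator still blends information across categories through the indicator space, so an alternative worth considering is simply building one interpolator per geological distribution.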
Andrea, if you have 1TB of data for 1,000 simulation runs, then, if I assume you only mean the smspec/unsmry files, that means each of your summary files is 1GB in size?
It depends on the simulation, and also for how many years the forecast is run. Standard runs go up to 2038, but we have a bunch of them running up to 2120 (!) . As we do have really many wells in this field, the ECLIPSE summary file dimensions skyrocket pretty quickly.
Are those o2w,o3w and inw figures the number of new wells only or existing+new? It's fun dealing with this amount of data isn't it?
They're only new wells, with a range of 0 <= o2w <= 150 and 0 <= o3w <= 84 and 0 <= inw <= 37, and believe it or not, our set of simulations contains a lot of the possible combinations for these 2 variables (and the other 4 variables too)... Andrea.
On Sun, Mar 28, 2010 at 4:47 PM, Andrea Gavana <andrea.gavana@gmail.com> wrote:
Hi All,
On 28 March 2010 19:22, Robert Kern wrote:
On Sun, Mar 28, 2010 at 03:26, Anne Archibald <peridot.faceted@gmail.com> wrote:
On 27 March 2010 20:24, Andrea Gavana <andrea.gavana@gmail.com> wrote:
Hi All,
I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough.
Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are a result of different combinations of parameters. I was planning to use the Radial Basis Functions in scipy, but for the moment let's assume, to simplify things, that I am dealing only with one parameter (x). In 1000 simulations, this parameter x has 1000 values, obviously. The problem is, the outcome of every single simulation is a vector of oil production over time (let's say 40 values per simulation, one per year), and I would like to be able to interpolate my x parameter (1000 values) against all the simulations (1000x40) and get an approximating function that, given another x parameter (of size 1x1) will give me back an interpolated production profile (of size 1x40).
If I understand your problem correctly, you have a function taking one value as input (or one 3D vector) and returning a vector of length 40. You want to know whether there are tools in scipy to support this.
I'll say first that it's not strictly necessary for there to be: you could always just build 40 different interpolators, one for each component of the output. After all, there's no interaction in the calculations between the output coordinates. This is of course awkward, in that you'd like to just call F(x) and get back a vector of length 40, but that can be remedied by writing a short wrapper function that simply calls all 40 interpolators.
A problem that may be more serious is that the python loop over the 40 interpolators can be slow, while a C implementation would give vector-valued results rapidly.
40 is not a bad number to loop over. The thing I would be worried about is the repeated calculation of the (1000, 1000) radial function evaluation. I think that a small modification of Rbf could be made to handle the vector-valued case. I leave that as an exercise to Andrea, of course. :)
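The wrapper Anne describes can be sketched in a few lines. The class name and the stand-in data below are hypothetical; the point is one scalar Rbf per yearly output value, hidden behind a single callable that returns the whole profile:

```python
import numpy as np
from scipy.interpolate import Rbf

# Synthetic stand-in data: 100 "simulations" of one input parameter,
# each producing 40 yearly values.
rng = np.random.default_rng(5)
x = rng.uniform(0.0, 1.0, 100)
years = np.arange(40)
Y = np.outer(x, years)                  # toy production profiles, shape (100, 40)

class ProfileInterpolator:
    def __init__(self, x, Y):
        # One interpolator per output column; as Robert notes, each one
        # redundantly recomputes the same (n, n) radial distance matrix.
        self.rbfs = [Rbf(x, Y[:, j]) for j in range(Y.shape[1])]

    def __call__(self, xi):
        # Evaluate all per-year interpolators and stack the results.
        return np.array([float(r(xi)) for r in self.rbfs])

f = ProfileInterpolator(x, Y)
profile = f(0.5)                        # interpolated 40-value profile
```

A vectorised variant would build the radial matrix once and solve for 40 right-hand sides simultaneously, which is the small Rbf modification suggested above.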
It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counterintuitive and meaningless answers. The issues I am seeing are basically these:
# Interpolate the cumulative oil production
rbf = Rbf(x1, x2, x3, x4, x5, x6, final_oil_recovery)
# Try to use existing combination of parameters to get back
# the original result (more or less)
possible_recovery = rbf(x1[10], x2[10], x3[10], x4[10], x5[10], x6[10])
1) If I attempt to use the resulting interpolation function ("rbf"), inputting a single value for each x1, x2, ..., x6 that is *already present* in the original x1, x2, ..., x6 vectors, I get meaningless results (i.e., negative oil productions, 1000% error, and so on) in all cases, with some (rare) exceptions when using the "thin_plate" interpolation option;

2) Using a combination of parameters x1, x2, ..., x6 that is *not* in the original set, I get non-physical (i.e., unrealistic) results: it appears that RBFs think that if I increase the number of production wells I get less oil recovered (!), sometimes by a factor of 10.
I am pretty sure I am missing something in this whole interpolation thing (I am no expert at all), but honestly it is the first time I get such counterintuitive results in my entire Python numerical life ;)
The interpolation with a large number of points can be pretty erratic. Did you use all 1000 points in the RBF? To see what's going on you could try 2 things:

1) Use only a few points (10 to 100)
2) Use gaussian with a large negative smooth

I had problems in the past with rbf in examples, and in one case I switched to using neighborhood points only (e.g. with scipy.spatial, KDTree) Josef
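Josef's neighbourhood idea can be sketched as follows: instead of one global Rbf over all ~1000 points, look up the k nearest simulations with a KD-tree and fit a small local Rbf around each query. The inputs are assumed already scaled to comparable ranges; the data and k below are synthetic placeholders:

```python
import numpy as np
from scipy.interpolate import Rbf
from scipy.spatial import cKDTree

rng = np.random.default_rng(6)
X = rng.uniform(0.0, 1.0, (500, 2))     # stand-in (already scaled) inputs
y = X[:, 0] + X[:, 1]                   # toy response

tree = cKDTree(X)                       # built once, reused for every query

def local_rbf(xq, k=25):
    # Fit a small Rbf on only the k nearest simulations to the query.
    _, idx = tree.query(xq, k=k)
    nb = X[idx]
    rbf = Rbf(nb[:, 0], nb[:, 1], y[idx])
    return float(rbf(xq[0], xq[1]))

est = local_rbf(np.array([0.4, 0.6]))   # expect roughly 1.0 for this toy data
```

Each local fit solves only a k-by-k system instead of a 1000-by-1000 one, which both tames the erratic global behaviour and keeps evaluation cheap.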
Andrea.
Andrea Gavana wrote:
Hi All,
On 28 March 2010 19:22, Robert Kern wrote:
On Sun, Mar 28, 2010 at 03:26, Anne Archibald <peridot.faceted@gmail.com> wrote:
On 27 March 2010 20:24, Andrea Gavana <andrea.gavana@gmail.com> wrote:
Hi All,
I have an interpolation problem and I am having some difficulties in tackling it. I hope I can explain myself clearly enough.
Basically, I have a whole bunch of 3D fluid flow simulations (close to 1000), and they are a result of different combinations of parameters. I was planning to use the Radial Basis Functions in scipy, but for the moment let's assume, to simplify things, that I am dealing only with one parameter (x). In 1000 simulations, this parameter x has 1000 values, obviously. The problem is, the outcome of every single simulation is a vector of oil production over time (let's say 40 values per simulation, one per year), and I would like to be able to interpolate my x parameter (1000 values) against all the simulations (1000x40) and get an approximating function that, given another x parameter (of size 1x1) will give me back an interpolated production profile (of size 1x40).
If I understand your problem correctly, you have a function taking one value as input (or one 3D vector) and returning a vector of length 40. You want to know whether there are tools in scipy to support this.
I'll say first that it's not strictly necessary for there to be: you could always just build 40 different interpolators, one for each component of the output. After all, there's no interaction in the calculations between the output coordinates. This is of course awkward, in that you'd like to just call F(x) and get back a vector of length 40, but that can be remedied by writing a short wrapper function that simply calls all 40 interpolators.
A problem that may be more serious is that the Python loop over the 40 interpolators can be slow, while a C implementation would give vector-valued results rapidly.
40 is not a bad number to loop over. The thing I would be worried about is the repeated calculation of the (1000, 1000) radial function evaluation. I think that a small modification of Rbf could be made to handle the vector-valued case. I leave that as an exercise to Andrea, of course. :)
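One way to sketch Robert's point about reusing the radial matrix: build and factor the (n, n) system once, and solve for all 40 output columns as simultaneous right-hand sides. This is a hand-rolled illustration using the same multiquadric basis as scipy's Rbf default, not scipy code itself:

```python
import numpy as np

def fit_rbf_multi(x, y, eps=1.0):
    """Multiquadric RBF fit with vector-valued outputs: the (n, n)
    radial matrix is built and solved once for all output columns."""
    x = np.asarray(x, dtype=float).ravel()
    r = np.abs(x[:, None] - x[None, :])
    A = np.sqrt((r / eps) ** 2 + 1.0)      # multiquadric basis
    w = np.linalg.solve(A, y)              # weights, shape (n, n_out)

    def evaluate(xi):
        ri = np.abs(np.atleast_1d(xi)[:, None] - x[None, :])
        return np.sqrt((ri / eps) ** 2 + 1.0) @ w

    return evaluate

# Usage on toy data with two output components:
x = np.linspace(0.0, 1.0, 50)
y = np.column_stack([np.sin(3 * x), np.cos(3 * x)])
f = fit_rbf_multi(x, y)
out = f(0.5)  # shape (1, 2)
```

The key line is the `np.linalg.solve(A, y)` call: NumPy solves for all columns of y at once, so the expensive radial matrix is only assembled and factored a single time instead of 40 times.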
It seems like this whole interpolation stuff is not working as I thought. In particular, considering scalar-valued interpolation (i.e., looking at the final oil recovery only and not the time-based oil production profile), interpolation with RBFs is giving counterintuitive and meaningless answers. The issues I am seeing are basically these:
# Interpolate the cumulative oil production
rbf = Rbf(x1, x2, x3, x4, x5, x6, final_oil_recovery)

# Try to use an existing combination of parameters to get back
# the original result (more or less)
possible_recovery = rbf(x1[10], x2[10], x3[10], x4[10], x5[10], x6[10])
1) If I attempt to use the resulting interpolation function ("rbf"), inputting a single value for each x1, x2, ..., x6 that is *already present* in the original x1, x2, ..., x6 vectors, I get meaningless results (i.e., negative oil productions, 1000% error, and so on) in all cases, with some (rare) exceptions when using the "thin_plate" interpolation option;

2) Using a combination of parameters x1, x2, ..., x6 that is *not* in the original set, I get non-physical (i.e., unrealistic) results: it appears that RBFs think that if I increase the number of production wells I get less oil recovered (!), sometimes by a factor of 10.
I am pretty sure I am missing something in this whole interpolation thing (I am no expert at all), but honestly it is the first time I get such counterintuitive results in my entire Python numerical life ;)
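For what it's worth, wild RBF results like these are often a symptom of the input parameters living on very different scales (well counts vs. pressures, say), which makes the radial matrix badly conditioned. A hedged sketch of rescaling everything to [0, 1] before fitting, on synthetic two-parameter data (the `scale` helper is invented here):

```python
import numpy as np
from scipy.interpolate import Rbf

# Synthetic stand-in: two parameters on wildly different scales.
rng = np.random.default_rng(1)
n = 200
x1 = rng.uniform(0.0, 10.0, n)        # e.g. number of wells
x2 = rng.uniform(1e3, 5e3, n)         # e.g. a pressure
recovery = 2.0 * x1 + 0.001 * x2

def scale(v):
    """Map a parameter vector onto [0, 1]."""
    return (v - v.min()) / (v.max() - v.min())

# thin_plate is the option that behaved least badly above; a small
# smooth > 0 can also be passed if the data are noisy (default is 0).
rbf = Rbf(scale(x1), scale(x2), recovery, function='thin_plate')

# Evaluating at training points should now roughly reproduce the data.
pred = rbf(scale(x1)[:5], scale(x2)[:5])
```

Note that new query points would have to be rescaled with the min/max of the *training* data, not their own, for this to stay consistent.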
Hi Andrea, I'm looking at doing this as well, i.e. using RBF to create what is effectively a "proxy simulator". I'm going to have a read through the scipy Rbf documentation to get up to speed. I take it you are using an experimental design algorithm such as Plackett-Burman or similar? Brennan
Andrea.
Hi Brennan, On 28 March 2010 22:50, Brennan Williams wrote:
[earlier quoted text snipped]
Hi Andrea, I'm looking at doing this as well, i.e. using RBF to create what is effectively a "proxy simulator". I'm going to have a read through the scipy Rbf documentation to get up to speed. I take it you are using an experimental design algorithm such as Plackett-Burman or similar?
No, it's simply a collection of all the simulations we have run in the past 3-4 years. We can't afford to use experimental design or Latin hypercube algorithms in our simulations, as the field we are studying is so immense that simulation times are prohibitive for any kind of proxy model. What I was trying to do is to collect all the information and data we gathered with so much effort in the past 4 years and then build some kind of approximating function that would give us a possible production profile without actually running the simulation itself.

I should mention that, given any combination of these 6 parameters, any one of us reservoir engineers could draw a production profile for this field by hand (blindfolded too, if you wish) without any effort, because the simulated response of this field is so linear any time you change any of the 6 parameters I mentioned in my other posts. This is why I am so surprised about the behaviour of RBFs.

I am going to give Matlab (sigh...) a go tomorrow morning and see what griddatan thinks about this issue. I really don't want to go back to Matlab; it was the worst of nightmares for a GUI-builder person like me 6 years ago. I switched to Python/Numpy/Scipy/wxPython and I have never looked back.

Andrea.
participants (8)
- Andrea Gavana
- Anne Archibald
- Brennan Williams
- Christopher Barker
- Friedrich Romstedt
- josef.pktd@gmail.com
- Pierre GM
- Robert Kern