Linear Interpolation Question
Hi All,

I have 2 matrices coming from 2 different simulations: the first column of each matrix is a date (time) at which all the other results in the matrix have been reported (simulation step). In these 2 matrices, very often the simulation steps do not coincide, so I just want to interpolate the results in the second matrix using the dates in the first matrix. The problem is, I have close to 13,000 columns in each matrix, and repeating interp1d over all the columns is quite expensive. An example of what I am doing is as follows:

    # Loop over all the columns
    for indx in indices:

        # Set up a linear interpolation with:
        # x = dates in the second simulation
        # y = single column in the second matrix simulation
        function = interp1d(secondaryMatrixDates,
                            secondaryMatrixResults[:, indx], kind='linear')

        # Interpolate the second matrix results using the first simulation dates
        interpolationResults = function(mainMatrixDates)

        # I need the difference between the first simulation and the second
        newMatrix[:, indx] = mainMatrixResults[:, indx] - interpolationResults

This is a rather costly step, as it's taking up a lot of CPU (increasing at every iteration) and quite a long time (every column has about 350 data points). Is there anything I can do to speed up this loop? Or can someone suggest a better approach?

Thank you very much for your suggestions.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://xoomer.alice.it/infinity77/
Hi All, On Mon, Apr 28, 2008 at 12:41 PM, Andrea Gavana wrote:
Ok, I have tried to be smart and use interp2d, but interp2d gives me a strange error message which I can't understand:

    D:\MyProjects>Interp2DSample.py
    Traceback (most recent call last):
      File "D:\MyProjects\Interp2DSample.py", line 25, in <module>
        function = interp2d(xx, yy, z, kind="linear", copy=False)
      File "C:\Python25\lib\site-packages\scipy\interpolate\interpolate.py", line 91, in __init__
        self.tck = fitpack.bisplrep(self.x, self.y, self.z, kx=kx, ky=ky, s=0.)
      File "C:\Python25\lib\site-packages\scipy\interpolate\fitpack.py", line 677, in bisplrep
        tx,ty,nxest,nyest,wrk,lwrk1,lwrk2)
    OverflowError: long int too large to convert to int

I am able to get this error message using this simple script:

    import datetime
    import numpy
    from scipy.interpolate import interp2d

    date1, date2 = [], []
    numColumns = 13000

    for year in xrange(2007, 2038):
        for month in xrange(1, 13):
            date1.append(datetime.date(year, month, 1).toordinal())
            date2.append(datetime.date(year, month, 5).toordinal())

    timeSteps = len(date2)

    x = [date1[0] for i in xrange(numColumns)]
    y = date1
    z = numpy.random.rand(timeSteps, numColumns)

    xx, yy = numpy.meshgrid(x, y)

    newX = [date2[0] for i in xrange(numColumns)]
    newY = date2

    function = interp2d(xx, yy, z, kind="linear", copy=False)
    newZ = function(newX, newY)

Does anyone know what I am doing wrong? I am on Windows XP, Python 2.5, scipy 0.5.2.1, numpy 1.0.3.1.

Thank you very much for your suggestions.

Andrea.
Dear All,

I have a small problem with positioning widgets in the Pmw package. I want to design a GUI made up of many entry and label widgets together with a graph area, all in the same window. I am used to the grid method in Tkinter for positioning. Is there an equivalent method in the Pmw package that one can use to control the layout of these widgets? I would be happy if anyone can help.

Sincerely,
Nate
2008/4/28 Andrea Gavana <andrea.gavana@gmail.com>:
You have run into an unfortunate limitation of interp1d; it only handles scalar-valued data. That Python loop, through all those interp1d objects, is pretty wasteful. Since you have only several hundred values to interpolate to, and thirteen thousand columns, I would write a vectorized linear interpolation by hand. That is, for each date in the main matrix, use searchsorted() to find it in the secondary matrix dates, then do something like

    for j in range(len(main_date)):
        date = main_date[j]
        # index of the last secondary date <= date, clipped so i+1 stays in range
        i = searchsorted(secondary_date, date, side='right') - 1
        i = min(max(i, 0), len(secondary_date) - 2)
        # fractional position of date between secondary_date[i] and secondary_date[i+1]
        t = (date - secondary_date[i]) / float(secondary_date[i+1] - secondary_date[i])
        new_matrix[j,:] = (1-t)*secondary_matrix[i,:] + t*secondary_matrix[i+1,:]

Good luck,
Anne

With 13000 columns, the overhead
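The searchsorted() idea above can also be applied to all dates at once, which removes the Python loop entirely. The following is only a minimal sketch under stated assumptions: the function name interp_columns and the small synthetic arrays are made up for illustration (they are not from the original simulations), the secondary dates are assumed to be strictly increasing, and dates outside the secondary range are linearly extrapolated from the nearest interval rather than rejected.

```python
import numpy as np


def interp_columns(new_dates, old_dates, old_values):
    """Linearly interpolate every column of old_values (sampled at
    old_dates) onto new_dates. Hypothetical helper; old_dates must be
    strictly increasing."""
    new_dates = np.asarray(new_dates, dtype=float)
    old_dates = np.asarray(old_dates, dtype=float)
    old_values = np.asarray(old_values, dtype=float)

    # For each new date, index of the left end of its bracketing interval.
    i = np.searchsorted(old_dates, new_dates, side="right") - 1
    # Clip so that i and i + 1 are always valid rows; dates outside the
    # range are then extrapolated from the first or last interval.
    i = np.clip(i, 0, len(old_dates) - 2)

    # Fractional position of each new date inside its interval.
    t = (new_dates - old_dates[i]) / (old_dates[i + 1] - old_dates[i])

    # Broadcast the per-date weights across all columns at once.
    return (1.0 - t)[:, None] * old_values[i] + t[:, None] * old_values[i + 1]


# Small synthetic example (made-up data).
rng = np.random.default_rng(0)
secondary_dates = np.array([0.0, 10.0, 20.0, 30.0])
secondary_results = rng.random((4, 5))
main_dates = np.array([0.0, 5.0, 12.5, 30.0])
main_results = rng.random((4, 5))

interpolated = interp_columns(main_dates, secondary_dates, secondary_results)
difference = main_results - interpolated
```

For dates inside the secondary range, each column of the result should agree with numpy.interp applied to that column alone, which gives a simple way to check the sketch on real data before relying on it.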
Participants (3): Andrea Gavana, Anne Archibald, Nathaniel Egwu