Review request: interp2d with partial regrid backend
Hi All,

It has been a long time since I submitted any scipy hacks, but I got hit by an old issue of mine so decided to do something about it.

The current interp2d backend uses the surfit routine from Dierckx's fitpack, which is not meant for interpolation (which is what we select when we set s=0.0). From the surfit.f code:

    c   to choose s very small is strongly discouraged. this considerably
    c   increases computation time and memory requirements. it may also
    c   cause rank-deficiency (ier<-2) and endager numerical stability.

I originally raised this a long time ago in ticket #286, which proposed replacing surfit with regrid, which does not suffer from this problem. However, as is rightly stated in the comments to that ticket, that change made interp2d work only on rectangular grids rather than on scattered data, which is not acceptable (and an API breakage).

So I propose to find a step by step solution to this. First, the following code uses regrid whenever possible:

https://github.com/jtravs/scipy/compare/master...interp2d_rectangular_fixes

This fixes #286, #776, #898 and parts of #1364, #1072, and #703.

Following this I hope to improve the docs a little and find a better solution to the scattered data problem than using surfit (which is great for smoothing BTW).

Please note that this is my first attempt at the whole git workflow thing for scipy, so I hope this code review request is the right way to go. I hope to submit quite a few more changes, so please correct me where I go wrong!

Cheers,
John
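For readers following along, here is a minimal sketch of what interpolation (s=0) on a rectangular grid looks like with a regrid-based routine. It uses `RectBivariateSpline`, which to my understanding wraps FITPACK's regrid; it is not the code from the linked branch, and the test function and grid sizes are made up for illustration.

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Hypothetical sample data on a rectangular grid (not taken from the thread).
x = np.linspace(0.0, 4.0, 9)
y = np.linspace(0.0, 3.0, 7)
z = np.sin(x)[:, None] * np.cos(y)[None, :]  # shape (len(x), len(y))

# s=0 requests interpolation rather than smoothing.
spline = RectBivariateSpline(x, y, z, kx=3, ky=3, s=0)

# Evaluating on the original grid should reproduce the data at the nodes.
z_nodes = spline(x, y)
max_node_error = np.abs(z_nodes - z).max()

# Evaluate on a finer rectangular grid.
xf = np.linspace(0.0, 4.0, 33)
yf = np.linspace(0.0, 3.0, 25)
z_fine = spline(xf, yf)  # shape (len(xf), len(yf))
```

Note that `z` is indexed as `z[i, j] = f(x[i], y[j])` here, which is the convention `RectBivariateSpline` expects.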
Hi, 12.11.2012 01:54, John Travers wrote: [clip]
So I propose to find a step by step solution to this. First the following code uses regrid whenever possible:
https://github.com/jtravs/scipy/compare/master...interp2d_rectangular_fixes
This fixes #286, #776, #898 and parts of #1364, #1072, and #703.
Looks like a useful improvement to me. I think you can already make it a pull request by clicking the button on GitHub, so that we can more easily discuss minor code details there. Cheers, Pauli
On 12/11/12 20:15, Pauli Virtanen wrote:
Hi,
12.11.2012 01:54, John Travers wrote: [clip]
So I propose to find a step by step solution to this. First the following code uses regrid whenever possible:
https://github.com/jtravs/scipy/compare/master...interp2d_rectangular_fixes
This fixes #286, #776, #898 and parts of #1364, #1072, and #703.
Looks like a useful improvement to me. I think you can already make it a pull request by clicking the button on GitHub, so that we can more easily discuss minor code details there.
OK, I did this: https://github.com/scipy/scipy/pull/353 Cheers, J
12.11.2012 01:54, John Travers wrote: [clip]
Following this I hope to improve the docs a little and find a better solution to the scattered data problem rather than using surfit (which is great for smoothing BTW).
Currently, we have the Delaunay tessellation based interpolation routines (LinearNDInterpolator et al.) and RBF, in addition to Fitpack's splines. However, the tessellation doesn't scale very well to large datasets in high dimensions as the number of simplices explodes, and our RBF implementation would need some fine tuning (i.e. the automatic parameter choices it makes are not optimal). Fitpack's problems are well known. So there certainly would be some room for improvement here.

We'd also need an easy-to-use gridded data interpolation routine. Tensor product interpolation is sort of easy [1], but I didn't immediately see an efficient and easy way to evaluate z(i) = interpolator(x(i), y(i)) [as opposed to z(i,j) = interpolator(x(i), y(j))] in that way. To do this, one would probably have to actually construct the spline representation rather than just reuse existing 1-D interpolators one after another.

Best,
Pauli

[1] http://stackoverflow.com/questions/13333265/vector-array-output-from-scipy-n...
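To make the z(i) versus z(i,j) distinction concrete, here is a small sketch using a bivariate spline object that holds the full spline representation. It assumes the current SciPy `RectBivariateSpline` API (grid evaluation via `__call__`, pointwise evaluation via `ev`); the data are invented, and this is not the tensor-product-of-1-D-interpolators approach from [1].

```python
import numpy as np
from scipy.interpolate import RectBivariateSpline

# Hypothetical gridded data: z[i, j] = x[i]**2 + y[j].
x = np.linspace(0.0, 1.0, 6)
y = np.linspace(0.0, 2.0, 8)
z = x[:, None] ** 2 + y[None, :]

spline = RectBivariateSpline(x, y, z, s=0)

# Query coordinates (sorted, as grid evaluation requires).
xi = np.array([0.1, 0.5, 0.9])
yi = np.array([0.2, 1.0, 1.8])

# Grid evaluation: z_grid[i, j] = spline(xi[i], yi[j]), shape (3, 3).
z_grid = spline(xi, yi)

# Pointwise evaluation: z_points[i] = spline(xi[i], yi[i]), shape (3,).
z_points = spline.ev(xi, yi)
```

The pointwise result is the diagonal of the grid result, but `ev` computes it without forming the full len(xi) by len(yi) array.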
On Mon, Nov 12, 2012 at 12:27 PM, Pauli Virtanen <pav@iki.fi> wrote:
12.11.2012 01:54, John Travers wrote: [clip]
Following this I hope to improve the docs a little and find a better solution to the scattered data problem rather than using surfit (which is great for smoothing BTW).
Currently, we have the Delaunay tessellation based interpolation routines (LinearNDInterpolator et al.) and RBF, in addition to Fitpack's splines.
However, the tessellation doesn't scale very well to large datasets in high dimensions as the number of simplices explodes, and our RBF implementation would need some fine tuning (i.e. the automatic parameter choices it makes are not optimal). Fitpack's problems are well known. So there certainly would be some room for improvement here.
We'd also need an easy-to-use gridded data interpolation routine. Tensor product interpolation is sort of easy [1], but I didn't immediately see an efficient and easy way to evaluate z(i) = interpolator(x(i), y(i)) [as opposed to z(i,j) = interpolator(x(i), y(j))] in that way. To do this, one would probably have to actually construct the spline representation rather than just reuse existing 1-D interpolators one after another.
I haven't looked at the spline case, but for the multidimensional numpy polynomials there is a 'tensor' keyword in the evaluation that lets it be used for both cases. Chuck
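As an illustration of the polynomial case, here is a short sketch with `numpy.polynomial.polynomial`. As far as I know the `tensor` keyword itself lives on the 1-D `polyval`, and the 2-D helpers `polyval2d` (pointwise) and `polygrid2d` (grid) cover the two evaluation modes; the coefficients below are made up.

```python
import numpy as np
from numpy.polynomial import polynomial as P

# c[i, j] multiplies x**i * y**j, so p(x, y) = 1 + 2*y + 3*x + 4*x*y.
c = np.array([[1.0, 2.0],
              [3.0, 4.0]])

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 2.0, 3.0])

# Pointwise: z_points[i] = p(x[i], y[i]), shape (3,).
z_points = P.polyval2d(x, y, c)

# Grid: z_grid[i, j] = p(x[i], y[j]), shape (3, 3).
z_grid = P.polygrid2d(x, y, c)
```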
On 12/11/12 20:27, Pauli Virtanen wrote:
12.11.2012 01:54, John Travers wrote: [clip]
Following this I hope to improve the docs a little and find a better solution to the scattered data problem rather than using surfit (which is great for smoothing BTW).
Currently, we have the Delaunay tessellation based interpolation routines (LinearNDInterpolator et al.) and RBF, in addition to Fitpack's splines.
However, the tessellation doesn't scale very well to large datasets in high dimensions as the number of simplices explodes, and our RBF implementation would need some fine tuning (i.e. the automatic parameter choices it makes are not optimal). Fitpack's problems are well known. So there certainly would be some room for improvement here.
OK, that is interesting, I was looking at using the NDInterpolation routines for this. I think the rectangular grid is the most common use case of interp2d, so if we can get that working much better it is a good start. Irregular data is the sort of problem where users should expect to try a few of the available routines before finding one that works well. Cheers, J
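A rough sketch of what "trying a few routines" on scattered data might look like with the Delaunay-based interpolators mentioned above. The test function, point counts, and random seed are all hypothetical, and which routine does best will depend on the data.

```python
import numpy as np
from scipy.interpolate import (
    CloughTocher2DInterpolator,
    LinearNDInterpolator,
    NearestNDInterpolator,
)


def f(x, y):
    """Hypothetical smooth test function standing in for real data."""
    return np.sin(2 * np.pi * x) * np.cos(np.pi * y)


rng = np.random.default_rng(0)

# Scattered sample points in the unit square and their values.
points = rng.uniform(0.0, 1.0, size=(400, 2))
values = f(points[:, 0], points[:, 1])

# Query points kept away from the edges so they lie inside the convex hull.
query = rng.uniform(0.2, 0.8, size=(50, 2))
exact = f(query[:, 0], query[:, 1])

candidates = {
    "nearest": NearestNDInterpolator,
    "linear": LinearNDInterpolator,
    "clough-tocher": CloughTocher2DInterpolator,
}

# Maximum absolute error of each interpolator at the query points.
errors = {}
for name, cls in candidates.items():
    interpolator = cls(points, values)
    approx = interpolator(query)
    errors[name] = np.nanmax(np.abs(approx - exact))
```

Inspecting `errors` gives a quick comparison on this one synthetic problem; it says nothing general about which method to prefer.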
participants (3)
- Charles R Harris
- John Travers
- Pauli Virtanen