N-dimensional interpolation
Hi all, I took the Qhull by the horns, and wrote a straightforward `griddata` implementation for working in N-D: http://github.com/pv/scipy-work/tree/qhull http://github.com/pv/scipy-work/blob/qhull/scipy/interpolate/qhull.pyx Sample (4-D): ------------------------------------------------------------- import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import griddata def f(x, y, z, u): return np.cos(x + np.cos(y + np.cos(z + np.cos(u)))) points = np.random.rand(2000, 4) values = f(*points.T) p = np.linspace(0, 1, 1500) z = griddata(points, values, np.c_[p, p, p, p], method='linear') plt.plot(p, z, '-', p, f(p, p, p, p), '-') plt.show() ------------------------------------------------------------- It performs the N-D Delaunay tesselation with Qhull, and after that does linear barycentric interpolation on the simplex containing each point. (The nearest-neighbor "interpolation" is, on the other hand, implemented using scipy.spatial.KDTree.) I ended up writing some custom simplex-walking code for locating the simplex containing the points -- this is much faster than a brute-force search. However, it is slightly different from what `qh_findbestfacet` does. [If you're a computational geometry expert, see below...] Speed-wise, Qhull appears to lose in 2-D to `scikits.delaunay` by a factor of about 10 in triangulation in my tests; this, however, includes the time taken by computing the barycentric coordinate transforms. Interpolation, however, seems to be faster. *** I think this would be a nice addition to `scipy.optimize`. I'd like to get it in for 0.9. Later on, we can specialize the `griddata` function to perform better in low dimensions (ndim=2, 3) and add more interpolation methods there; for example natural neighbors, interpolating splines, etc. In ndim > 3, the `griddata` is however already feature-equivalent to Octave and MATLAB. Comments? Pauli PS. A question for computational geometry experts: `qh_findbestfacet` finds the facet on the lower convex hull whose oriented hyperplane distance to the point lifted on the paraboloid is maximal --- however, this does not always give the simplex containing the point in the projected Delaunay tesselation. There's a counterexample in qhull.pyx:_find_simplex docstring. On the other hand, Qhull documentation claims that this is the way to locate the Delaunay simplex containing a point. What gives? It's clear, however, that if a simplex contains the point, then the hyperplane distance is positive. So currently, I just walk around positive-distance simplices checking the inside condition for each and hoping for the best. If nothing is found, I just fall back to brute force search. This seems to work well in practice. However, if a point is outside the triangulation (but inside the bounding box), I have to fall back to a brute-force search. Is there a better option here? *** Another sample (2-D, object-oriented interface): ------------------------------------------------------------- import time import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import LinearNDInterpolator # some random data x = np.array([(0,0), (0,1), (1, 1), (1, 0)], dtype=np.double) np.random.seed(0) xr = np.random.rand(200*200, 2).astype(np.double) x = np.r_[xr, x] y = np.arange(x.shape[0], dtype=np.double) # evaluate on a grid nn = 500 xx = np.linspace(-0.1, 1.1, nn) yy = np.linspace(-0.1, 1.1, nn) xx, yy = np.broadcast_arrays(xx[:,None], yy[None,:]) xi = np.array([xx, yy]).transpose(1,2,0).copy() start = time.time() ip = LinearNDInterpolator(x, y) print "Triangulation", time.time() - start start = time.time() zi = ip(xi) print "Interpolation", time.time() - start from scikits.delaunay import Triangulation start = time.time() tri2 = Triangulation(x[:,0], x[:,1]) print "scikits.delaunay triangulation", time.time() - start start = time.time() ip2 = tri2.linear_interpolator(y) zi2 = ip2[-0.1:1.1:500j,-0.1:1.1:500j].T print "scikits.delaunay interpolation", time.time() - start print "rel-difference", np.nanmax(abs(zi - zi2))/np.nanmax(np.abs(zi)) plt.figure() plt.imshow(zi) plt.clim(y.min(), y.max()) plt.colorbar() plt.figure() plt.imshow(zi2) plt.clim(y.min(), y.max()) plt.colorbar() plt.show() ------------------------------------------------------------- -- Pauli Virtanen
On Sun, Jul 25, 2010 at 9:35 AM, Pauli Virtanen <pav@iki.fi> wrote:
Hi all,
I took the Qhull by the horns, and wrote a straightforward `griddata` implementation for working in N-D:
http://github.com/pv/scipy-work/tree/qhull http://github.com/pv/scipy-work/blob/qhull/scipy/interpolate/qhull.pyx
Sample (4-D): ------------------------------------------------------------- import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import griddata
def f(x, y, z, u): return np.cos(x + np.cos(y + np.cos(z + np.cos(u))))
points = np.random.rand(2000, 4) values = f(*points.T)
p = np.linspace(0, 1, 1500) z = griddata(points, values, np.c_[p, p, p, p], method='linear')
plt.plot(p, z, '-', p, f(p, p, p, p), '-') plt.show() -------------------------------------------------------------
It performs the N-D Delaunay tesselation with Qhull, and after that does linear barycentric interpolation on the simplex containing each point. (The nearest-neighbor "interpolation" is, on the other hand, implemented using scipy.spatial.KDTree.)
I ended up writing some custom simplex-walking code for locating the simplex containing the points -- this is much faster than a brute-force search. However, it is slightly different from what `qh_findbestfacet` does. [If you're a computational geometry expert, see below...]
Speed-wise, Qhull appears to lose in 2-D to `scikits.delaunay` by a factor of about 10 in triangulation in my tests; this, however, includes the time taken by computing the barycentric coordinate transforms. Interpolation, however, seems to be faster.
***
I think this would be a nice addition to `scipy.optimize`. I'd like to get it in for 0.9.
Do you mean scipy.interpolate?
Later on, we can specialize the `griddata` function to perform better in low dimensions (ndim=2, 3) and add more interpolation methods there; for example natural neighbors, interpolating splines, etc. In ndim > 3, the `griddata` is however already feature-equivalent to Octave and MATLAB.
Comments?
I'm not a computational geometry expert, but I expect the 2-D performance or scikits.delaunay takes advantage of a special algorithm for that case (I recall googling for that back when). Perhaps it can be adapted to the higher dimension case or adapted for searching. Just random thoughts here, I have no actual experience with these sorts of problems. I certainly support getting added interpolation functionality into scipy and barycentric interpolation has the advantage of not introducing weird artifacts. I like it for drawing contours also. Chuck
Sun, 25 Jul 2010 10:45:23 -0600, Charles R Harris wrote:
Do you mean scipy.interpolate?
Yep. [clip]
I'm not a computational geometry expert, but I expect the 2-D performance or scikits.delaunay takes advantage of a special algorithm for that case (I recall googling for that back when). Perhaps it can be adapted to the higher dimension case or adapted for searching. Just random thoughts here, I have no actual experience with these sorts of problems.
Optimizing the time spent in Qhull is probably very difficult, so I think we'll just have to live with that part. As far as I understand, the search algorithm in scikits.delaunay walks the triangles by hopping to a neighbour chosen so that the target point is on positive side of the corresponding ridge. That generalizes to N-D easily. Randomizing the interpolation points causes a 20x hit on the current search algorithm, whereas scikits.delaunay gets away with a 4x performance drop with the same input, so there could be some room for improvement here. -- Pauli Virtanen
Sun, 25 Jul 2010 21:52:24 +0000, Pauli Virtanen wrote: [clip]
As far as I understand, the search algorithm in scikits.delaunay walks the triangles by hopping to a neighbour chosen so that the target point is on positive side of the corresponding ridge. That generalizes to N-D easily.
This turned out to work quite well. A hybrid algorithm combining the two (first search a simplex with a positive plane distance, then continue with directed search) seems to work even better. So now it should be quite robust, and I'm happy with the timings, too: -- qhull triangulate 2.19968700409 -- qhull interpolate (meshgrid) 0.651250123978 -- qhull interpolate (random order) 22.201515913 -- scikits.delaunay triangulate 0.925380945206 -- scikits.delaunay linear_interpolate (meshgrid) 4.93817210197 -- scikits.delaunay nn_interpolate (meshgrid) 7.32182908058 -- scikits.delaunay nn_interpolate (random order) 45.4793360233 -- abs-diff-max 8.91013769433e-08 -- rel-diff-max 7.27796350818e-12 Qhull is roughly 2-3x slower in forming the triangulation in 2-D than scikits.delaunay, but that's acceptable. For some reason, the new interpolation code is faster than the linear interpolation in scikits.delaunay. I'll probably try to split the code so that the Qhull geometry parts end up in scipy.spatial, and the interpolation routines in scipy.interpolate. I'll still need to test the Qhull triangulation against some pathological cases.. Pauli --------------------------------------------------------------------- import qhull import numpy as np import sys import time np.random.seed(1234) x = np.random.rand(240*240, 2).astype(np.double) y = np.arange(x.shape[0], dtype=np.double) nn = 500 xx = np.linspace(-0.1, 1.1, nn) yy = np.linspace(-0.1, 1.1, nn) xx, yy = np.broadcast_arrays(xx[:,None], yy[None,:]) xi = np.array([xx, yy]).transpose(1,2,0).copy() # permuted order xix = xi.reshape(nn*nn, 2) p = np.random.permutation(nn*nn) p2 = p.copy() p2[p] = np.arange(nn*nn) xix = xix[p] # process! print "-- qhull triangulate" start = time.time() ip = qhull.LinearNDInterpolator(x, y) print time.time() - start print "-- qhull interpolate (meshgrid)" start = time.time() zi = ip(xi) print time.time() - start print "-- qhull interpolate (random order)" start = time.time() zi = ip(xix)[p2].reshape(zi.shape) print time.time() - start from scikits.delaunay import Triangulation print "-- scikits.delaunay triangulate" start = time.time() tri2 = Triangulation(x[:,0], x[:,1]) print time.time() - start print "-- scikits.delaunay linear_interpolate (meshgrid)" start = time.time() ip2 = tri2.linear_interpolator(y) zi2 = ip2[-0.1:1.1:(nn*1j),-0.1:1.1:(nn*1j)].T print time.time() - start print "-- scikits.delaunay nn_interpolate (meshgrid)" start = time.time() ip3 = tri2.nn_interpolator(y) zi3 = ip3(xi[:,:,0].ravel(), xi[:,:,1].ravel()).reshape(zi.shape) print time.time() - start print "-- scikits.delaunay nn_interpolate (random order)" start = time.time() zi3 = ip3(xix[:,0], xix[:,1])[p2].reshape(zi.shape) print time.time() - start print "-- abs-diff-max", np.nanmax(abs(zi-zi2)) print "-- rel-diff-max", np.nanmax(abs(zi-zi2)/abs(zi))
Fantastic. Excellent work. This is something we have been missing for a long time. Travis -- (mobile phone of) Travis Oliphant Enthought, Inc. 1-512-536-1057 http://www.enthought.com On Jul 25, 2010, at 10:35 AM, Pauli Virtanen <pav@iki.fi> wrote:
Hi all,
I took the Qhull by the horns, and wrote a straightforward `griddata` implementation for working in N-D:
http://github.com/pv/scipy-work/tree/qhull http://github.com/pv/scipy-work/blob/qhull/scipy/interpolate/qhull.pyx
Sample (4-D): ------------------------------------------------------------- import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import griddata
def f(x, y, z, u): return np.cos(x + np.cos(y + np.cos(z + np.cos(u))))
points = np.random.rand(2000, 4) values = f(*points.T)
p = np.linspace(0, 1, 1500) z = griddata(points, values, np.c_[p, p, p, p], method='linear')
plt.plot(p, z, '-', p, f(p, p, p, p), '-') plt.show() -------------------------------------------------------------
It performs the N-D Delaunay tesselation with Qhull, and after that does linear barycentric interpolation on the simplex containing each point. (The nearest-neighbor "interpolation" is, on the other hand, implemented using scipy.spatial.KDTree.)
I ended up writing some custom simplex-walking code for locating the simplex containing the points -- this is much faster than a brute-force search. However, it is slightly different from what `qh_findbestfacet` does. [If you're a computational geometry expert, see below...]
Speed-wise, Qhull appears to lose in 2-D to `scikits.delaunay` by a factor of about 10 in triangulation in my tests; this, however, includes the time taken by computing the barycentric coordinate transforms. Interpolation, however, seems to be faster.
***
I think this would be a nice addition to `scipy.optimize`. I'd like to get it in for 0.9.
Later on, we can specialize the `griddata` function to perform better in low dimensions (ndim=2, 3) and add more interpolation methods there; for example natural neighbors, interpolating splines, etc. In ndim > 3, the `griddata` is however already feature-equivalent to Octave and MATLAB.
Comments?
Pauli
PS. A question for computational geometry experts:
`qh_findbestfacet` finds the facet on the lower convex hull whose oriented hyperplane distance to the point lifted on the paraboloid is maximal --- however, this does not always give the simplex containing the point in the projected Delaunay tesselation. There's a counterexample in qhull.pyx:_find_simplex docstring.
On the other hand, Qhull documentation claims that this is the way to locate the Delaunay simplex containing a point. What gives?
It's clear, however, that if a simplex contains the point, then the hyperplane distance is positive. So currently, I just walk around positive-distance simplices checking the inside condition for each and hoping for the best. If nothing is found, I just fall back to brute force search.
This seems to work well in practice. However, if a point is outside the triangulation (but inside the bounding box), I have to fall back to a brute-force search. Is there a better option here?
***
Another sample (2-D, object-oriented interface): ------------------------------------------------------------- import time import numpy as np import matplotlib.pyplot as plt from scipy.interpolate import LinearNDInterpolator
# some random data x = np.array([(0,0), (0,1), (1, 1), (1, 0)], dtype=np.double) np.random.seed(0) xr = np.random.rand(200*200, 2).astype(np.double) x = np.r_[xr, x] y = np.arange(x.shape[0], dtype=np.double)
# evaluate on a grid nn = 500 xx = np.linspace(-0.1, 1.1, nn) yy = np.linspace(-0.1, 1.1, nn) xx, yy = np.broadcast_arrays(xx[:,None], yy[None,:])
xi = np.array([xx, yy]).transpose(1,2,0).copy() start = time.time() ip = LinearNDInterpolator(x, y) print "Triangulation", time.time() - start start = time.time() zi = ip(xi) print "Interpolation", time.time() - start
from scikits.delaunay import Triangulation
start = time.time() tri2 = Triangulation(x[:,0], x[:,1]) print "scikits.delaunay triangulation", time.time() - start
start = time.time() ip2 = tri2.linear_interpolator(y) zi2 = ip2[-0.1:1.1:500j,-0.1:1.1:500j].T print "scikits.delaunay interpolation", time.time() - start
print "rel-difference", np.nanmax(abs(zi - zi2))/np.nanmax(np.abs(zi))
plt.figure() plt.imshow(zi) plt.clim(y.min(), y.max()) plt.colorbar() plt.figure() plt.imshow(zi2) plt.clim(y.min(), y.max()) plt.colorbar() plt.show() -------------------------------------------------------------
-- Pauli Virtanen
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev
Sun, 25 Jul 2010 15:35:11 +0000, Pauli Virtanen wrote:
I took the Qhull by the horns, and wrote a straightforward `griddata` implementation for working in N-D: [clip]
It's now committed to SVN (as it works-well-for-me). Post-mortem review and testing is welcome. http://projects.scipy.org/scipy/changeset/6653 http://projects.scipy.org/scipy/changeset/6655 http://projects.scipy.org/scipy/changeset/6657 What's in there is: 1) scipy.spatial.qhull Delaunay decomposition and some associated low-level N-d geometry routines. 2) scipy.interpolate.interpnd N-dimensional interpolation: 1) Linear barycentric interpolation 2) Cubic spline interpolation (2D-only, C1 continuous, approximately minimum-curvature). 3) scipy.interpolate.griddatand Convenience interface to the N-d interpolation classes. What could be added: - More comprehensive interface to other features of Qhull - Using qhull_restore, qhull_save to store Qhull contexts instead of copying the relevant data? - Optimizing the cubic interpolant - Monotonic cubic interpolation - Cubic interpolation in 3-d - Natural neighbour interpolation - etc. *** Example: import numpy as np def func(x, y): return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2 grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j] points = np.random.rand(1000, 2) values = func(points[:,0], points[:,1]) from scipy.interpolate import griddata grid_z0 = griddata(points, values, (grid_x, grid_y), method='nearest') grid_z1 = griddata(points, values, (grid_x, grid_y), method='linear') grid_z2 = griddata(points, values, (grid_x, grid_y), method='cubic') import matplotlib.pyplot as plt plt.subplot(221) plt.imshow(func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower') plt.plot(points[:,0], points[:,1], 'k.', ms=1) plt.title('Original') plt.subplot(222) plt.imshow(grid_z0.T, extent=(0,1,0,1), origin='lower') plt.title('Nearest') plt.subplot(223) plt.imshow(grid_z1.T, extent=(0,1,0,1), origin='lower') plt.title('Linear') plt.subplot(224) plt.imshow(grid_z2.T, extent=(0,1,0,1), origin='lower') plt.title('Cubic') plt.gcf().set_size_inches(6, 6) plt.show() -- Pauli Virtanen
On 2010-08-31, at 6:01 PM, Pauli Virtanen wrote:
What's in there is:
1) scipy.spatial.qhull
Delaunay decomposition and some associated low-level N-d geometry routines.
2) scipy.interpolate.interpnd
N-dimensional interpolation:
1) Linear barycentric interpolation 2) Cubic spline interpolation (2D-only, C1 continuous, approximately minimum-curvature).
3) scipy.interpolate.griddatand
Convenience interface to the N-d interpolation classes.
I don't know if and when I'll have occasion to use this stuff, but I'm glad it's there. Nice work, Pauli! One comment: the name "griddatand" looks odd to my eyes, my mind wants to group it as "datand". I only mention this because it might slip past someone who's looking for "griddata" (also, does np.lookfor match partial words?). I don't really have a suggestion of another name though. "ndgriddata"? Then griddata looks a bit more separate, and it'd match scipy.ndimage. David
Tue, 31 Aug 2010 18:24:48 -0400, David Warde-Farley wrote: [clip]
One comment: the name "griddatand" looks odd to my eyes, my mind wants to group it as "datand". I only mention this because it might slip past someone who's looking for "griddata"
"griddatand" is the name of the module, and probably nobody will need to use it since the stuff is imported to top-level scipy.interpolate. (I can't use "griddata" there since it would shadow the function name.)
(also, does np.lookfor match partial words?).
Yep.
I don't really have a suggestion of another name though. "ndgriddata"? Then griddata looks a bit more separate, and it'd match scipy.ndimage.
That might be a slightly better name, yes. -- Pauli Virtanen
On Wed, Sep 1, 2010 at 12:28 AM, Pauli Virtanen <pav@iki.fi> wrote:
I don't really have a suggestion of another name though. "ndgriddata"? Then griddata looks a bit more separate, and it'd match scipy.ndimage.
That might be a slightly better name, yes.
I saw another checkin today on "grid-datand"; I support David's idea to rename the module to save future confusion. Aside from nitpicking about the name, thanks very much for implementing this! We've had so many questions on the mailing list on this topic, and it's great to have a clean, built-in solution to offer. Regards Stéfan
participants (5)
-
Charles R Harris -
David Warde-Farley -
Pauli Virtanen -
Stéfan van der Walt -
Travis Oliphant