Re: [SciPy-Dev] Splines in Scipy [was: SciPy Goal]

9 Jan 2012

      On Jan 9, 2012, at 10:46 AM, josef.pktd@gmail.com wrote:
...
On Mon, Jan 9, 2012 at 8:02 AM, Zachary Pincus <zachary.pincus@yale.edu> wrote:
...
Also, as long as a list is being made:
scipy.signal has matched functions [cq]spline1d() and [cq]spline1d_eval(), but only [cq]spline2d(), with no matching _eval function.
And as far as FITPACK goes, I agree can be extremely, and possibly dangerously, "quirky" -- it's prone to almost arbitrarily bad ringing artifacts when the smoothing coefficient isn't large enough, and is very (very) sensitive to initial conditions in terms of what will and won't provoke the ringing. It has its uses, but it seems to me odd enough that it really shouldn't be the "default" 1D spline tool to direct people to.
Do you have an example of "arbitrarily" bad ringing?
...
From what I was reading up on splines in the last weeks, I got the
impression was that this is a "feature" of interpolating splines, and
to be useful with a larger number of points we always need to smooth
sufficiently (reduce knots or penalize).
(I just read a comment that R with 5000 points only chooses about 200 knots).
Example below; it's using parametric splines because I have a simple interactive tool to draw them and notice occasional "blowing up" like what you see below. I *think* I've seen similar issues with normal splines, but haven't used them a lot lately. (For the record, treating the x and y values as separate and using the non-parametric spline fitting does NOT yield these crazy errors on *these data*...)

As far as the smoothing parameter, the "good" data will go crazy if s=3, but is fine with s=0.25 or s=4; similarly the "bad" data isn't prone to ringing if s=0.25 or s=5. So there's serious sensitivity both to the x,y positions of the data (as below) and to the smoothing parameter in a fairly small range.

I could probably come up with an example that goes crazy with even fewer input points, but this was the first thing I came up with. Small modifications to the input data seem to make it go even crazier, but the below illustrates the general point.

Zach

import numpy
import scipy.interpolate as interp
good = numpy.array(
      [[ 24.21162868,  28.75056713,  32.64108579,  36.85581434,
         41.07054289,  46.582111  ,  52.417889  ,  55.17367305,
         57.92945711,  61.00945105,  64.89996971,  72.19469221,
         75.76100098,  83.21782842,  83.21782842,  88.56729158,
         86.29782236,  90.18834103,  86.62203225],
       [ 70.57364276,  71.22206254,  69.27680321,  72.5189021 ,
         65.06207466,  70.89785265,  67.33154388,  68.62838343,
         69.92522299,  67.00733399,  77.21994548,  68.30417354,
         71.38416748,  71.38416748,  64.25154993,  70.08732793,
         61.00945105,  63.44102521,  56.47051261]])
bad = good.copy()
# now make a *small* change
bad[:,-1] = 87.432556973542049, 55.984197773255048

good_tck, good_u = interp.splprep(good, s=4)
bad_tck, bad_u = interp.splprep(bad, s=4)
print good.ptp(axis=1)
print numpy.array(interp.splev(numpy.linspace(good_u[0], good_u[-1], 300), good_tck)).ptp(axis=1)
print numpy.array(interp.splev(numpy.linspace(bad_u[0], bad_u[-1], 300), bad_tck)).ptp(axis=1)

And the output on my machine is:
[ 65.97671235  20.74943287]
[ 67.69845281  20.52518913]
[ 2868.98673621  450984.86622631]