[Tutor] constructing semi-arbitrary functions

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Feb 18 16:09:41 CET 2014


On 18 February 2014 00:51, André Walker-Loud <walksloud at gmail.com> wrote:
>>
>> BTW if you're trying to fit the coefficients of a polynomial then a
>> general purpose optimisation function is probably not what you want to
>> use. I would probably solve (in a least squares sense and after
>> suitable scaling) the Vandermonde matrix.
>>
>> (I can explain that more if desired.)
>
> That looks interesting (just reading the wikipedia entry on the Vandermonde matrix).
> Is there a good algorithm for picking the values the polynomial is evaluated at to construct the matrix?

If I understand correctly, your data points are given and it is the
polynomial coefficients you are trying to solve for. In that case you
would build the matrix from your data points. So if I have data
points (x1, y1), ..., (x4, y4) and I want to fit a quadratic
polynomial of the form y = f(x), then I can choose the coefficients
to minimise the squared error in estimating each yi from the
corresponding xi. That is:

a * x1^2 + b * x1 + c = y1e
...
a * x4^2 + b * x4 + c = y4e

where y1e, ..., y4e are the model's estimates of y1, ..., y4. You
want to choose a, b and c to minimise the total squared error
E = |y1 - y1e|^2 + ... + |y4 - y4e|^2. We can write this as a matrix
equation that you want to solve in a least squares sense:

|x1^2   x1   1 |             | y1 |
|x2^2   x2   1 |  | a |      | y2 |
|x3^2   x3   1 |  | b |   =  | y3 |
|x4^2   x4   1 |  | c |      | y4 |

Or in other words M(x) k = y, where M(x) is the Vandermonde matrix of
the x coordinates, y is a column vector of the y coordinates (in the
same order) and k is a column vector of the coefficients. You want
the least squares solution of this linear system of equations, which
is the exact solution of:

M' M k = M' y

where M' means the transpose of M. I mentioned scaling because the
Vandermonde matrix can be badly scaled if the x values are either
much smaller than or much bigger than 1, which leads to numerical
instability. If you rescale your x values to lie between zero and one
before doing this you may get a more accurate result.
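
To make that concrete, here is a minimal sketch in numpy, with
made-up data values for illustration:

import numpy as np

# hypothetical data points (x1, y1), ..., (x4, y4)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 9.2, 18.8])

# rescale x to [0, 1] so the Vandermonde matrix is well scaled
xs = (x - x.min()) / (x.max() - x.min())

# Vandermonde matrix: columns are xs^2, xs, 1
M = np.vander(xs, 3)

# least squares solution of M k = y; lstsq is more robust than
# forming M'M and solving the normal equations explicitly
k, residuals, rank, sv = np.linalg.lstsq(M, y)
a, b, c = k

Note that a, b and c are then coefficients in the rescaled variable
xs, so you would substitute xs = (x - xmin)/(xmax - xmin) back in to
recover coefficients in terms of the original x.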

> Is there a benefit to this method vs a standard linear least squares?

It's the same except that you're using an analytic solution rather
than a black box solver.

> Given the Vandermonde matrix, how do you determine the uncertainty in the resulting parameters?

Assuming that you have confidence in the accuracy of your input data:
http://en.wikipedia.org/wiki/Linear_least_squares_(mathematics)#Parameter_confidence_limits
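
Continuing the sketch above (and assuming independent errors of equal
variance on the y values), the standard estimate of the parameter
covariance is sigma^2 (M'M)^-1, so roughly:

# continuing from the least squares sketch above
r = y - M.dot(k)                  # residuals
m, n = M.shape                    # number of points, of parameters
sigma2 = r.dot(r) / (m - n)       # estimated residual variance
cov = sigma2 * np.linalg.inv(M.T.dot(M))
errors = np.sqrt(np.diag(cov))    # 1-sigma errors on a, b, c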

> I guess yes, I am interested to learn more.
>
> The most common problem I am solving is fitting a sum of real exponentials to noisy data, with the model function
>
> C(t) = sum_n A_n e^{- E_n t}
>
> the quantities of most interest are E_n, followed by A_n so I solve this with non-linear regression.
> To stabilize the fit, I usually do a linear-least squares for the A_n first, solving as a function of the E_n, and then do a non-linear fit for the E_n.

That doesn't sound optimal to me. Maybe I've misunderstood, but
fitting over one variable and then over the other is not in general
guaranteed to produce an optimal fit.

Let's say I want to find the x* and y* that minimise f(x, y). If I
choose y0 as a guess, find the x1 that minimises f(x, y0), and then
use that x1 to find the y1 that minimises f(x1, y), I have the
guesses x1 and y1. These are unlikely to be close to the true optimal
choices x* and y*. If you continue the algorithm, choosing x2 to
minimise f(x, y1) and y2 to minimise f(x2, y), generating x2, y2, x3,
y3 and so on, then it will eventually converge (if the problem is
convex).

In the xy plane this algorithm would be zig-zagging horizontally and
vertically down a valley towards the optimal solution. You can see a
picture of this on page 414 of this (free online) book:
http://apps.nrbook.com/c/index.html
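
As a rough sketch of that zig-zag in code (the test function and
iteration count here are made up, and I'm assuming scipy is
available):

from scipy.optimize import minimize_scalar

# a made-up convex function with a narrow valley; minimum at (1, 1)
def f(x, y):
    return (x - 1.0)**2 + 10.0*(x - y)**2

x, y = 0.0, 0.0                                # initial guesses
for i in range(100):
    x = minimize_scalar(lambda x: f(x, y)).x   # minimise over x, y fixed
    y = minimize_scalar(lambda y: f(x, y)).x   # minimise over y, x fixed

print(x, y)   # creeps towards (1, 1); convergence is slow

A minimiser that works on both variables at once would typically get
there in far fewer steps.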

> Often, we construct multiple data sets which share values of E_n but have different A_n, where A_n is promoted to a matrix, sometimes square, sometimes not.  So I want a wrapper on my chisq which can take the input parameter values and deduce the fit function, and pass the variable names to my current minimizer - hence my original question, and why I don't want to write a special case for each possible combination of n_max, and the shape of A_n.

Well, it sounds like your approach is working for now but, as I say,
the real fix is to improve or bypass the interface you're using. One
limitation that you may hit at some point is that in Python you can't
have an unbounded number of formal parameters for a function:

$ python3 tmp2.py
  File "tmp2.py", line 1
    def f(x0,  x1,  x2,  x3,  x4,  x5,  x6,  x7,  x8,  x9,  x10,  x11,  x12,  x13,  x14,
         ^
SyntaxError: more than 255 arguments
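
The usual workaround, and what I would suggest here anyway, is to
write the objective function to take a single array of parameters
rather than one formal parameter per coefficient. A rough sketch for
your sum-of-exponentials model (the names and data here are made up;
I'm assuming scipy is available):

import numpy as np
from scipy.optimize import minimize

def chisq(params, t, data):
    # params packs the amplitudes and exponents: [A_1..A_n, E_1..E_n]
    n = len(params) // 2
    A, E = params[:n], params[n:]
    model = sum(A[i] * np.exp(-E[i] * t) for i in range(n))
    return np.sum((data - model)**2)

# made-up data from a two-exponential model
t = np.linspace(0.0, 5.0, 50)
data = 2.0*np.exp(-1.0*t) + 0.5*np.exp(-3.0*t)

p0 = np.array([1.0, 1.0, 1.0, 2.0])   # initial guess [A_1, A_2, E_1, E_2]
result = minimize(chisq, p0, args=(t, data))
print(result.x)

This scales to any number of terms without ever hitting the argument
limit.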


Oscar

