![](https://secure.gravatar.com/avatar/ab7e74f2443b81e5175638d72be65e07.jpg?s=120&d=mm&r=g)
On 28/03/2008, Xavier Gnata <xavier.gnata@gmail.com> wrote:
Hum we have a problem here: Polyfit is a linear fit. It can (should?) be implemented using using the pseudo-inverse paradigm and it "pseudo-inverse provide you with the solution that minimize the chi2. As a result, a 12 ordrer fit cannot be worst than a 5 order. It should at least end up with the same chi2.
One trivial exemple : In [9]: polyfit(arange(1000),2*arange(1000)+1,30) C:\Python25\lib\site-packages\numpy\lib\polynomial.py:306: RankWarning: Polyfit may be poorly conditioned warnings.warn(msg, RankWarning)
Out[9]: array([ 4.25295067e-93, -1.90319088e-89, 2.47204987e-86, 3.64986075e-84, -2.04058052e-80, -1.02156478e-77, 1.38333948e-74, 1.84191780e-71, -1.04858370e-69, -1.93210697e-65, -1.30453334e-62, 1.07062138e-59,
2.11173521e-56, 2.00842417e-54, -2.16681833e-50, -1.21709485e-47, 1.99280602e-44, 1.67639356e-41, -2.35550438e-38, -1.10826788e-35, 3.53529389e-32, -2.99768374e-29, 1.44456488e-26, -4.49042306e-24,
9.31940018e-22, -1.28887179e-19, 1.15776067e-17, -6.43505943e-16, 2.04266284e-14, 2.00000000e+00, 1.00000000e+00]) Of course, we also have numerical errors but if they cannot be neglected, then to have found a problem which is really ill-conditioned (hudge range of values ? values very close to inf or 0 ?)...or there is a pb in poyfit but I cannot see this bug.
Any comments?
What exactly is the problem here? This is the best fit that could be hoped for - the linear and constant terms are right, and the rest are zero to the best accuracy we can plausibly expect. If you are using a goodness-of-fit measure that takes into account the number of parameters fitted (Bayesian model comparison, say, or even a chi-squared that takes into account "degrees of freedom") this should be reported as a much worse fit than using only a linear polynomial. The problem is ill-conditioned because 1000**30 is so much bigger than 999**30, as is 1000**29 compared to 999**29, that the vectors arange(1000)**30 and arange(1000)**30 are nearly identical. As the polyfit docstring says, this is a really horrible basis to use for polynomial fitting. Since polyfit is quite sensibly based on the SVD, when it encounters some linear combination of basis vectors that nearly cancels, rather than introduce enormous coefficients to try to use this combination for fitting, it just discards it. This results in slightly poorer fits sometimes, but drastically reduces the numerical headaches that come with ill-conditioned matrices. Thus even if a higher-degree polynomial would fit the data better in a world of exact arithmetic, occasionally numerical issues force the discarding of some troublesome combinations of coefficients. I do think it would be useful to have some mechanism for fitting and working with polynomials represented in other bases, so that these numerical issues would be reduced. This could go through at the same time as an improvement (by specialization) of some of scipy's orthogonal polynomial functions. But for now I can see no real problem with polyfit. Anne