Hi, We have a set of data that we fit to a nonlinear function using scipy.optimize.leastsq that, AFAIK, uses the Levenberg-Marquardt method. Talking with a collegue of another lab, he pointed me that the dataset we fit usually has intrinsically more noise in the first part of the data than the latter. So he fitted by taking into account the non uniform error -that is, instead of using plain chi-square, giving more weight to the distance from points with less intrinsic error. He told me that on Origin there is a function that does it. Is there something similar on scipy? m. -- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi" snail mail: Via Irnerio 48, 40126 Bologna, Italy email: massimo.sandal@unibo.it tel: +39-051-2094388 fax: +39-051-2094387
massimo sandal wrote:
Hi,
We have a set of data that we fit to a nonlinear function using scipy.optimize.leastsq that, AFAIK, uses the Levenberg-Marquardt method.
Talking with a collegue of another lab, he pointed me that the dataset we fit usually has intrinsically more noise in the first part of the data than the latter. So he fitted by taking into account the non uniform error -that is, instead of using plain chi-square, giving more weight to the distance from points with less intrinsic error. He told me that on Origin there is a function that does it. Is there something similar on scipy?
Have a look at scipy.odr. This module does orthogonal distance regression (or just normal least squares if you prefer). Interesting for you is the fact that you can pass an array containing the weights of the data. Even better, odr gives you an error estimation of the fit. Note that in scipy versions <= 0.5.2 odr resides in scipy.sandbox.odr, however there is a bug which prevents importing itm which is fixed in svn. You might want to have a look at peak-o-mat (http://lorentz.sf.net), too. It's a general data fitting application which makes use of scipy.odr. Christian
On 20/06/07, massimo sandal <massimo.sandal@unibo.it> wrote:
Hi,
We have a set of data that we fit to a nonlinear function using scipy.optimize.leastsq that, AFAIK, uses the Levenberg-Marquardt method.
Talking with a collegue of another lab, he pointed me that the dataset we fit usually has intrinsically more noise in the first part of the data than the latter. So he fitted by taking into account the non uniform error -that is, instead of using plain chi-square, giving more weight to the distance from points with less intrinsic error. He told me that on Origin there is a function that does it. Is there something similar on scipy?
The easiest solution is to rescale your y values by the uncertainties before doing the fit. Now, if your errors are not Gaussian, least-squares is no longer the correct approach and your life becomes more difficult... Anne
On 20/06/07, massimo sandal <massimo.sandal@unibo.it> wrote:
Hi,
We have a set of data that we fit to a nonlinear function using scipy.optimize.leastsq that, AFAIK, uses the Levenberg-Marquardt method.
Talking with a collegue of another lab, he pointed me that the dataset we fit usually has intrinsically more noise in the first part of the data than the latter. So he fitted by taking into account the non uniform error -that is, instead of using plain chi-square, giving more weight to the distance from points with less intrinsic error. He told me that on Origin there is a function that does it. Is there something similar on scipy?
The easiest solution is to rescale your y values by the uncertainties before doing the fit. Now, if your errors are not Gaussian, least-squares is no longer the correct approach and your life becomes more difficult... Anne
Sorry, but I am quite a noob in serious data analysis (degree in molecular biology, sigh)...
The easiest solution is to rescale your y values by the uncertainties before doing the fit.
What do you mean by that?
Now, if your errors are not Gaussian, least-squares is no longer the correct approach and your life becomes more difficult...
In which sense not Gaussian? In the sense that for each point, the uncertainity is not Gaussian distributed? It should at least with good approximation be. If it is in another sense, please explain... thanks, m. -- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi" snail mail: Via Irnerio 48, 40126 Bologna, Italy email: massimo.sandal@unibo.it tel: +39-051-2094388 fax: +39-051-2094387
Now, if your errors are not Gaussian, least-squares is no longer the correct approach and your life becomes more difficult...
In which sense not Gaussian? In the sense that for each point, the uncertainity is not Gaussian distributed? It should at least with good approximation be. If it is in another sense, please explain...
If the error is not Gaussian (normally distributed, ...), least squares is not the "most likely" optimization (maximizing likelyhood on gaussian data is the same as least squares), you should use more robust cost functions. Matthieu
Matthieu Brucher ha scritto:
In which sense not Gaussian? In the sense that for each point, the uncertainity is not Gaussian distributed? It should at least with good approximation be. If it is in another sense, please explain...
If the error is not Gaussian (normally distributed, ...), least squares is not the "most likely" optimization (maximizing likelyhood on gaussian data is the same as least squares), you should use more robust cost functions.
I guess you refer to the distribution of the error for *each single point*, not the distribution of the average error in the dataset for different points. In this case yes, it is Gaussian, so there should be no problem. My question was different, anyway: each point can have a different error size (i.e. sigma of gaussian distribution) respect to its neighbours. m. -- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi" snail mail: Via Irnerio 48, 40126 Bologna, Italy email: massimo.sandal@unibo.it tel: +39-051-2094388 fax: +39-051-2094387
I guess you refer to the distribution of the error for *each single point*, not the distribution of the average error in the dataset for different points. In this case yes, it is Gaussian, so there should be no problem.
It is the distribution for all errors. If it is the same distribution for all points, OK with least squares. If it is not, you have to scale the points so that the errors follow the same gaussian law. My question was different, anyway: each point can have a different error
size (i.e. sigma of gaussian distribution) respect to its neighbours.
m.
-- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi"
snail mail: Via Irnerio 48, 40126 Bologna, Italy
email: massimo.sandal@unibo.it
tel: +39-051-2094388 fax: +39-051-2094387
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
Matthieu Brucher ha scritto:
I guess you refer to the distribution of the error for *each single point*, not the distribution of the average error in the dataset for different points. In this case yes, it is Gaussian, so there should be no problem.
It is the distribution for all errors. If it is the same distribution for all points, OK with least squares. If it is not, you have to scale the points so that the errors follow the same gaussian law.
Sigh, there is some misunderstanding between us (surely due to my utter ignorance). Imagine I have three data points, A B C. Usually, if the error is uniform (on Y) can be: Ay +/- 5 By +/- 5 Cy +/- 5 In my case it is: Ay +/- 5 By +/- 7 Cy +/- 11 Now, the error in all those three cases behaves gaussianly, but with different widths. 1)Does this mean that least squares is NOT ok? 2)What does "rescaling" mean in this context? m. -- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi" snail mail: Via Irnerio 48, 40126 Bologna, Italy email: massimo.sandal@unibo.it tel: +39-051-2094388 fax: +39-051-2094387
1)Does this mean that least squares is NOT ok?
Yes, LS is _NOT_ OK because it assumes that the distribution (with its parameters) is the same for all errors. I don't remember exactly, but this may be due to ergodicity 2)What does "rescaling" mean in this context? You must change B and C so that : Ay +/- 5 B'y +/- 5 C'y +/- 5 Matthieu
Hi, What you have is an heteroscedastic normal distribution (varying variance) describing the residuals. 2007/6/21, Matthieu Brucher <matthieu.brucher@gmail.com>:
1)Does this mean that least squares is NOT ok?
Yes, LS is _NOT_ OK because it assumes that the distribution (with its parameters) is the same for all errors. I don't remember exactly, but this may be due to ergodicity
Well, let's put things in perspective. You can still use ordinary least-squares. Theoretically, this means you're making the assumption that the error mean and variance are fixed and constant. In your case, this is not true and you can consider the LS solution like an approximation. What will happen under this approximation is that large errors on Cy will tend to dominate the residuals, and values in Ay will probably not be fitted optimally. I advise you try it anyway and visually check whether you care about that or not. 2)What does "rescaling" mean in this context?
You must change B and C so that : Ay +/- 5 B'y +/- 5 C'y +/- 5
Or maximize the likelihood of a multivariate normal distribution, whose covariance matrix describes your assumption about the heteroscedasticity of the residuals. \Sigma = | \sigma_A^2 0 0 | | 0 \sigma_B^2 0 | | 0 0 \sigma_C^2 | Heteroscedastic likelihood = -n/2 \ln(2\pi) - 1/2 \sum \ln(\sigma_i^2) -1/2 \sum \sigma_i^{-2} (y_{obs} - y_{sim})^2 You might also consider the possibility that your errors are multiplicative rather than additive. In this case, describing the residuals by a lognormal distribution could make more sense. Maximize lognormal likelihood: L=lognormal(y_sim | ln(y_obs), \sigma) Cheers, David Matthieu
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
David Huard ha scritto:
Hi,
What you have is an heteroscedastic normal distribution (varying variance) describing the residuals.
2007/6/21, Matthieu Brucher <matthieu.brucher@gmail.com <mailto:matthieu.brucher@gmail.com>>:
1)Does this mean that least squares is NOT ok?
Yes, LS is _NOT_ OK because it assumes that the distribution (with its parameters) is the same for all errors. I don't remember exactly, but this may be due to ergodicity
Well, let's put things in perspective. You can still use ordinary least-squares. Theoretically, this means you're making the assumption that the error mean and variance are fixed and constant. In your case, this is not true and you can consider the LS solution like an approximation. What will happen under this approximation is that large errors on Cy will tend to dominate the residuals, and values in Ay will probably not be fitted optimally. I advise you try it anyway and visually check whether you care about that or not.
Yes, it's what I already do, and works fairly well. I'd like to see how *better* becomes. It can be useful in some contexts, so I wanted to know how to implement it.
Or maximize the likelihood of a multivariate normal distribution, whose covariance matrix describes your assumption about the heteroscedasticity of the residuals.
\Sigma = | \sigma_A^2 0 0 | | 0 \sigma_B^2 0 | | 0 0 \sigma_C^2 |
Heteroscedastic likelihood = -n/2 \ln(2\pi) - 1/2 \sum \ln(\sigma_i^2) -1/2 \sum \sigma_i^{-2} (y_{obs} - y_{sim})^2
You might also consider the possibility that your errors are multiplicative rather than additive. In this case, describing the residuals by a lognormal distribution could make more sense.
Maximize lognormal likelihood: L=lognormal(y_sim | ln(y_obs), \sigma)
I'll try to make sense of it... m. -- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi" snail mail: Via Irnerio 48, 40126 Bologna, Italy email: massimo.sandal@unibo.it tel: +39-051-2094388 fax: +39-051-2094387
Matthieu Brucher ha scritto:
1)Does this mean that least squares is NOT ok?
Yes, LS is _NOT_ OK because it assumes that the distribution (with its parameters) is the same for all errors. I don't remember exactly, but this may be due to ergodicity
OK. I just wanted to be sure I understood.
2)What does "rescaling" mean in this context?
You must change B and C so that : Ay +/- 5 B'y +/- 5 C'y +/- 5
Huh? How can this be possible/make sense whatsoever? m. -- Massimo Sandal University of Bologna Department of Biochemistry "G.Moruzzi" snail mail: Via Irnerio 48, 40126 Bologna, Italy email: massimo.sandal@unibo.it tel: +39-051-2094388 fax: +39-051-2094387
massimo sandal wrote:
Matthieu Brucher ha scritto:
1)Does this mean that least squares is NOT ok?
Yes, LS is _NOT_ OK because it assumes that the distribution (with its parameters) is the same for all errors. I don't remember exactly, but this may be due to ergodicity
OK. I just wanted to be sure I understood.
However, weighted least squares works just fine.
2)What does "rescaling" mean in this context?
You must change B and C so that : Ay +/- 5 B'y +/- 5 C'y +/- 5
Huh? How can this be possible/make sense whatsoever?
I think the notation was misunderstood. Let's start from scratch, at least notationally. You have a function y = f(b, x) where `b` is the parameter vector, `x` is a vector of input points, and `y` is the vector of outputs corresponding to those inputs. Now, you have data consisting of vectors x0 and y0. According to the model, we have random variables Y0[i] which are normally distributed about f(b, x0[i]) each with their own variance v[i]. Equivalently, we can say that the residuals R[i] ~ N(0, v[i]) Now, to solve this problem with leastsq() we need to rescale the *residuals* such that their corresponding random variables all have the same variance. def residuals(b, x0=x0, y0=y0, v=v): return (y0 - f(b, x0)) / sqrt(v) Does this make sense? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
On 21/06/07, massimo sandal <massimo.sandal@unibo.it> wrote:
Sorry, but I am quite a noob in serious data analysis (degree in molecular biology, sigh)...
The easiest solution is to rescale your y values by the uncertainties before doing the fit.
What do you mean by that?
If you're doing a linear least-squares fit, you're producing a matrix M and searching for a set of parameters P so that M*P is as close to your measured values Y as possible, in a least-squares sense. (If you're fitting a straight line, for example, P is the two-element vector [m,b], and M is the matrix whose first column is the x values and whose second column is all ones.) In order to adjust the relative importance of the different data points, you can divide a row of M and the corresponding row of Y by the same constant. This will have the effect of changing the relative importance of this row compared to the others. In numpy terminology, if U is the vector of uncertainties on the Y values, you want to replace P = lstsq(A,Y) with P = lstsq((1/U)[:,newaxis]*A,(1/U)*Y) Effectively, this rescales all your measurements so that they have the same uncertainty. Think of it as writing all your measurements in units of one sigma. If you're doing nonlinear least squares, a similar trick is still possible, though of course it is no longer matrix multiplication.
Now, if your errors are not Gaussian, least-squares is no longer the correct approach and your life becomes more difficult...
In which sense not Gaussian? In the sense that for each point, the uncertainity is not Gaussian distributed? It should at least with good approximation be. If it is in another sense, please explain...
No, that's what I meant. There's no need to get into all this in your case, it seems. But in astronomical data, it is very common for the distribution of values to not be Gaussian at all - Poisson errors are probably the most common, though chi-squared and other more exotic distributions crop up too. In these cases, least squares fitting simply gives the wrong answer (though sometimes it's not too wrong, to astronomical accuracy). Anne
Anne Archibald wrote:
On 21/06/07, massimo sandal <massimo.sandal@unibo.it> wrote:
Sorry, but I am quite a noob in serious data analysis (degree in molecular biology, sigh)...
The easiest solution is to rescale your y values by the uncertainties before doing the fit. What do you mean by that?
If you're doing a linear least-squares fit, you're producing a matrix M and searching for a set of parameters P so that M*P is as close to your measured values Y as possible, in a least-squares sense. (If you're fitting a straight line, for example, P is the two-element vector [m,b], and M is the matrix whose first column is the x values and whose second column is all ones.)
In order to adjust the relative importance of the different data points, you can divide a row of M and the corresponding row of Y by the same constant. This will have the effect of changing the relative importance of this row compared to the others. In numpy terminology, if U is the vector of uncertainties on the Y values, you want to replace P = lstsq(A,Y) with P = lstsq((1/U)[:,newaxis]*A,(1/U)*Y)
Effectively, this rescales all your measurements so that they have the same uncertainty. Think of it as writing all your measurements in units of one sigma.
If you're doing nonlinear least squares, a similar trick is still possible, though of course it is no longer matrix multiplication.
Is that the way scipy.odr handles the weights? Christian
Christian K wrote:
Is that the way scipy.odr handles the weights?
The weighted sum of the residuals is the value that is minimized. I like formulating the problem that way rather than going through the details of the linear case. I think it's more direct this way. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
Robert Kern wrote:
Christian K wrote:
Is that the way scipy.odr handles the weights?
The weighted sum of the residuals is the value that is minimized. I like formulating the problem that way rather than going through the details of the linear case. I think it's more direct this way.
I see. So then the same limitation applies: the error has to be gaussian, right? Christian
Christian K wrote:
Robert Kern wrote:
Christian K wrote:
Is that the way scipy.odr handles the weights? The weighted sum of the residuals is the value that is minimized. I like formulating the problem that way rather than going through the details of the linear case. I think it's more direct this way.
I see. So then the same limitation applies: the error has to be gaussian, right?
Yes, of course. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
participants (6)
-
Anne Archibald -
Christian K -
David Huard -
massimo sandal -
Matthieu Brucher -
Robert Kern