Fitting - Gauss-normal distribution
Hi, I am a SciPy newbie solving this problem: I would like to fit data with gaussian normal distribution.... First, I generated data: list=normal(0.00714,0.0005,140) Then I plot this data: pylab.hist(list,20) And at the end, I'd like to plot a gauss fit in the graph, also.... Could anyone help me, please? Marian This is my code: from numpy import * from RandomArray import * import pylab as p list=normal(0.00714,0.0005,140) p.hist(list,20) p.show()
Hi, you are welcome to use this code - not tested, since it is only rougly translated from a more complex function of mine: from scipy import std from scipy.optimize import leastsq def fit_gaussian(x_data, y_data): # ?_data should be numpy arrays # estimate the expectation value expect = y_data[argmax(x_data)] # find +/- 10 elements around the peak subxdata = x_data[expect-10:expect+11] subydata = y_data[expect-10:expect+11] #estimate the std sigma = std([inpt for inpt subydata in if inpt > 100.0])/len(subydata)**2 #really dirty hack!! #estimate the maximum maximum = max(y_data) #define starting paramters (as 'first guess') parameters0 = [sigma, expect, maximum] def __residuals(params, value, inpt): """ calculates the resdiuals """ sigma, expect, maximum = params err = value - (maximum * exp((-((inpt-expect)/sigma)**2)/2)) #the equation above allows for adding a constant return err def __peval(inpt, params): """ evaluates the function """ sigma, expect, maximum = params return (maximum * exp((-((inpt-expect)/sigma)**2)/2)) #calculate fit paramters plsq = leastsq(__residuals, parameters0, args=(subintensity, subchannel)) #calculate 'full width half maximum' parameter for a gaussian fit fwhm = 2*sqrt(2*log(2))*plsq[0][0] return plsq[0], fwhm, subxdata , subydata, __peval(subchannel, plsq[0]) The code above is not really neat, but allows for a peak shifted along your 'y-axis'. The return value is a tuple of (simga,mu,max),FWHMsubarray of x data, subarray of ydata, and the fitted function as an array. See http://www.scipy.org/Wiki/Documentation?action=AttachFile&do=get&target=scipy_tutorial.pdf for more information - the description of leastsq is really good! HTH Christian On Monday 26 February 2007 14:34, Marian Jakubik wrote:
Hi, I am a SciPy newbie solving this problem:
I would like to fit data with gaussian normal distribution.... First, I generated data:
list=normal(0.00714,0.0005,140)
Then I plot this data:
pylab.hist(list,20)
And at the end, I'd like to plot a gauss fit in the graph, also....
Could anyone help me, please?
Marian
This is my code:
from numpy import * from RandomArray import * import pylab as p
list=normal(0.00714,0.0005,140)
p.hist(list,20) p.show()
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
Hi Marian, You could fit the normal using the method of moments: mu = list.mean() sigma = list.std() x = linspace(list.min(), list.max(), 100) pdf = scipy.stats.norm.pdf(x, mu, sigma) s = subplot(111) s.plot(x,y) s.hist(list, 20, normed=True) David 2007/2/26, Christian Meesters <meesters@uni-mainz.de>:
Hi,
you are welcome to use this code - not tested, since it is only rougly translated from a more complex function of mine:
from scipy import std from scipy.optimize import leastsq
def fit_gaussian(x_data, y_data): # ?_data should be numpy arrays # estimate the expectation value expect = y_data[argmax(x_data)] # find +/- 10 elements around the peak subxdata = x_data[expect-10:expect+11] subydata = y_data[expect-10:expect+11] #estimate the std sigma = std([inpt for inpt subydata in if inpt > 100.0 ])/len(subydata)**2 #really dirty hack!! #estimate the maximum maximum = max(y_data) #define starting paramters (as 'first guess') parameters0 = [sigma, expect, maximum]
def __residuals(params, value, inpt): """ calculates the resdiuals """ sigma, expect, maximum = params err = value - (maximum * exp((-((inpt-expect)/sigma)**2)/2)) #the equation above allows for adding a constant return err
def __peval(inpt, params): """ evaluates the function """ sigma, expect, maximum = params return (maximum * exp((-((inpt-expect)/sigma)**2)/2))
#calculate fit paramters plsq = leastsq(__residuals, parameters0, args=(subintensity, subchannel)) #calculate 'full width half maximum' parameter for a gaussian fit fwhm = 2*sqrt(2*log(2))*plsq[0][0] return plsq[0], fwhm, subxdata , subydata, __peval(subchannel, plsq[0])
The code above is not really neat, but allows for a peak shifted along your 'y-axis'. The return value is a tuple of (simga,mu,max),FWHMsubarray of x data, subarray of ydata, and the fitted function as an array.
See
http://www.scipy.org/Wiki/Documentation?action=AttachFile&do=get&target=scipy_tutorial.pdf for more information - the description of leastsq is really good!
HTH Christian
On Monday 26 February 2007 14:34, Marian Jakubik wrote:
Hi, I am a SciPy newbie solving this problem:
I would like to fit data with gaussian normal distribution.... First, I generated data:
list=normal(0.00714,0.0005,140)
Then I plot this data:
pylab.hist(list,20)
And at the end, I'd like to plot a gauss fit in the graph, also....
Could anyone help me, please?
Marian
This is my code:
from numpy import * from RandomArray import * import pylab as p
list=normal(0.00714,0.0005,140)
p.hist(list,20) p.show()
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
Hi, So, it should be noted that the method of moments estimators for a Gaussian distribution are also the maximum likelihood estimators, i.e. the ones that maximize p(data|parameters), as well as the best least square estimator (since taking the log of the density function gives you scaled squared distance from the mean). So optimizing iteratively is hardly necessary in this case. i.e. what David Huard wrote is probably what you're looking for, and the best you're going to be able to do if your goal is just to fit a Gaussian to the data. David On Mon, 2007-02-26 at 10:04 -0500, David Huard wrote:
Hi Marian,
You could fit the normal using the method of moments:
mu = list.mean() sigma = list.std()
x = linspace(list.min(), list.max(), 100) pdf = scipy.stats.norm.pdf(x, mu, sigma) s = subplot(111) s.plot(x,y) s.hist(list, 20, normed=True)
David
2007/2/26, Christian Meesters <meesters@uni-mainz.de>: Hi,
you are welcome to use this code - not tested, since it is only rougly translated from a more complex function of mine:
from scipy import std from scipy.optimize import leastsq
def fit_gaussian(x_data, y_data): # ?_data should be numpy arrays # estimate the expectation value expect = y_data[argmax(x_data)] # find +/- 10 elements around the peak subxdata = x_data[expect-10:expect+11] subydata = y_data[expect-10:expect+11] #estimate the std sigma = std([inpt for inpt subydata in if inpt > 100.0])/len(subydata)**2 #really dirty hack!! #estimate the maximum maximum = max(y_data) #define starting paramters (as 'first guess') parameters0 = [sigma, expect, maximum]
def __residuals(params, value, inpt): """ calculates the resdiuals """ sigma, expect, maximum = params err = value - (maximum * exp((-((inpt-expect)/sigma)**2)/2)) #the equation above allows for adding a constant return err
def __peval(inpt, params): """ evaluates the function """ sigma, expect, maximum = params return (maximum * exp((-((inpt-expect)/sigma)**2)/2))
#calculate fit paramters plsq = leastsq(__residuals, parameters0, args=(subintensity, subchannel)) #calculate 'full width half maximum' parameter for a gaussian fit fwhm = 2*sqrt(2*log(2))*plsq[0][0] return plsq[0], fwhm, subxdata , subydata, __peval(subchannel, plsq[0])
The code above is not really neat, but allows for a peak shifted along your 'y-axis'. The return value is a tuple of (simga,mu,max),FWHMsubarray of x data, subarray of ydata, and the fitted function as an array.
See http://www.scipy.org/Wiki/Documentation?action=AttachFile&do=get&target=scipy_tutorial.pdf for more information - the description of leastsq is really good!
HTH Christian
On Monday 26 February 2007 14:34, Marian Jakubik wrote: > Hi, I am a SciPy newbie solving this problem: > > I would like to fit data with gaussian normal distribution.... First, I > generated data: > > list=normal(0.00714,0.0005,140) > > Then I plot this data: > > pylab.hist (list,20) > > And at the end, I'd like to plot a gauss fit in the graph, also.... > > Could anyone help me, please? > > Marian > > This is my code: > > from numpy import * > from RandomArray import * > import pylab as p > > list=normal(0.00714,0.0005,140) > > p.hist(list,20) > p.show() > > > > _______________________________________________ > SciPy-user mailing list > SciPy-user@scipy.org > http://projects.scipy.org/mailman/listinfo/scipy-user _______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
_______________________________________________ SciPy-user mailing list SciPy-user@scipy.org http://projects.scipy.org/mailman/listinfo/scipy-user
On 26/02/07, David Warde-Farley <david.warde.farley@utoronto.ca> wrote:
So, it should be noted that the method of moments estimators for a Gaussian distribution are also the maximum likelihood estimators, i.e. the ones that maximize p(data|parameters), as well as the best least square estimator (since taking the log of the density function gives you scaled squared distance from the mean). So optimizing iteratively is hardly necessary in this case.
Indeed, fitting a Gaussian is pretty easy. If you want to fit something more sphisticated (even just two Gaussians, for a bimodal distribution), the way to go is probably not to constuct a histogram first. A good approach is to fit for a maximum-likelihood estimate. That is, if you have a pdf f(p1, p2, ..., pn, x) that has n parameters and gives the probability (density) for x given all those parameters, set up a nonlinear optimization for the product f(p1, ..., pn, x1)*...*f(p1, ..., xm). You are likely to have better numerical behaviour if you instead minimize the negative logarithm of this. Anne M. Archibald
Anne Archibald wrote:
On 26/02/07, David Warde-Farley <david.warde.farley@utoronto.ca> wrote:
So, it should be noted that the method of moments estimators for a Gaussian distribution are also the maximum likelihood estimators, i.e. the ones that maximize p(data|parameters), as well as the best least square estimator (since taking the log of the density function gives you scaled squared distance from the mean). So optimizing iteratively is hardly necessary in this case.
Indeed, fitting a Gaussian is pretty easy. If you want to fit something more sphisticated (even just two Gaussians, for a bimodal distribution), the way to go is probably not to constuct a histogram first. A good approach is to fit for a maximum-likelihood estimate. That is, if you have a pdf f(p1, p2, ..., pn, x) that has n parameters and gives the probability (density) for x given all those parameters, set up a nonlinear optimization for the product f(p1, ..., pn, x1)*...*f(p1, ..., xm).
Note that one standard iterative algorithm to fit a mixture of Gaussian using maximum likelihood method (Expectation Maximization) is implemented in the scipy.sandbox.pyem. David
participants (6)
-
Anne Archibald -
Christian Meesters -
David Cournapeau -
David Huard -
David Warde-Farley -
Marian Jakubik