[SciPy-user] Fitting an arbitrary distribution

Thu May 21 21:58:06 EDT 2009

David Baddeley wrote:
> Hi all,
>
> I want to fit an arbitrary distribution (in this case the sum of multiple Gaussians) to some measured data and was wondering if anyone could give me any pointers as to the best way of doing this. I'd like to avoid fitting to a histogram if possible. How do the .fit() methods of the various distributions under scipy.stats do it? My first thought would be to compare the cumulative distribution of my data with that of the model distribution using something like the Kolmogorov-Smirnov metric (maximum absolute distance between the curves) and to minimize this using optimize.fmin. Is this the right way to do it? Or is there an easier way?

That's a complex topic in general. There is no single best answer: it
depends on your case and on what you intend to do with the estimated
distribution.

A sum of multiple Gaussians is more commonly called a mixture model
(here, a Gaussian mixture), and there is a vast range of possible
techniques for fitting such a model to a dataset. There is a package in
scikits.learn that uses the so-called Expectation Maximization (EM)
algorithm to compute maximum likelihood estimates of the parameters of
such models:

http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
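
To give a rough idea of what EM does for a one-dimensional Gaussian
mixture, here is a minimal pure-Python sketch. It is not the API of the
package above: the function name fit_gaussian_mixture_1d, its
parameters, the quantile-based initialisation and the synthetic test
data are all assumptions made for illustration. It works directly on
the raw samples, so no histogram is needed.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gaussian_mixture_1d(data, n_components=2, n_iter=200, min_std=1e-3):
    """Hypothetical helper: fit a 1-D Gaussian mixture with plain EM.

    Returns (weights, means, stds), one entry per component.
    """
    n = len(data)
    ordered = sorted(data)
    # Crude initialisation (an assumption, not the only choice):
    # means at evenly spaced quantiles, a shared spread, equal weights.
    means = [ordered[int((k + 0.5) * n / n_components)]
             for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_std = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    stds = [max(overall_std, min_std)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s)
                 for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pk / total for pk in p])

        # M-step: re-estimate parameters from the weighted samples.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk <= 0.0:
                continue  # empty component: leave it unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2
                      for r, x in zip(resp, data)) / nk
            # Floor the spread so a component cannot collapse on one point.
            stds[k] = max(math.sqrt(var), min_std)

    return weights, means, stds


# Synthetic data: 60% from N(-2, 0.5) and 40% from N(3, 1).
random.seed(0)
data = ([random.gauss(-2.0, 0.5) for _ in range(600)]
        + [random.gauss(3.0, 1.0) for _ in range(400)])

weights, means, stds = fit_gaussian_mixture_1d(data, n_components=2)
for w, m, s in sorted(zip(weights, means, stds), key=lambda t: t[1]):
    print("weight=%.3f mean=%.3f std=%.3f" % (w, m, s))
```

EM only finds a local maximum of the likelihood, so the result depends
on the starting values, and you have to choose the number of components
yourself. For real work a tested implementation is a better choice than
this sketch.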

You can find an overview on the Wikipedia page:

http://en.wikipedia.org/wiki/Mixture_model

cheers,

David