[SciPy-user] Fitting an arbitrary distribution

Thu May 21 22:33:12 EDT 2009

On Thu, May 21, 2009 at 9:58 PM, David Cournapeau
<david at ar.media.kyoto-u.ac.jp> wrote:
> David Baddeley wrote:
>> Hi all,
>>
>> I want to fit an arbitrary distribution (in this case the sum of multiple Gaussians) to some measured data and was wondering if anyone could give me any pointers as to the best way of doing this. I'd like to avoid fitting to a histogram if possible. How do the .fit() methods of the various distributions under scipy.stats do it? My first thought would be to compare the cumulative distribution of my data with that of the model distribution using something like the Kolmogorov-Smirnov metric (maximum absolute distance between the curves) and to minimize this using optimize.fmin. Is this the right way to do it? Or is there an easier way?
>
> That's a complex topic in general. There is no single best answer: it
> depends on your case and on what you intend to do with the estimated
> distribution.
>
> In the case of a sum of multiple Gaussians, the more commonly used name
> for this model is a mixture model, and there is a vast range of possible
> techniques for fitting this model to a dataset. There is a package in
> scikits.learn that uses the so-called Expectation Maximization algorithm
> to compute maximum likelihood estimates for such models:
>
> http://www.ar.media.kyoto-u.ac.jp/members/david/softwares/em/
>
> You can find an overview on the Wikipedia page:
>
> http://en.wikipedia.org/wiki/Mixture_model
>
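
If the model really is a mixture, the EM idea mentioned above can be
sketched in a few lines of plain NumPy. This is only a minimal
illustration for a one-dimensional Gaussian mixture, not the code from
the package linked above. The function names (normal_pdf,
fit_gaussian_mixture_em), the percentile-based starting values, the fixed
iteration count and the simulated data are all assumptions made for the
example. It works on the raw observations, so no histogram is involved.

```python
import numpy as np


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma**2) evaluated at x (broadcasts over arrays)."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_gaussian_mixture_em(x, n_components=2, n_iter=200):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, sigmas)."""
    x = np.asarray(x, dtype=float)
    # Crude starting values (an assumption): means spread over percentiles,
    # a common standard deviation, and equal weights.
    percentiles = 100.0 * (np.arange(n_components) + 0.5) / n_components
    mu = np.percentile(x, percentiles)
    sigma = np.full(n_components, x.std())
    weights = np.full(n_components, 1.0 / n_components)

    for _ in range(n_iter):
        # E-step: posterior probability that each point came from each component.
        dens = weights * normal_pdf(x[:, None], mu, sigma)  # shape (n, k)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted maximum likelihood updates of the parameters.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return weights, mu, sigma


# Simulated data: 40% from N(-2, 1) and 60% from N(3, 0.5**2).
rng = np.random.default_rng(0)
n = 5000
from_first = rng.random(n) < 0.4
data = np.where(from_first, rng.normal(-2.0, 1.0, n), rng.normal(3.0, 0.5, n))

weights, mu, sigma = fit_gaussian_mixture_em(data)
print("weights:", weights)
print("means:  ", mu)
print("sigmas: ", sigma)
```

EM only finds a local maximum of the likelihood, so with poorly
separated components the result can depend on the starting values.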

The distribution of a sum of (independent) random variables is a
convolution, which is very different from a mixture of distributions. I
just got confused in a discussion today when the other person talked
about convolutions and I thought about mixtures, and it didn't make a
lot of sense.
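
To make the difference concrete, here is a small simulation sketch. The
two normal distributions, the 50/50 mixing probability and the sample
size are arbitrary choices for illustration. Adding two independent
normal variables gives another single normal, while drawing each
observation from one or the other gives a bimodal sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(-2.0, 1.0, n)  # X ~ N(-2, 1)
y = rng.normal(3.0, 1.0, n)   # Y ~ N(3, 1), independent of X

# Sum of random variables: the density of X + Y is the convolution of the
# two densities. For independent normals this is again normal, N(1, 2).
total = x + y

# Mixture: each observation is taken from X or from Y with probability 0.5.
# The density is the weighted sum of the two densities and is bimodal.
pick_x = rng.random(n) < 0.5
mixture = np.where(pick_x, x, y)

# Theory: the sum has mean 1 and variance 2;
# the mixture has mean 0.5 and variance 1 + 2.5**2 = 7.25.
print("sum:     mean %.3f, variance %.3f" % (total.mean(), total.var()))
print("mixture: mean %.3f, variance %.3f" % (mixture.mean(), mixture.var()))
```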

So, which is it?

Josef