[SciPy-User] max likelihood
David Goldsmith
d.l.goldsmith at gmail.com
Mon Jun 21 19:04:10 EDT 2010
On Mon, Jun 21, 2010 at 3:51 PM, <josef.pktd at gmail.com> wrote:
> On Mon, Jun 21, 2010 at 6:17 PM, Skipper Seabold <jsseabold at gmail.com>
> wrote:
> > On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
> > <d.l.goldsmith at gmail.com> wrote:
> >> On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold <jsseabold at gmail.com>
> >> wrote:
> >>>
> >>> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
> >>> <d.l.goldsmith at gmail.com> wrote:
> >>> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
> >>> > <eneide.odissea at gmail.com>
> >>> > wrote:
> >>> >>
> >>> >> Hi All
> >>> >> I had a look at the scipy.stats documentation and I was not able to
> >>> >> find a function for maximum likelihood parameter estimation.
> >>> >> Do you know whether it is available in some other namespace/library of
> >>> >> scipy?
> >>> >> I found a few libraries on the web that have it (this one is an
> >>> >> example: http://bmnh.org/~pf/p4.html),
> >>> >> but I would prefer to start playing with what scipy already offers
> >>> >> by default (if any).
> >>> >> Kind Regards
> >>> >> eo
> >>> >
> >>> > scipy.stats.distributions.rv_continuous.fit (I was just working on the
> >>> > docstring for that; I don't believe my changes have been merged; I
> >>> > believe Travis recently updated its code...)
> >>> >
> >>>
> >>> This is for fitting the parameters of a distribution via maximum
> >>> likelihood given that the DGP is the underlying distribution. I don't
> >>> think it is intended for more complicated likelihood functions (where
> >>> Nelder-Mead might fail). And in any event it will only find the
> >>> parameters of the distribution rather than the parameters of some
> >>> underlying model, if this is what you're after.
> >>>
> >>> Skipper
> >>
> >> OK, but just for clarity in my own mind: are you saying that
> >> rv_continuous.fit is _definitely_ inappropriate/inadequate for OP's needs
> >> (i.e., am I _completely_ misunderstanding the relationship between the
> >> function and OP's stated needs), or are you saying that the function _may_
> >> not be general/robust enough for OP's stated needs?
> >
> > Well, I guess it depends on exactly what kind of likelihood function
> > is being optimized. That's why I asked.
> >
> > My experience with stats.distributions is all of about a week, so I
> > could be wrong. But here it goes... rv_continuous is not intended to
> > be used on its own but rather as the base class for any distribution.
> > So if you believe that your data came from, say, a Gaussian
> > distribution, then you could use norm.fit(data) (with other options as
> > needed) to get back estimates of scale and location. So
> >
> > In [31]: from scipy.stats import norm
> >
> > In [32]: import numpy as np
> >
> > In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
> >
> > In [34]: norm.fit(x)
> > Out[34]: (-0.043364692830314848, 1.0205901804210851)
> >
> > Which is close to our given location and scale.
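For anyone who wants to rerun the session above as a script, here is a minimal sketch. The seed and the use of np.random.default_rng are my own choices, not part of the original session. For the normal distribution the maximum likelihood estimates have a closed form (the sample mean and the standard deviation computed with ddof=0), so the output of norm.fit can be checked against them directly.

```python
import numpy as np
from scipy.stats import norm

# Seed chosen arbitrarily so the run is reproducible (an assumption,
# the original session was unseeded).
rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1000)

# Maximum likelihood estimates of location and scale.
loc_hat, scale_hat = norm.fit(x)

# Closed-form MLE for the normal: sample mean and the ddof=0 standard deviation.
mean_mle = x.mean()
std_mle = x.std(ddof=0)

print("norm.fit   :", loc_hat, scale_hat)
print("closed form:", mean_mle, std_mle)
```

If one of the parameters is known, it can be held fixed with the floc or fscale keyword, for example norm.fit(x, floc=0).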
> >
> > But if you had in mind some kind of data generating process for your
> > model based on some other observed data and you were interested in the
> > marginal effects of changes in the observed data on the outcome, then
> > it would be cumbersome I think to use the fit in distributions. It may
> > not be possible. Also, as mentioned, fit only uses Nelder-Mead
> > (optimize.fmin with the default parameters, which I've found to be
> > inadequate for even fairly basic likelihood based models), so it may
> > not be robust enough. At the moment, I can't think of a way to fit a
> > parameterized model as fit is written now. Come to think of it, though,
> > I don't think it would be much work to extend the fit method to work
> > for something like a linear regression model.
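A rough sketch of what "fit for a linear regression model" could look like if written by hand, outside of the fit method. This is not how stats.distributions does it; the simulated data, the names neg_loglike and start, and the choice of BFGS via scipy.optimize.minimize are all assumptions made for illustration. The idea is to let loc be X @ beta, keep scale positive by optimizing its logarithm, and minimize the negative normal log-likelihood.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Simulated data (hypothetical): y = 1 + 2*x + Gaussian noise with sd 0.5.
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(-2.0, 2.0, size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with intercept
true_beta = np.array([1.0, 2.0])
y = X @ true_beta + rng.normal(scale=0.5, size=n)

def neg_loglike(params):
    """Negative log-likelihood with loc = X @ beta and scale = exp(log_sigma)."""
    beta, log_sigma = params[:-1], params[-1]
    return -norm.logpdf(y, loc=X @ beta, scale=np.exp(log_sigma)).sum()

start = np.zeros(X.shape[1] + 1)       # beta = 0, sigma = 1
result = minimize(neg_loglike, start, method="BFGS")
beta_mle = result.x[:-1]
sigma_mle = np.exp(result.x[-1])

# With Gaussian errors the MLE of beta coincides with ordinary least squares.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print("MLE:", beta_mle, sigma_mle)
print("OLS:", beta_ols)
```

Comparing against the least-squares solution is a cheap sanity check that the optimizer actually reached the maximum.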
>
> rephrasing this a bit and adding some comments:
>
> the fit method of the distributions estimates the parameters (shapes, loc
> and scale) directly, while often we want the distribution parameters,
> especially loc (or mean), to depend on some explanatory variables.
>
> Generalized Linear Models do this for the exponential family of
> distributions.
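As one concrete instance of the GLM case, here is a hand-rolled sketch of Poisson regression with a log link, where the mean depends on an explanatory variable through exp(X @ beta). The simulated data, the function names and the use of scipy.optimize.minimize with an analytic gradient are assumptions for illustration; a dedicated GLM implementation would normally use iteratively reweighted least squares instead.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated count data (hypothetical): log E[y] = 0.5 + 0.3*x.
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ true_beta))

def neg_loglike(beta):
    """Poisson negative log-likelihood, dropping the term sum(log(y!)),
    which does not depend on beta."""
    eta = X @ beta
    return np.sum(np.exp(eta) - y * eta)

def gradient(beta):
    """Analytic gradient of neg_loglike with respect to beta."""
    return X.T @ (np.exp(X @ beta) - y)

result = minimize(neg_loglike, np.zeros(2), jac=gradient, method="BFGS")
beta_hat = result.x
print("estimated beta:", beta_hat)
```

For the exponential family with its canonical link the log-likelihood is concave in beta, which is a large part of why this case is so well behaved.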
>
> R has a package where any distribution parameter can be parameterized
> as a (linear) function of some explanatory variables. This would not
> be too difficult to implement, but I'm not sure how well established
> the theory and algorithms are outside of the exponential family and
> some specific distributions. Also in many cases it will not be obvious
> that the likelihood function is well (enough) behaved.
>
> I looked at the case for the t distribution, because I want it for
> GARCH, but even there it is not completely clear whether the
> parameterization of the t distribution should use the standard
> t-distribution or the standardized t-distribution (scale=var=1).
>
> It would be easy to do a quick job, but more time consuming to get it
> to work correctly for many cases/distributions.
>
> Josef
> BTW: I haven't touched fit in stats.distributions in a long time; the
> new version is all Travis'
>
Ooops, as I originally thought, thanks!
DG