[SciPy-User] max likelihood

David Goldsmith d.l.goldsmith at gmail.com
Mon Jun 21 20:41:46 EDT 2010


On Mon, Jun 21, 2010 at 5:19 PM, <josef.pktd at gmail.com> wrote:

> On Mon, Jun 21, 2010 at 8:03 PM, David Goldsmith
> <d.l.goldsmith at gmail.com> wrote:
> > On Mon, Jun 21, 2010 at 4:10 PM, <josef.pktd at gmail.com> wrote:
> >>
> >> On Mon, Jun 21, 2010 at 7:03 PM, David Goldsmith
> >> <d.l.goldsmith at gmail.com> wrote:
> >> > On Mon, Jun 21, 2010 at 3:17 PM, Skipper Seabold
> >> > <jsseabold at gmail.com> wrote:
> >> >>
> >> >> On Mon, Jun 21, 2010 at 5:55 PM, David Goldsmith
> >> >> <d.l.goldsmith at gmail.com> wrote:
> >> >> > On Mon, Jun 21, 2010 at 2:43 PM, Skipper Seabold
> >> >> > <jsseabold at gmail.com>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Mon, Jun 21, 2010 at 5:34 PM, David Goldsmith
> >> >> >> <d.l.goldsmith at gmail.com> wrote:
> >> >> >> > On Mon, Jun 21, 2010 at 2:17 PM, eneide.odissea
> >> >> >> > <eneide.odissea at gmail.com>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> Hi All
> >> >> >> >> I had a look at the scipy.stats documentation and I was not
> >> >> >> >> able to find a function for maximum likelihood parameter
> >> >> >> >> estimation.
> >> >> >> >> Do you know whether it is available in some other
> >> >> >> >> namespace/library of scipy?
> >> >> >> >> I found a few libraries on the web that have it (this one is
> >> >> >> >> an example: http://bmnh.org/~pf/p4.html ), but I would prefer
> >> >> >> >> to start playing with what scipy already offers by default
> >> >> >> >> (if any).
> >> >> >> >> Kind Regards
> >> >> >> >> eo
> >> >> >> >
> >> >> >> > scipy.stats.distributions.rv_continuous.fit (I was just
> >> >> >> > working on the docstring for that; I don't believe my changes
> >> >> >> > have been merged; I believe Travis recently updated its
> >> >> >> > code...)
> >> >> >> >
> >> >> >>
> >> >> >> This is for fitting the parameters of a distribution via maximum
> >> >> >> likelihood given that the DGP is the underlying distribution.  I
> >> >> >> don't think it is intended for more complicated likelihood
> >> >> >> functions (where Nelder-Mead might fail).  And in any event it
> >> >> >> will only find the parameters of the distribution rather than
> >> >> >> the parameters of some underlying model, if this is what you're
> >> >> >> after.
> >> >> >>
> >> >> >> Skipper
> >> >> >
> >> >> > OK, but just for clarity in my own mind: are you saying that
> >> >> > rv_continuous.fit is _definitely_ inappropriate/inadequate for
> >> >> > OP's needs (i.e., am I _completely_ misunderstanding the
> >> >> > relationship between the function and OP's stated needs), or are
> >> >> > you saying that the function _may_ not be general/robust enough
> >> >> > for OP's stated needs?
> >> >>
> >> >> Well, I guess it depends on exactly what kind of likelihood function
> >> >> is being optimized.  That's why I asked.
> >> >>
> >> >> My experience with stats.distributions is all of about a week, so I
> >> >> could be wrong. But here goes... rv_continuous is not intended to
> >> >> be used on its own but rather as the base class for any
> >> >> distribution.  So if you believe that your data came from, say, a
> >> >> Gaussian distribution, then you could use norm.fit(data) (with
> >> >> other options as needed) to get back estimates of scale and
> >> >> location.  So
> >> >>
> >> >> In [31]: from scipy.stats import norm
> >> >>
> >> >> In [32]: import numpy as np
> >> >>
> >> >> In [33]: x = np.random.normal(loc=0,scale=1,size=1000)
> >> >>
> >> >> In [34]: norm.fit(x)
> >> >> Out[34]: (-0.043364692830314848, 1.0205901804210851)
> >> >>
> >> >> Which is close to our given location and scale.
> >> >>
> >> >> But if you had in mind some kind of data generating process for
> >> >> your model based on some other observed data, and you were
> >> >> interested in the marginal effects of changes in the observed data
> >> >> on the outcome, then it would be cumbersome, I think, to use the
> >> >> fit in distributions.  It may not be possible.  Also, as mentioned,
> >> >> fit only uses Nelder-Mead (optimize.fmin with the default
> >> >> parameters, which I've found to be inadequate for even fairly basic
> >> >> likelihood-based models), so it may not be robust enough.  At the
> >> >> moment, I can't think of a way to fit a parameterized model as fit
> >> >> is written now.  Come to think of it, though, I don't think it
> >> >> would be much work to extend the fit method to work for something
> >> >> like a linear regression model.
> >> >>
> >> >> Skipper
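
A minimal sketch of what Skipper describes above -- maximizing a normal
likelihood for a simple linear regression by minimizing the negative
log-likelihood with optimize.fmin (Nelder-Mead).  This is not existing
SciPy API; the data, starting values, and log-sigma parameterization
are assumptions made for illustration only:

import numpy as np
from scipy import optimize

rng = np.random.RandomState(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + slope
beta_true = np.array([1.0, 2.0])
y = X.dot(beta_true) + rng.normal(scale=0.5, size=n)

def negloglike(params, y, X):
    # params = (beta..., log_sigma); log-parameterize so sigma > 0
    beta, sigma = params[:-1], np.exp(params[-1])
    resid = y - X.dot(beta)
    return (0.5 * len(y) * np.log(2 * np.pi * sigma**2)
            + 0.5 * np.sum(resid**2) / sigma**2)

start = np.zeros(X.shape[1] + 1)  # naive starting values
est = optimize.fmin(negloglike, start, args=(y, X))
print(est[:-1], np.exp(est[-1]))  # should be near beta_true and 0.5

With fmin's default tolerances this works on a toy problem like this
one; but, as Skipper notes, the defaults can be inadequate for harder
likelihoods, so tightening xtol/ftol or switching optimizers may be
needed.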
> >> >
> >> >
> >> > OK, this is all as I thought (e.g., fit only "works" to get the
> >> > MLEs from data for a *presumed* distribution, but it is
> >> > all-but-useless if the distribution isn't (believed to be) "known"
> >> > a priori); just wanted to be sure I was reading you correctly. :-)
> >> > Thanks!
> >>
> >> MLE always assumes that the distribution is known, since you need the
> >> likelihood function.
> >
> > I'm not sure what I'm missing here (is it the definition of DGP? the
> > meaning of Nelder-Mead? I want to learn, off-list if this is
> > considered "noise"): according to my reference - Bain & Engelhardt,
> > Intro. to Prob. and Math. Stat., 2nd Ed., Duxbury, 1992 - if the
> > underlying population distribution is known, then the likelihood
> > function is well-determined (although the likelihood equation(s) it
> > gives rise to may not be soluble analytically, of course).  So why
> > doesn't the OP knowing the underlying distribution (as your comment
> > above implies they should if they seek MLEs) imply that s/he would
> > also "know" what the likelihood function "looks like" (and thus the
> > question isn't so much what the likelihood function "looks like,"
> > but what the underlying distribution is, and thence, do we have that
> > distribution implemented yet in scipy.stats)?
>
> DGP: data generating process
>
> In many cases the assumed distribution of the error or noise variable
> is just the normal distribution. But what's the overall model that
> explains the endogenous variable?
> distribution.fit would just assume that each observation is a random
> draw from the same population distribution.
>
> But you can do MLE on standard linear regression, systems of
> equations, ARIMA or GARCH in time series analysis. For any of these we
> need to specify what the relationship between the endogenous variable
> and its own past and other explanatory variables is.
> e.g. the simplest ARMA:
>
> A(L) y_t = B(L) e_t
> with e_t an independently and identically distributed (iid) normal
> random variable, A(L), B(L) lag-polynomials,
> and for the full MLE we would also need to specify initial conditions.
>
> simple linear regression with non-iid errors:
> y_t = x_t * beta + e_t,   with e = (e_1, ..., e_T) distributed
> N(0, Sigma), plus assumptions on the structure of Sigma
>
> in these cases the likelihood function defines a lot more than just
> the distribution of the error term.
>
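
To make the last point concrete, here is a hedged sketch (plain numpy,
not an existing scipy.stats function) of the log-likelihood Josef
describes for y = X*beta + e with e ~ N(0, Sigma); the AR(1)-style
Sigma below is only one assumed "structure" for illustration:

import numpy as np

def gaussian_loglike(y, X, beta, Sigma):
    # log of the N(X*beta, Sigma) density evaluated at y
    resid = y - X.dot(beta)
    n = len(y)
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = resid.dot(np.linalg.solve(Sigma, resid))
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

# Example structure for Sigma: correlation rho**|t - s| between
# observations t and s (an AR(1)-type assumption).
n, rho = 50, 0.6
t = np.arange(n)
Sigma = rho ** np.abs(t[:, None] - t[None, :])

Note how the likelihood involves the whole model (X and beta) and the
joint structure of the errors, not just a univariate error density --
which is why distribution.fit alone can't handle such cases.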

Ah, now I understand (I think).  So in the general case, the procedure
needs to "apportion" the information in the data between the parameters
of the "mechanistic" part of the model and the parameters of the
"random noise" part of the model, and the maximum likelihood equations
give you the values of all these parameters (the mechanistic ones and
the noise ones) that maximize the likelihood of observing the data one
observed, correct?

DG(NLP?)
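
In symbols (a standard textbook formulation, not from the thread): for
the iid-normal regression case, the MLE maximizes jointly over the
"mechanistic" parameters beta and the "noise" parameter sigma,

    \hat{\beta},\, \hat{\sigma}
      = \operatorname*{argmax}_{\beta,\,\sigma}
        \sum_{t=1}^{T} \log\!\left[
          \frac{1}{\sigma}\,
          \phi\!\left(\frac{y_t - x_t'\beta}{\sigma}\right)
        \right],

where \phi is the standard normal pdf -- exactly the "apportioning"
described above.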

>
> shorthand: what's the DGP for y_t, for all t?
>
> Josef
>
> >
> > DG
> >
> >>
> >> It's not non- or semi-parametric.
> >>
> >> Josef
> >>
> >> >
> >> > DG
> >> >



-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.

Hope: noun, that delusive spirit which escaped Pandora's jar and, with her
lies, prevents mankind from committing a general suicide.  (As interpreted
by Robert Graves)