[SciPy-Dev] Subversion scipy.stats irregular problem with source code example
josef.pktd at gmail.com
Thu Sep 30 20:20:46 EDT 2010
On Thu, Sep 30, 2010 at 4:27 PM, James Phillips <zunzun at zunzun.com> wrote:
> On Thu, Sep 30, 2010 at 10:05 AM, <josef.pktd at gmail.com> wrote:
>>
>> Why did you choose to minimize the squared difference of quantiles,
>> instead of the negative log-likelihood in the differential evolution?
>
> For my use in estimating initial parameters, the genetic algorithm
> needs a cutoff point at which a sufficient solution has been reached
> and I have useful starting parameters. While I would not know in
> advance what log likelihood value to use as a stopping parameter, I do
> know in advance that a residual sum-of-squares near zero should have
> good initial parameter estimates. That means I can pass a small value
> to the genetic algorithm and have it stop if it finds such parameters;
> there would be no need to continue past that point.
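(To make the stopping idea above concrete: a minimal sketch that fits by minimizing the residual sum of squares between theoretical and empirical quantiles, using scipy.optimize.differential_evolution as a stand-in global optimizer. The normal sample, bounds, and tolerance are my own invented illustration, not James's actual code; the `tol` argument only approximates his fixed "good enough" cutoff.)

```python
import numpy as np
from scipy import stats
from scipy.optimize import differential_evolution

rng = np.random.default_rng(0)
data = np.sort(rng.normal(loc=5.0, scale=2.0, size=200))

# plotting positions for the empirical quantiles of the sorted sample
probs = (np.arange(1, data.size + 1) - 0.5) / data.size

def ssq_quantiles(params):
    # residual sum of squares between theoretical and empirical quantiles
    loc, scale = params
    q = stats.norm.ppf(probs, loc=loc, scale=scale)
    return np.sum((q - data) ** 2)

# a small tolerance plays the role of the "near zero" stopping cutoff
result = differential_evolution(ssq_quantiles,
                                bounds=[(0.0, 10.0), (0.1, 5.0)],
                                tol=1e-8, seed=0)
loc_hat, scale_hat = result.x
```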
ppf is (very) expensive to calculate for some distributions. I tried
something similar, but then switched to matching the cdf instead of
the pdf. The differential evolution seems pretty slow in this case.
What I also did in larger samples is to match only a few quantiles
instead of each observation. (I haven't quite figured out yet how to
get standard errors when matching quantiles this way.)
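(A sketch of the quantile-matching variant for larger samples, matching only a handful of probability levels instead of every observation; the gamma sample, the probability levels, and the starting point are made up for illustration, not taken from the thread.)

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.gamma(shape=2.0, scale=3.0, size=5000)

# match only a few probability levels instead of every observation
plevels = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
emp_q = np.quantile(data, plevels)

def quantile_loss(params):
    a, scale = params
    if a <= 0 or scale <= 0:  # keep the search inside the valid region
        return 1e12
    theo_q = stats.gamma.ppf(plevels, a, scale=scale)
    return np.sum((theo_q - emp_q) ** 2)

res = minimize(quantile_loss, x0=[1.0, 1.0], method='Nelder-Mead')
a_hat, scale_hat = res.x
```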
I tried a few different starting values for your powerlaw example and
my impression is that it doesn't converge to a unique solution, i.e.
different starting values end up with different maximum likelihood
fit results. I haven't tried powerlaw specifically before, but for
pareto there are some problems with MLE if loc and scale are also
estimated.
It could be that powerlaw is also a distribution that requires special
fitting methods. I was using gamma as one of the distributions to play
with for fitting.
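(To illustrate the point: a sketch comparing the unrestricted 3-parameter powerlaw fit with a restricted one where loc and scale are pinned via floc/fscale. The synthetic sample and the true shape 1.7 are arbitrary choices of mine, not numbers from the thread.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = stats.powerlaw.rvs(1.7, size=1000, random_state=rng)

# unrestricted fit: shape, loc and scale all free -- results can depend
# strongly on the starting values
a_free, loc_free, scale_free = stats.powerlaw.fit(data)

# restricted fit with loc and scale fixed at the values used to simulate;
# only the shape parameter is estimated, which is much better behaved
a_fix, loc_fix, scale_fix = stats.powerlaw.fit(data, floc=0, fscale=1)
```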
>
> I did notice that it really drags on for the beta distribution; I
> would need to tune my GA parameters, for example to give up after
> fewer than the 500 generations I used in the initial test code.
If you are mainly looking for starting values instead of a good
estimate, then it might be enough just to do a rough randomization.
Especially for online/interactive estimation of many distributions,
this looks a bit too slow to me.
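(A sketch of such a rough randomization: draw random parameter candidates, keep the one with the best log-likelihood, and hand it to fit() as the starting value. The gamma distribution, the candidate box, and the sample are invented for illustration.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.gamma(shape=2.0, scale=3.0, size=500)

# draw random (shape, scale) candidates over a generous box
candidates = rng.uniform([0.1, 0.1], [10.0, 10.0], size=(200, 2))

def nll(p):
    # negative log-likelihood of a gamma sample with loc fixed at 0
    return -np.sum(stats.gamma.logpdf(data, p[0], scale=p[1]))

# cheap randomized search: keep the candidate with the lowest nll
best = min(candidates, key=nll)

# use the best random candidate as the starting value for the MLE fit
a_hat, loc_hat, scale_hat = stats.gamma.fit(data, best[0],
                                            floc=0, scale=best[1])
```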
But I think using a global optimizer will be quite a bit more robust
for several distributions where the likelihood doesn't have a
well-behaved shape.
Josef
>
> James
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>