[SciPy-Dev] Subversion scipy.stats irregular problem with source code example

josef.pktd at gmail.com josef.pktd at gmail.com
Thu Sep 30 20:20:46 EDT 2010


On Thu, Sep 30, 2010 at 4:27 PM, James Phillips <zunzun at zunzun.com> wrote:
> On Thu, Sep 30, 2010 at 10:05 AM,  <josef.pktd at gmail.com> wrote:
>>
>> Why did you choose to minimize the squared difference of quantiles,
>> instead of the negative log-likelihood in the diffev ?
>
> For my use in estimating initial parameters, the genetic algorithm
> needs a cutoff point at which a sufficient solution has been reached
> and I have useful starting parameters.  While I would not know in
> advance what log likelihood value to use as a stopping parameter, I do
> know in advance that a residual sum-of-squares near zero should have
> good initial parameter estimates.  That means I can pass a small value
> to the genetic algorithm and have it stop if it finds such parameters;
> there would be no need to continue past that point.

ppf is (very) expensive to calculate for some distributions. I tried
something similar, but then switched to matching the cdf instead of
the ppf. The differential evolution seems pretty slow in this case.
What I also did for larger samples was to match only a few quantiles
instead of every observation. (I haven't quite figured out yet how to
get standard errors when matching quantiles this way.)
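
A rough sketch of the cdf matching at a few quantiles that I mean
(the gamma example, the chosen probability levels and the use of
optimize.fmin are just illustrative):

import numpy as np
from scipy import stats, optimize

data = stats.gamma.rvs(2.0, size=1000)             # larger sample
probs = np.array([0.1, 0.25, 0.5, 0.75, 0.9])      # only a few probability levels
xq = np.percentile(data, 100 * probs)              # empirical quantiles at those levels

def cdf_match(params):
    # squared distance between the theoretical cdf at the empirical
    # quantiles and the target probabilities; only cheap cdf calls
    shape, loc, scale = params
    if shape <= 0 or scale <= 0:
        return 1e20
    return np.sum((stats.gamma.cdf(xq, shape, loc=loc, scale=scale) - probs) ** 2)

theta = optimize.fmin(cdf_match, [1.0, 0.0, 1.0], disp=False)
print(theta)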

I tried a few different starting values for your powerlaw example and
my impression is that it doesn't converge to a unique solution, i.e.
different starting values end up with different maximum likelihood
fit results. I haven't tried powerlaw specifically before, but for
pareto there are some problems with the mle if loc and scale are also
estimated.

It could be that powerlaw is also a distribution that requires special
fitting methods. I was using gamma as one of the distributions to play
with for fitting.
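
To illustrate what I see with the mle fit (the data, the starting
values and the seed below are made up for the example):

import numpy as np
from scipy import stats

np.random.seed(0)
data = stats.powerlaw.rvs(1.5, size=500)

# fit() takes starting values for the shape positionally and loc/scale as keywords
for a0, loc0, scale0 in [(0.5, -1.0, 2.0), (1.5, 0.0, 1.0), (3.0, -0.5, 5.0)]:
    a, loc, scale = stats.powerlaw.fit(data, a0, loc=loc0, scale=scale0)
    nll = stats.powerlaw.nnlf((a, loc, scale), data)   # negative log-likelihood
    print((a0, loc0, scale0), '->', (a, loc, scale), nll)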

>
> I did notice that it really drags on for the beta distribution; I
> would need to tune my GA parameters, for example to give up after
> fewer than the 500 generations I used in the initial test code.

If you are mainly looking for starting values rather than a good
estimate, then it might be enough just to do a rough randomization.
Especially for online/interactive estimation of many distributions,
this looks a bit too slow to me.
But I think using a global optimizer will be quite a bit more robust
for several distributions where the likelihood doesn't have a
well-behaved shape.
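
Something like this is what I mean by a rough randomization (the
parameter ranges, the number of draws and the gamma example are
arbitrary choices for the sketch):

import numpy as np
from scipy import stats

np.random.seed(1)
data = stats.gamma.rvs(2.0, size=500)

def random_start(dist, data, bounds, ndraw=200):
    # draw random parameter vectors inside the given ranges and keep the
    # one with the smallest negative log-likelihood as starting values
    draws = np.column_stack([np.random.uniform(lo, hi, size=ndraw)
                             for lo, hi in bounds])
    nll = np.array([dist.nnlf(tuple(p), data) for p in draws])
    return draws[np.argmin(nll)]

bounds = [(0.1, 10.0), (-1.0, data.min()), (0.1, 10.0)]   # shape, loc, scale ranges
theta0 = random_start(stats.gamma, data, bounds)
print(theta0)
print(stats.gamma.fit(data, theta0[0], loc=theta0[1], scale=theta0[2]))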

Josef


>
>     James


