[SciPy-User] Can I "fix" scale=1 when fitting distributions?

Wed Apr 14 00:56:26 EDT 2010

On Wed, Apr 14, 2010 at 12:52 AM,  <josef.pktd at gmail.com> wrote:
> On Wed, Apr 14, 2010 at 12:09 AM, David Ho <itsdho at ucla.edu> wrote:
>> Actually, being able to "fix" scale=1 would be very useful for me.
>>
>> The reason I'm trying to fit a von Mises distribution to my data is to find
>> "kappa", a measure of the concentration of the data.
>>
>> On Wikipedia, I see that the von Mises distribution only really has 2
>> parameters: mu (the mean), and kappa (1/kappa being analogous to the
>> variance).
>> When I use vonmises.fit(), though, I get 3 parameters: kappa, mu, and a
>> "scale parameter", respectively.
>> However, I don't think the scale parameter for the von Mises distribution is
>> really independent of kappa, is it? (Am I understanding this correctly?)
>> (Using the normal distribution as a comparison, I think the "scale
>> parameter" for a normal distribution certainly isn't independent of sigma,
>> and norm.fit() only returns mu and sigma. This makes a lot more sense to
>> me.)
>
> For the normal distribution, loc = mean and scale = sqrt(variance), and
> there is no shape parameter.
>
> Essentially all distributions are transformed as y = (x - loc)/scale,
> where x is the original data and y is the standardized data. The
> actual _pdf, _cdf, ... are for the standard version of the
> distribution.
> It's a generic framework, but it doesn't make sense in all cases or
> applications, mainly when we want to fix the support of the
> distribution.
>
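As a small illustration of the loc/scale framework described above, the following sketch checks that the public pdf called with loc and scale equals the standard pdf evaluated at the standardized point, divided by scale. The parameter values are arbitrary and chosen only for the example.

```python
import numpy as np
from scipy.stats import vonmises

kappa, loc, scale = 2.0, 0.5, 2.0   # arbitrary illustrative values
x = np.linspace(-3.0, 3.0, 7)

# pdf evaluated through the generic loc/scale machinery
lhs = vonmises.pdf(x, kappa, loc=loc, scale=scale)

# the same thing done by hand: standardize, use the standard pdf, rescale
y = (x - loc) / scale
rhs = vonmises.pdf(y, kappa) / scale

same = np.allclose(lhs, rhs)
```

If the framework works as described, `same` should be True for any valid choice of kappa, loc and scale.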
>>
>> I'm basically trying to use kappa as a measure of the "width" of the
>> distribution, so the extra degree of freedom introduced by the scale
>> parameter is problematic for me. For two distributions that are
>> superficially very similar in "width", I might get wildly varying values of
>> kappa.
>>
>> For example, I once got fitted values of kappa=1, and kappa=400 for two
>> distributions that looked very similar in width.
>> I thought I must have been doing something wrong... until I saw the scale
>> parameters were around 1 and 19, respectively.
>> Plotting the fitted distributions:
>>>>> xs = numpy.linspace(-numpy.pi, numpy.pi, 361)
>>>>> plot(xs, scipy.stats.distributions.vonmises.pdf(xs, 1, loc=0, scale=1))
>>>>> plot(xs, scipy.stats.distributions.vonmises.pdf(xs, 400, loc=0, scale=19))
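One possible reading of the kappa=400, scale=19 result: for large kappa the standard von Mises density is close to a normal density with standard deviation 1/sqrt(kappa), so multiplying by scale gives an effective width of roughly scale/sqrt(kappa). The sketch below compares the two curves on the same grid as the plot above. The comparison against `norm` is an addition for illustration, not part of the original fit.

```python
import numpy as np
from scipy.stats import norm, vonmises

xs = np.linspace(-np.pi, np.pi, 361)
kappa, scale = 400.0, 19.0

# von Mises pdf with a large kappa and a large scale
vm = vonmises.pdf(xs, kappa, loc=0, scale=scale)

# for large kappa the standard von Mises is close to a normal with
# standard deviation 1/sqrt(kappa); the scale parameter stretches that
sigma = scale / np.sqrt(kappa)      # 19 / 20 = 0.95
nm = norm.pdf(xs, loc=0, scale=sigma)

# largest pointwise gap between the two densities on the grid
max_diff = np.max(np.abs(vm - nm))
```

Under this approximation many (kappa, scale) pairs with the same ratio scale/sqrt(kappa) give nearly the same curve, which would explain why kappa alone varies so much when scale is left free.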
>
> One idea is to check the implied moments, e.g. vonmises.stats(1, loc=0,
> scale=1, moments='mvsk'),
> to see whether the implied mean, variance, skew, and kurtosis are similar
> in the two different parameterizations.
> Warning: skew and kurtosis are incorrect for a few distributions; I don't
> know about vonmises.
>
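A sketch of the moment comparison suggested above. Instead of calling `stats`, it integrates the fitted pdf directly over [-pi, pi] with `scipy.integrate.quad`, so the result does not depend on how `stats` treats this distribution. The helper name `implied_mean_var` and the choice to renormalize by the mass inside [-pi, pi] are assumptions made for this example.

```python
import numpy as np
from scipy import integrate
from scipy.stats import vonmises

def implied_mean_var(kappa, loc, scale):
    """Mean and variance of the fitted pdf restricted to [-pi, pi]."""
    def pdf(x):
        return vonmises.pdf(x, kappa, loc=loc, scale=scale)

    lo, hi = -np.pi, np.pi
    mass, _ = integrate.quad(pdf, lo, hi)
    first, _ = integrate.quad(lambda x: x * pdf(x), lo, hi)
    second, _ = integrate.quad(lambda x: x**2 * pdf(x), lo, hi)
    mean = first / mass                      # renormalize to the interval
    return mean, second / mass - mean**2

# the two parameterizations from the example above
mean_a, var_a = implied_mean_var(1, 0, 1)
mean_b, var_b = implied_mean_var(400, 0, 19)
```

Comparing `var_a` with `var_b` gives a single number per fit that can be set against the visual impression of similar widths.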
>>
>> What I'd really like to do is "force" scale=1 when I perform the fitting.
>> (In fact, it would be nice to even force the mean of the fitted distribution
>> to a precalculated value, as well. I really only want one degree of freedom
>> for the fitting algorithm -- I just want it to explore different values of
>> kappa.)
>>
>> Is there any way to do this?
>
> It's not yet possible in scipy, but it's easy to write: essentially copy
> the existing fit and nnlf methods, fix loc and scale, and call the
> optimization with only one argument.
>
> I have a ticket for this that is more than a year old:
> http://projects.scipy.org/scipy/ticket/832
>
> It might be possible that it works immediately if you monkey patch
> rv_continuous  by adding
> nnlf_fr and fit_fr
> in
> http://mail.scipy.org/pipermail/scipy-user/2009-February/019968.html
>
> scipy.stats.rv_continuous.nnlf_fr = nnlf_fr
> scipy.stats.rv_continuous.fit_fr = fit_fr

Correction: this should instead refer to
scipy.stats.distributions.rv_continuous

Josef
>
> and call
> vonmises.fit_fr(data, frozen=[np.nan, 0.0, 1.0])
>
> It's worth a try, but if you only need vonmises it might also be easy
> to use scipy.optimize.fmin on a function that wraps vonmises._nnlf and
> fixes loc, scale.
> To give you an idea, I didn't check what the correct function arguments are ::
>
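> import scipy.optimize
> from scipy.stats import vonmises
>
> def mynnlf(params, data):
>     kappa = params[0]   # fmin passes a 1-element array
>     loc = 0
>     scale = 1
>     # nnlf takes (shape params..., loc, scale) as one sequence, then the data
>     return vonmises.nnlf((kappa, loc, scale), data)
>
> scipy.optimize.fmin(mynnlf, startingvalueforkappa, args=(data,))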
>
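For readers on later SciPy releases: `fit` accepts `floc` and `fscale` keyword arguments that hold loc and scale fixed, which covers the use case in this thread without any patching. A minimal sketch on synthetic data follows. The sample size, seed and true kappa are made up for the example.

```python
import numpy as np
from scipy.stats import vonmises

# synthetic angles with a known concentration, for illustration only
true_kappa = 4.0
data = vonmises.rvs(true_kappa, size=2000, random_state=0)

# hold loc and scale fixed; only kappa is estimated
kappa_hat, loc_hat, scale_hat = vonmises.fit(data, floc=0, fscale=1)
```

To fix the mean at a precalculated value instead of 0, pass that value as `floc`.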
> I could check this for your case at the end of the week.
>
> If you have an opinion for a good generic API for fit_semifrozen, let me know.
> I hope this helps, and please keep me informed. I would like to make
> at least a cookbook recipe out of it.
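One way a generic semi-frozen fit could look, following the frozen=[np.nan, 0.0, 1.0] convention mentioned above: NaN marks a parameter to estimate and any other value is held fixed. This is only a sketch. The function name `fit_semifrozen`, its signature and the `start` argument are hypothetical and are not the fit_fr code from the linked post.

```python
import numpy as np
from scipy import optimize
from scipy.stats import vonmises

def fit_semifrozen(dist, data, frozen, start):
    """Hypothetical helper: estimate only the parameters marked NaN in `frozen`.

    `frozen` lists (shape params..., loc, scale); `start` holds starting
    values for the NaN entries, in the same order.
    """
    frozen = np.asarray(frozen, dtype=float)
    free = np.isnan(frozen)

    def objective(free_values):
        theta = frozen.copy()
        theta[free] = free_values          # fill in the free parameters
        return dist.nnlf(theta, data)      # negative log-likelihood

    best = optimize.fmin(objective, start, disp=False)
    theta = frozen.copy()
    theta[free] = best
    return theta

# synthetic data with a known kappa, for illustration only
data = vonmises.rvs(2.0, size=1000, random_state=0)
params = fit_semifrozen(vonmises, data, frozen=[np.nan, 0.0, 1.0], start=[1.0])
```

The returned array has the same layout as `frozen`, so `params[0]` is the estimated kappa while loc and scale come back unchanged.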
>
> Josef
>
>
>> Thanks for your help,
>>
>> --David Ho
>>
>>
>>> On Tue, Apr 13, 2010 at 3:13 PM,  <josef.pktd at gmail.com> wrote:
>>> > In the current fit version, loc (and with it the support of the
>>> > distribution) and scale are always estimated. In some cases this is
>>> > not desired.
>>> > You are transforming the original data to fit into the standard
>>> > distribution with loc=0, scale=1 . Do you get reasonable estimates for
>>> > loc and scale in this case?
>>> > If not, then there is another patch or enhanced fit function that
>>> > could take loc and scale as fixed.
>>> >
>>> > I will look at some details in your function later, especially I'm
>>> > curious how the circular statistics works.
>>> >
>>> > Thanks for the example, maybe I can stop ignoring vonmises.
>>> >
>>> > Josef