[Numpy-discussion] Fitting Log normal or truncated log normal distibution to three points

Thu Jun 30 08:49:36 EDT 2011

On Thu, Jun 30, 2011 at 5:25 AM, Christoph Deil
<Deil.Christoph at googlemail.com> wrote:
>
> On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:
>
> Hi,
>
> I have three points 10800, 81100, 582000.
>
> What is the easiest way of fitting a log normal and truncated log normal
> distribution to these three points using numpy.
>
> The lognormal and maximum likelihood fit is available in scipy.stats. It's
> easier to use this than to implement your own fit in numpy, although if you
> can't use scipy for some reason you can have a look there on how to
> implement it in numpy.
> The rest of this reply uses scipy.stats, so if you are only interested in
> numpy, please stop reading.
>>>> data = [10800, 81100, 582000]
>>>> import scipy.stats
>>>> print scipy.stats.lognorm.extradoc
>
> Lognormal distribution
> lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2)
> for x > 0, s > 0.
> If log x is normally distributed with mean mu and variance sigma**2,
> then x is log-normally distributed with shape paramter sigma and scale
> parameter exp(mu).
>
>>>> scipy.stats.lognorm.fit(data)
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280:
> RuntimeWarning: invalid value encountered in subtract
>   and max(abs(fsim[0]-fsim[1:])) <= ftol):
> /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280:
> RuntimeWarning: invalid value encountered in absolute
>   and max(abs(fsim[0]-fsim[1:])) <= ftol):
> (1.0, 30618.493505379971, 117675.94879488947)
>>>> scipy.stats.lognorm.fit(data, floc=0, fscale=1)
> [11.405078125000022, 0, 1]
> See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for
> further info on how the location and scale parameter are handled, basically
> the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale.
> Now how to fit a truncated lognorm?
> First note that because log(x) can only be computed for x>0,
> scipy.stats.lognorm is already truncated to x > 0.
>>>> scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s)
> 0.0
>>>> scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s)
> 0.5
>>>> scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s)
> 0.75589140421441725
> As documented here, you can use parameters a and b to set the support of the
> distribution (i.e. the lower and upper truncation points):
>>>> help(scipy.stats.rv_continuous)
>  |  a : float, optional
>  |      Lower bound of the support of the distribution, default is minus
>  |      infinity.
>  |  b : float, optional
>  |      Upper bound of the support of the distribution, default is plus
>  |      infinity.
> However when I try to use the a, b parameters to call pdf() (as a simpler
> method than fit() to check if it works)  I run into the following problem:
>>>> scipy.stats.lognorm.pdf(1, 1)
> 0.3989422804014327
>>>> scipy.stats.lognorm(a=1).pdf(1, 1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: pdf() takes exactly 2 arguments (3 given)
>>>> scipy.stats.lognorm(a=1).pdf(1)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py",
> line 335, in pdf
>     return self.dist.pdf(x, *self.args, **self.kwds)
>   File
> "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py",
> line 1113, in pdf
>     place(output,cond,self._pdf(*goodargs) / scale)
> TypeError: _pdf() takes exactly 3 arguments (2 given)
> For a distribution without parameters besides (loc, scale), setting a works:
>>>> scipy.stats.norm(a=-2).pdf(3)
> 0.0044318484119380075
> Is this a bug or am I simply using it wrong?
> It would be nice
> if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained
> an example or two of how to use the a, b, xa, xb, xtol parameters of
> scipy.stats.rv_continuous .

in your last example a=-2 is ignored

>>> stats.norm.pdf(3)
0.0044318484119380075

a, b, xa, xb are attributes of the distribution that are set when the
distribution instance is created.

changing a and b doesn't make much sense since you would truncate the
support of the distribution without changing the distribution

>>> stats.norm.pdf(-3)
0.0044318484119380075
>>> stats.norm.a = -2
>>> stats.norm.pdf(-3)
0.0

stats.norm and the other distributions is an instances, and if you
change a in it it will stay this way, and might mess up other
calculations you might want to do.

xa, xb are only used internally as limits for the inversion of the cdf
(limits to fsolve) and I don't think they have any other effect.

what's a truncated lognormal ? left or right side truncated.
I think it might need to be created by subclassing.

there might be a direct way of estimating lognormal parameters from
log(x).mean() and log(x).std() if x is a lognormal sample.

(that's a scipy user question)

Josef

> Christoph
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>