Mailman 3 Fitting Log normal or truncated log normal distibution to three points - NumPy-Discussion

newer
feedback request: proposal to add...

Fitting Log normal or truncated log normal distibution to three points

older
Two bugs in recfunctions.join_by...

Sachin Kumar Sharma

June 30, 2011

8:03 a.m.

Hi, I have three points 10800, 81100, 582000. What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy. I would appreciate your reply for the same. Cheers Sachin ************************************************************************ Sachin Kumar Sharma Senior Geomodeler

Attachments:

attachment.htm (text/html — 2.9 KB)

Show replies by date

Christoph Deil

June 2011

9:25 a.m.

New subject: Fitting Log normal or truncated log normal distibution to three points

On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:

...

Hi,

I have three points 10800, 81100, 582000.

What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.

The lognormal and maximum likelihood fit is available in scipy.stats. It's easier to use this than to implement your own fit in numpy, although if you can't use scipy for some reason you can have a look there on how to implement it in numpy. The rest of this reply uses scipy.stats, so if you are only interested in numpy, please stop reading.

...

...
...
data = [10800, 81100, 582000] import scipy.stats print scipy.stats.lognorm.extradoc

Lognormal distribution lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) for x > 0, s > 0. If log x is normally distributed with mean mu and variance sigma**2, then x is log-normally distributed with shape paramter sigma and scale parameter exp(mu).

...

...
...
scipy.stats.lognorm.fit(data) /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in subtract and max(abs(fsim[0]-fsim[1:])) <= ftol): /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in absolute and max(abs(fsim[0]-fsim[1:])) <= ftol): (1.0, 30618.493505379971, 117675.94879488947) scipy.stats.lognorm.fit(data, floc=0, fscale=1) [11.405078125000022, 0, 1]

See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for further info on how the location and scale parameter are handled, basically the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale. Now how to fit a truncated lognorm? First note that because log(x) can only be computed for x>0, scipy.stats.lognorm is already truncated to x > 0.

...

...
...
scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s) 0.0 scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s) 0.5 scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s) 0.75589140421441725

As documented here, you can use parameters a and b to set the support of the distribution (i.e. the lower and upper truncation points):

...

...
...
help(scipy.stats.rv_continuous) | a : float, optional | Lower bound of the support of the distribution, default is minus | infinity. | b : float, optional | Upper bound of the support of the distribution, default is plus | infinity.

However when I try to use the a, b parameters to call pdf() (as a simpler method than fit() to check if it works) I run into the following problem:

...

...
...
scipy.stats.lognorm.pdf(1, 1) 0.3989422804014327 scipy.stats.lognorm(a=1).pdf(1, 1) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: pdf() takes exactly 2 arguments (3 given) scipy.stats.lognorm(a=1).pdf(1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 335, in pdf return self.dist.pdf(x, *self.args, **self.kwds) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 1113, in pdf place(output,cond,self._pdf(*goodargs) / scale) TypeError: _pdf() takes exactly 3 arguments (2 given)

For a distribution without parameters besides (loc, scale), setting a works:

...

...
...
scipy.stats.norm(a=-2).pdf(3) 0.0044318484119380075

Is this a bug or am I simply using it wrong? It would be nice if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained an example or two of how to use the a, b, xa, xb, xtol parameters of scipy.stats.rv_continuous . Christoph

josef.pktd＠gmail.com

12:49 p.m.

On Thu, Jun 30, 2011 at 5:25 AM, Christoph Deil <Deil.Christoph@googlemail.com> wrote:

...

On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:

Hi,

I have three points 10800, 81100, 582000.

What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.

The lognormal and maximum likelihood fit is available in scipy.stats. It's easier to use this than to implement your own fit in numpy, although if you can't use scipy for some reason you can have a look there on how to implement it in numpy. The rest of this reply uses scipy.stats, so if you are only interested in numpy, please stop reading.

...
...
...
data = [10800, 81100, 582000] import scipy.stats print scipy.stats.lognorm.extradoc

Lognormal distribution lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) for x > 0, s > 0. If log x is normally distributed with mean mu and variance sigma**2, then x is log-normally distributed with shape paramter sigma and scale parameter exp(mu).

...
...
...
scipy.stats.lognorm.fit(data) /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in subtract and max(abs(fsim[0]-fsim[1:])) <= ftol): /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in absolute and max(abs(fsim[0]-fsim[1:])) <= ftol): (1.0, 30618.493505379971, 117675.94879488947) scipy.stats.lognorm.fit(data, floc=0, fscale=1) [11.405078125000022, 0, 1] See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for further info on how the location and scale parameter are handled, basically the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale. Now how to fit a truncated lognorm? First note that because log(x) can only be computed for x>0, scipy.stats.lognorm is already truncated to x > 0. scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s) 0.0 scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s) 0.5 scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s) 0.75589140421441725 As documented here, you can use parameters a and b to set the support of the distribution (i.e. the lower and upper truncation points): help(scipy.stats.rv_continuous) | a : float, optional | Lower bound of the support of the distribution, default is minus | infinity. | b : float, optional | Upper bound of the support of the distribution, default is plus | infinity. However when I try to use the a, b parameters to call pdf() (as a simpler method than fit() to check if it works) I run into the following problem: scipy.stats.lognorm.pdf(1, 1) 0.3989422804014327 scipy.stats.lognorm(a=1).pdf(1, 1) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: pdf() takes exactly 2 arguments (3 given) scipy.stats.lognorm(a=1).pdf(1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 335, in pdf return self.dist.pdf(x, *self.args, **self.kwds) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 1113, in pdf place(output,cond,self._pdf(*goodargs) / scale) TypeError: _pdf() takes exactly 3 arguments (2 given) For a distribution without parameters besides (loc, scale), setting a works: scipy.stats.norm(a=-2).pdf(3) 0.0044318484119380075 Is this a bug or am I simply using it wrong? It would be nice if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained an example or two of how to use the a, b, xa, xb, xtol parameters of scipy.stats.rv_continuous .

in your last example a=-2 is ignored

...

...
...
stats.norm.pdf(3) 0.0044318484119380075

a, b, xa, xb are attributes of the distribution that are set when the distribution instance is created. changing a and b doesn't make much sense since you would truncate the support of the distribution without changing the distribution

...

...
...
stats.norm.pdf(-3) 0.0044318484119380075 stats.norm.a = -2 stats.norm.pdf(-3) 0.0

stats.norm and the other distributions is an instances, and if you change a in it it will stay this way, and might mess up other calculations you might want to do. xa, xb are only used internally as limits for the inversion of the cdf (limits to fsolve) and I don't think they have any other effect. what's a truncated lognormal ? left or right side truncated. I think it might need to be created by subclassing. there might be a direct way of estimating lognormal parameters from log(x).mean() and log(x).std() if x is a lognormal sample. (that's a scipy user question) Josef

...

Christoph _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

4969

Age (days ago)

4969

Last active (days ago)

List overview

Download

2 comments

3 participants

participants (3)

Christoph Deil
josef.pktd＠gmail.com
Sachin Kumar Sharma

Fitting Log normal or truncated log normal distibution to three points

Sachin Kumar Sharma

Christoph Deil

josef.pktd＠gmail.com

tags

participants (3)