Fitting Log normal or truncated log normal distibution to three points
Hi, I have three points 10800, 81100, 582000. What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy. I would appreciate your reply for the same. Cheers Sachin ************************************************************************ Sachin Kumar Sharma Senior Geomodeler
On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:
Hi,
I have three points 10800, 81100, 582000.
What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.
The lognormal and maximum likelihood fit is available in scipy.stats. It's easier to use this than to implement your own fit in numpy, although if you can't use scipy for some reason you can have a look there on how to implement it in numpy. The rest of this reply uses scipy.stats, so if you are only interested in numpy, please stop reading.
data = [10800, 81100, 582000] import scipy.stats print scipy.stats.lognorm.extradoc
Lognormal distribution lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) for x > 0, s > 0. If log x is normally distributed with mean mu and variance sigma**2, then x is log-normally distributed with shape paramter sigma and scale parameter exp(mu).
scipy.stats.lognorm.fit(data) /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in subtract and max(abs(fsim[0]-fsim[1:])) <= ftol): /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in absolute and max(abs(fsim[0]-fsim[1:])) <= ftol): (1.0, 30618.493505379971, 117675.94879488947) scipy.stats.lognorm.fit(data, floc=0, fscale=1) [11.405078125000022, 0, 1]
See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for further info on how the location and scale parameter are handled, basically the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale. Now how to fit a truncated lognorm? First note that because log(x) can only be computed for x>0, scipy.stats.lognorm is already truncated to x > 0.
scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s) 0.0 scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s) 0.5 scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s) 0.75589140421441725
As documented here, you can use parameters a and b to set the support of the distribution (i.e. the lower and upper truncation points):
help(scipy.stats.rv_continuous) | a : float, optional | Lower bound of the support of the distribution, default is minus | infinity. | b : float, optional | Upper bound of the support of the distribution, default is plus | infinity.
However when I try to use the a, b parameters to call pdf() (as a simpler method than fit() to check if it works) I run into the following problem:
scipy.stats.lognorm.pdf(1, 1) 0.3989422804014327 scipy.stats.lognorm(a=1).pdf(1, 1) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: pdf() takes exactly 2 arguments (3 given) scipy.stats.lognorm(a=1).pdf(1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 335, in pdf return self.dist.pdf(x, *self.args, **self.kwds) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 1113, in pdf place(output,cond,self._pdf(*goodargs) / scale) TypeError: _pdf() takes exactly 3 arguments (2 given)
For a distribution without parameters besides (loc, scale), setting a works:
scipy.stats.norm(a=-2).pdf(3) 0.0044318484119380075
Is this a bug or am I simply using it wrong? It would be nice if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained an example or two of how to use the a, b, xa, xb, xtol parameters of scipy.stats.rv_continuous . Christoph
On Thu, Jun 30, 2011 at 5:25 AM, Christoph Deil <Deil.Christoph@googlemail.com> wrote:
On Jun 30, 2011, at 10:03 AM, Sachin Kumar Sharma wrote:
Hi,
I have three points 10800, 81100, 582000.
What is the easiest way of fitting a log normal and truncated log normal distribution to these three points using numpy.
The lognormal and maximum likelihood fit is available in scipy.stats. It's easier to use this than to implement your own fit in numpy, although if you can't use scipy for some reason you can have a look there on how to implement it in numpy. The rest of this reply uses scipy.stats, so if you are only interested in numpy, please stop reading.
data = [10800, 81100, 582000] import scipy.stats print scipy.stats.lognorm.extradoc
Lognormal distribution lognorm.pdf(x,s) = 1/(s*x*sqrt(2*pi)) * exp(-1/2*(log(x)/s)**2) for x > 0, s > 0. If log x is normally distributed with mean mu and variance sigma**2, then x is log-normally distributed with shape paramter sigma and scale parameter exp(mu).
scipy.stats.lognorm.fit(data) /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in subtract and max(abs(fsim[0]-fsim[1:])) <= ftol): /opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/optimize/optimize.py:280: RuntimeWarning: invalid value encountered in absolute and max(abs(fsim[0]-fsim[1:])) <= ftol): (1.0, 30618.493505379971, 117675.94879488947) scipy.stats.lognorm.fit(data, floc=0, fscale=1) [11.405078125000022, 0, 1] See http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html for further info on how the location and scale parameter are handled, basically the x in the lognormal.pdf formula above is x = (x_actual - loc) / scale. Now how to fit a truncated lognorm? First note that because log(x) can only be computed for x>0, scipy.stats.lognorm is already truncated to x > 0. scipy.stats.lognorm.cdf(0, 1) # arguments: (x, s) 0.0 scipy.stats.lognorm.cdf(1, 1) # arguments: (x, s) 0.5 scipy.stats.lognorm.cdf(2, 1) # arguments: (x, s) 0.75589140421441725 As documented here, you can use parameters a and b to set the support of the distribution (i.e. the lower and upper truncation points): help(scipy.stats.rv_continuous) | a : float, optional | Lower bound of the support of the distribution, default is minus | infinity. | b : float, optional | Upper bound of the support of the distribution, default is plus | infinity. However when I try to use the a, b parameters to call pdf() (as a simpler method than fit() to check if it works) I run into the following problem: scipy.stats.lognorm.pdf(1, 1) 0.3989422804014327 scipy.stats.lognorm(a=1).pdf(1, 1) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: pdf() takes exactly 2 arguments (3 given) scipy.stats.lognorm(a=1).pdf(1) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 335, in pdf return self.dist.pdf(x, *self.args, **self.kwds) File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/stats/distributions.py", line 1113, in pdf place(output,cond,self._pdf(*goodargs) / scale) TypeError: _pdf() takes exactly 3 arguments (2 given) For a distribution without parameters besides (loc, scale), setting a works: scipy.stats.norm(a=-2).pdf(3) 0.0044318484119380075 Is this a bug or am I simply using it wrong? It would be nice if http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html contained an example or two of how to use the a, b, xa, xb, xtol parameters of scipy.stats.rv_continuous .
in your last example a=-2 is ignored
stats.norm.pdf(3) 0.0044318484119380075
a, b, xa, xb are attributes of the distribution that are set when the distribution instance is created. changing a and b doesn't make much sense since you would truncate the support of the distribution without changing the distribution
stats.norm.pdf(-3) 0.0044318484119380075 stats.norm.a = -2 stats.norm.pdf(-3) 0.0
stats.norm and the other distributions is an instances, and if you change a in it it will stay this way, and might mess up other calculations you might want to do. xa, xb are only used internally as limits for the inversion of the cdf (limits to fsolve) and I don't think they have any other effect. what's a truncated lognormal ? left or right side truncated. I think it might need to be created by subclassing. there might be a direct way of estimating lognormal parameters from log(x).mean() and log(x).std() if x is a lognormal sample. (that's a scipy user question) Josef
Christoph _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
participants (3)
-
Christoph Deil
-
josef.pktd@gmail.com
-
Sachin Kumar Sharma