[Numpy-discussion] Power distribution

josef.pktd at gmail.com
Fri Aug 7 23:55:45 EDT 2009


On Fri, Aug 7, 2009 at 10:17 PM, <alan at ajackson.org> wrote:
> Thanks! That helps a lot.

Thanks for improving the docs.

>
>>On Fri, Aug 7, 2009 at 8:54 PM, <josef.pktd at gmail.com> wrote:
>>> On Fri, Aug 7, 2009 at 6:57 PM, <josef.pktd at gmail.com> wrote:
>>>> On Fri, Aug 7, 2009 at 6:13 PM, <josef.pktd at gmail.com> wrote:
>>>>> On Fri, Aug 7, 2009 at 5:42 PM, <josef.pktd at gmail.com> wrote:
>>>>>> On Fri, Aug 7, 2009 at 5:25 PM, Andrew Hawryluk<HAWRYLA at novachem.com> wrote:
>>>>>>> Hmm ... good point.
>>>>>>> It appears to give a probability distribution proportional to x**(a-1),
>>>>>>> but I see no good reason why the domain should be limited to [0,1].
>>>>>>>
>>>>>>> def test(a):
>>>>>>>    nums = plt.hist(np.random.power(a,100000),bins=100,ec='none',fc='#dddddd')
>>>>>>>    x = np.linspace(0,1,200)
>>>>>>>    plt.plot(x,nums[0][-1]*x**(a-1))
>>>>>>>
>>>>>>> Andrew
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of alan at ajackson.org
>>>>>>>> Sent: 7 Aug 2009 2:49 PM
>>>>>>>> To: Discussion of Numerical Python
>>>>>>>> Subject: Re: [Numpy-discussion] Power distribution
>>>>>>>>
>>>>>>>> I don't think that is it, since the one in numpy has a range
>>>>>>>> restricted to the interval 0-1.
>>>>>>>>
>>>>>>>> Try out hist(np.random.power(5, 1000000), bins=100)
>>>>>>>>
>>>>>>>> >You might get better results for 'power-law distribution'
>>>>>>>> >http://en.wikipedia.org/wiki/Power_law
>>>>>>>> >
>>>>>>>> >Andrew
>>>>>>>> >
>>>>>>>> >> -----Original Message-----
>>>>>>>> >> From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of alan at ajackson.org
>>>>>>>> >> Sent: 7 Aug 2009 11:45 AM
>>>>>>>> >> To: Discussion of Numerical Python
>>>>>>>> >> Subject: [Numpy-discussion] Power distribution
>>>>>>>> >>
>>>>>>>> >> Documenting my way through the statistics modules in numpy, I ran
>>>>>>>> >> into the Power Distribution.
>>>>>>>> >>
>>>>>>>> >> Anyone know what that is? I Googled for it, and found a lot of stuff
>>>>>>>> >> on electricity, but no reference for a statistical distribution of
>>>>>>>> >> that name. Does it have a common alias?
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>
>>>>>>
>>>>>> The same is in Travis' notes on the distributions and in scipy.stats.distributions
>>>>>> (domain in [0,1]), but I don't know anything about it either.
>>>>>>
>>>>>> ## Power-function distribution
>>>>>> ##   Special case of beta dist. with d =1.0
>>>>>>
>>>>>> class powerlaw_gen(rv_continuous):
>>>>>>    def _pdf(self, x, a):
>>>>>>        return a*x**(a-1.0)
>>>>>>    def _cdf(self, x, a):
>>>>>>        return x**(a*1.0)
>>>>>>    def _ppf(self, q, a):
>>>>>>        return pow(q, 1.0/a)
>>>>>>    def _stats(self, a):
>>>>>>        return a/(a+1.0), a*(a+2.0)/(a+1.0)**2, \
>>>>>>               2*(1.0-a)*sqrt((a+2.0)/(a*(a+3.0))), \
>>>>>>               6*polyval([1,-1,-6,2],a)/(a*(a+3.0)*(a+4))
>>>>>>    def _entropy(self, a):
>>>>>>        return 1 - 1.0/a - log(a)
>>>>>> powerlaw = powerlaw_gen(a=0.0, b=1.0, name="powerlaw",
>>>>>>                        longname="A power-function",
>>>>>>                        shapes="a", extradoc="""
>>>>>>
>>>>>> Power-function distribution
>>>>>>
>>>>>> powerlaw.pdf(x,a) = a*x**(a-1)
>>>>>> for 0 <= x <= 1, a > 0.
>>>>>> """
>>>>>>                        )
>>>>>>
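Two quick checks suggested by the class above (just a sketch, assuming a recent numpy/scipy, and reading the "d =1.0" in the comment as the second beta shape parameter): the _ppf is the usual inverse-transform sampler u**(1/a), and the distribution is the beta(a, 1) special case.

import numpy as np
from scipy import stats

a = 5.0
u = np.random.uniform(size=100000)

# inverse-transform sampling: F(x) = x**a on [0, 1], so x = u**(1/a)
x = u**(1.0/a)
print stats.kstest(x, 'powerlaw', (a,))

# special case of the beta distribution with second shape parameter 1
print stats.ks_2samp(np.random.power(a, 100000),
                     np.random.beta(a, 1.0, size=100000))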
>>>>>
>>>>>
>>>>> It looks like it's the same distribution, even though scipy.stats doesn't
>>>>> use the random numbers from the numpy function.
>>>>>
>>>>> The Kolmogorov-Smirnov tests give high p-values, see below.
>>>>>
>>>>> I assume it is a truncated version of *a* powerlaw distribution, so
>>>>> that a can be large, which would be impossible on an unbounded domain.
>>>>> But in a quick search I only found powerlaw applications that
>>>>> refer to the tail behavior.
>>>>>
>>>>> Josef
>>>>>
>>>>>>>> rvs = np.random.power(5, 100000)
>>>>>>>> stats.kstest(rvs,'powerlaw',(5,))
>>>>> (0.0021079715221341555, 0.76587118275752697)
>>>>>>>> rvs = np.random.power(5, 1000000)
>>>>>>>> stats.kstest(rvs,'powerlaw',(5,))
>>>>> (0.00063983013407076239, 0.80757958281509501)
>>>>>>>> rvs = np.random.power(0.5, 1000000)
>>>>>>>> stats.kstest(rvs,'powerlaw',(0.5,))
>>>>> (0.00081823148457027539, 0.51478478398950211)
>>>>>
>>>>
>>>> I found a short reference in Johnson, Kotz, Balakrishnan vol. 1 where
>>>> it is referred to as the "power-function" distribution.
>>>> Roughly: if X is pareto distributed (which kind?), then Y=X**(-1) is
>>>> distributed according to the power-function distribution. JKB have an
>>>> extra parameter in there and are a bit more general than the scipy
>>>> version, or maybe it is just the scale parameter included in the
>>>> density function.
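If that reading is right, the extra JKB parameter would just be the generic scale in scipy; a sketch with a made-up scale value c:

import numpy as np
from scipy import stats

a, c = 5.0, 3.0                      # shape and a hypothetical scale parameter
x = np.linspace(0.1, c, 4)

# with a scale c the density becomes a*x**(a-1)/c**a on [0, c]
print stats.powerlaw.pdf(x, a, scale=c)
print a * x**(a - 1.0) / c**a        # same values, written out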
>>>>
>>>> It is also in NIST Dataplot, but I didn't find the html reference
>>>> page, only the pdf:
>>>>
>>>> http://docs.google.com/gview?a=v&q=cache%3AEgQ6bRkeJl8J%3Awww.itl.nist.gov%2Fdiv898%2Fsoftware%2Fdataplot%2Frefman2%2Fauxillar%2Fpowpdf.pdf+power-function+distribution&hl=en&gl=ca&pli=1
>>>>
>>>> the pdf files for powpdf and powcdf are here:
>>>> http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/homepage.htm
>>>>
>>>>
>>>> I can look some more a bit later tonight.
>>>>
>>>> Josef
>>>>
>>>
>>>
>>> For the relationship to pareto, below are some kstests and graphs.
>>> A reminder that numpy.random.pareto uses a non-standard lower bound of 0 instead of 1.
>>> Every once in a while the ks tests don't show good numbers, since they are random.
>>>
>>> I checked the definitions in JKB (page 607) and my previous
>>> interpretation was correct:
>>> if X has a pareto distribution with lower bound at 1 and shape
>>> parameter a>0, then 1/X has the density function
>>> p(y) = a*y**(a-1),  (0 < y < 1)
>>> JKB use a weak inequality instead of the strict one in the
>>> scipy.stats.powerlaw docstring (the actual docstring has a typo,
>>> a**x**(a-1), which I will correct).
>>>
>>> Josef
>>>
>>>
>>> import numpy as np
>>> from scipy import stats
>>> import matplotlib.pyplot as plt
>>>
>>>
>>> rvs = np.random.power(5, 1000000)
>>> rvsp = np.random.pareto(5, 1000000)
>>> rvsps = stats.pareto.rvs(5, size=100)
>>>
>>> print "stats.kstest(1./rvsps,'powerlaw',(5,))"
>>> print stats.kstest(1./rvsps,'powerlaw',(5,))
>>>
>>> print "stats.kstest(1./(1+rvsp),'powerlaw',(5,))"
>>> print stats.kstest(1./(1+rvsp),'powerlaw',(5,))
>>>
>>> print "stats.kstest(rvs,'powerlaw',(5,))"
>>> print stats.kstest(rvs,'powerlaw',(5,))
>>>
>>> print "stats.ks_2samp(rvs,1./(rvsp+1))"
>>> print stats.ks_2samp(rvs,1./(rvsp+1))
>>> print "stats.ks_2samp(rvs,1./rvsps)"
>>> print stats.ks_2samp(rvs,1./rvsps)
>>> print "stats.ks_2samp(1+rvsp, rvsps)"
>>> print stats.ks_2samp(1+rvsp, rvsps)
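The kstests above work on samples; the change of variables can also be checked directly at the density level (just a sketch, assuming current scipy): for X pareto with lower bound at 1 and Y = 1/X, the Jacobian factor is 1/y**2.

import numpy as np
from scipy import stats

a = 5.0
y = np.linspace(0.05, 1.0, 5)

# p_Y(y) = p_X(1/y) * |d(1/y)/dy| = pareto.pdf(1/y, a) / y**2
print stats.pareto.pdf(1.0/y, a) / y**2
print stats.powerlaw.pdf(y, a)        # a * y**(a-1), should agree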
>>
>>
>>Improvements to the graphs, comparing with the theoretical pdf
>>
>>Josef
>>
>>xx = np.linspace(0,1,100)
>>powpdf = stats.powerlaw.pdf(xx,5)
>>
>>plt.figure()
>>plt.hist(rvs, bins=50, normed=True)
>>plt.plot(xx,powpdf,'r-')
>>plt.title('np.random.power(5)')
>>plt.figure()
>>plt.hist(1./(1.+rvsp), bins=50, normed=True)
>>plt.plot(xx,powpdf,'r-')
>>plt.title('inverse of 1 + np.random.pareto(5)')
>>plt.figure()
>>plt.hist(1./(1.+rvsp), bins=50, normed=True)
>>plt.plot(xx,powpdf,'r-')
>>plt.title('inverse of stats.pareto(5)')


Just a small correction of a copy and paste error, to have the correct
example in the thread:
The last graph is a duplicate and should instead use the scipy.stats
random numbers, rvsps, without the +1 correction, i.e.

plt.figure()
plt.hist(1./rvsps, bins=50, normed=True)
plt.plot(xx,powpdf,'r-')
plt.title('inverse of stats.pareto(5)')
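No +1 shift is needed here because stats.pareto already has its lower bound at 1, unlike np.random.pareto. A quick check, assuming current scipy behavior:

from scipy import stats

print stats.pareto.ppf(0.0, 5)                   # lower endpoint of the support: 1.0
print stats.pareto.rvs(5, size=10).min() >= 1    # samples never fall below 1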

Josef

>
> --
> -----------------------------------------------------------------------
> | Alan K. Jackson            | To see a World in a Grain of Sand      |
> | alan at ajackson.org          | And a Heaven in a Wild Flower,         |
> | www.ajackson.org           | Hold Infinity in the palm of your hand |
> | Houston, Texas             | And Eternity in an hour. - Blake       |
> -----------------------------------------------------------------------
>


