[Numpy-discussion] numpy.random.pareto, m equal zero

josef.pktd at gmail.com josef.pktd at gmail.com
Fri Aug 7 22:38:04 EDT 2009


Does it make any (statistical) sense to have numpy.random.pareto
produce random numbers that start at zero?
Can we change it to start at 1 which is the usual default?

Notation from http://docs.scipy.org/numpy/docs/numpy.random.mtrand.RandomState.pareto/
        The probability density for the Pareto distribution is
        .. math:: p(x) = \frac{am^a}{x^{a+1}}
        where :math:`a` is the shape and :math:`m` the location

Constraints from Johnson, Kotz, Balakrishnan, vol. 1, page 574:
m>0, a>0, x>=m

1) As m goes to zero, the pdf goes to zero at every fixed x > 0 (the
mean and variance, where they exist, also go to zero; essentially a
mass point at zero).
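
A minimal sketch of point 1, using only the density quoted above. The
function name pareto_pdf and the values of a, x and m are made up for
illustration; nothing here calls numpy.

```python
def pareto_pdf(x, a, m):
    """Density a*m**a / x**(a+1) for x >= m, and 0 below the lower bound m."""
    if a <= 0 or m <= 0:
        raise ValueError("a and m must be positive")
    if x < m:
        return 0.0
    return a * m**a / x**(a + 1)

# Hold the evaluation point x fixed and let the lower bound m shrink.
a, x = 2.0, 0.5
ms = (0.4, 0.1, 0.01, 0.001)
values = [pareto_pdf(x, a, m) for m in ms]

for m, v in zip(ms, values):
    print(m, v)
```

The printed densities shrink with m because of the factor m**a in the
numerator, which is why m = 0 is not a usable member of the family.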

2) quote from http://www.itl.nist.gov/div898/software/dataplot/refman2/auxillar/parpdf.htm
(their `a` is our `m`)

" Note that although the a (=m JP) parameter is typically called a
location parameter (and it is in the sense that it defines the lower
bound), it is not a location parameter in the technical sense that the
following relation does not hold:

      f(x;gamma,a) = f((x-a);gamma,0)

For this reason, Dataplot treats a (=m JP) as a shape parameter. In
Dataplot, the a (=m JP) shape parameter is optional with a default
value of 1. "


my conclusion:
---------------------
What numpy.random.pareto actually produces are random numbers from a
Pareto distribution with lower bound m=1 but location parameter
loc=-1, which shifts the distribution to the left so that it starts
at zero.
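
Spelled out as a change of variables, assuming the reading above is
right (Y is a classical Pareto variable with m=1 and X = Y - 1 is what
gets returned; the symbols X and Y are introduced here only for
illustration):

```latex
p_Y(y) = \frac{a}{y^{a+1}}, \quad y \ge 1
\qquad\Longrightarrow\qquad
p_X(x) = p_Y(x + 1) = \frac{a}{(1 + x)^{a+1}}, \quad x \ge 0
```

The density on the right is the one usually called Lomax or Pareto
type II.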

To actually get useful random numbers (ones that are correct in the
usual usage, see http://en.wikipedia.org/wiki/Pareto_distribution), we
need to add 1 to them.
stats.distributions doesn't use mtrand.pareto (why?), so I never
needed to check this before.

rvs_pareto = 1 + numpy.random.pareto(a, size)
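
A rough empirical check of the shift, as a sketch only. The seed, the
shape a=3, the sample size and the helper classical_cdf are arbitrary
choices for illustration; the helper is the textbook CDF
1 - (m/x)**a with m=1, not something numpy provides.

```python
import numpy as np

rng = np.random.RandomState(12345)  # arbitrary seed, for repeatability only
a, size = 3.0, 200000

raw = rng.pareto(a, size)   # what numpy returns: values start at 0
shifted = 1 + raw           # shifted so that values start at 1

def classical_cdf(x, a, m=1.0):
    """CDF of the classical Pareto distribution with lower bound m."""
    return 1.0 - (m / x) ** a

# Compare the empirical CDF of the shifted sample with the classical one
# at a single point.
empirical = np.mean(shifted <= 2.0)
expected = classical_cdf(2.0, a)

print("min raw:", raw.min(), "min shifted:", shifted.min())
print("P(X <= 2) empirical:", empirical, "classical:", expected)
```

If the two probabilities agree to within sampling noise, the shifted
sample is consistent with the usual Pareto distribution with m=1.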

For the corresponding correction in some calculations, see the thread on the power distribution.

Do we have to live with loc=-1, or can we change it, or am I
misinterpreting something (which wouldn't be the first time either)?

Josef


