[Numpy-discussion] pareto docstring

Mon May 10 16:56:15 EDT 2010

On Mon, May 10, 2010 at 11:14 AM, T J <tjhnson at gmail.com> wrote:

> On Sun, May 9, 2010 at 4:49 AM,  <josef.pktd at gmail.com> wrote:
> >
> > I think this is the same point, I was trying to make last year.
> >
> > Instead of renormalizing, my conclusion was the following,
> > (copied from the mailinglist August last year)
> >
> > """
> > my conclusion:
> > ---------------------
> > What numpy.random.pareto actually produces, are random numbers from a
> > pareto distribution with lower bound m=1, but location parameter
> > loc=-1, that shifts the distribution to the left.
> >
> > To actually get useful  random numbers (that are correct in the usual
> > usage http://en.wikipedia.org/wiki/Pareto_distribution), we need to
> > add 1 to them.
> > stats.distributions doesn't use mtrand.pareto
> >
> > rvs_pareto = 1 + numpy.random.pareto(a, size)
> >
> > """
> >
> > I still have to work though the math of your argument, but maybe we
> > can come to an agreement how the docstrings (or the function) should
> > be changed, and what numpy.random.pareto really means.
> >
> > Josef
> > (grateful, that there are another set of eyes on this)
> >
> >
>
>
> Yes, I think my "renormalizing" statement is incorrect as it is really
> just sampling from a different pdf altogether.  See the following image:
>
> http://www.dumpt.com/img/viewer.php?file=q9tfk7ehxsw865vn067c.png
>
> It plots histograms of the various implementations against the pdfs.
> Summarizing:
>
> The NumPy implementation is based on (Devroye p. 262).  The pdf listed
> there is:
>
>    a / (1+x)^(a+1)
>
> This differs from the "standard" Pareto pdf:
>
>    a / x^(a+1)
>
> It also differs from the pdf of the generalized Pareto distribution,
> with scale=1 and location=0:
>
>    (1 + a x)^(-1/a - 1)
>
> And it also differs from the pdf of the generalized Pareto
> distribution with scale=1 and location=-1  or location=1.
>
> random.paretovariate and scipy.stats.pareto sample from the standard
> Pareto, and this is the desired behavior, IMO.  Its true that "1 +
> np.random.pareto" provides the fix, but I think we're better off
> changing the underlying implementation.  Devroye has a more recent
> paper:
>
>  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.85.8760
>
> which states the Pareto distribution in the standard way.  So I think
> it is safe to make this change.  Backwards compatibility might be the
> only argument for not making this change.  So here is my proposal:
>
>  1) Remove every mention of the generalized Pareto distribution from
> the docstring.  As far as I can see, the generalized Pareto
> distribution does not reduce to the "standard" Pareto at all.  We can
> still mention scipy.stats.distributions.genpareto and
> scipy.stats.distributions.pareto.  The former is something different
> and the latter will (now) be equivalent to the NumPy function.
>
>  2) Modify numpy/random/mtrand/distributions.c in the following way:
>
> double rk_pareto(rk_state *state, double a)
> {
>   //return exp(rk_standard_exponential(state)/a) - 1;
>   return 1.0 / rk_double(state)**(1.0 / a);
> }
>
> Does this sound good?
> _______________________________________________
>

Whatever the community decides, don't forget to please go through the formal
procedure of submitting a "bug" ticket so all of this is recorded in the
"right" way in the "right" place.  Thanks!

DG
-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20100510/a0398cd0/attachment.html>