[Numpy-discussion] pareto docstring

Tue May 11 03:48:04 EDT 2010

On Tue, May 11, 2010 at 12:23 AM, T J <tjhnson at gmail.com> wrote:

> On Mon, May 10, 2010 at 8:37 PM,  <josef.pktd at gmail.com> wrote:
> >
> > I went googling and found a new interpretation
> >
> > numpy.random.pareto is actually the Lomax distribution also known as
> Pareto 2,
> > Pareto (II) or Pareto Second Kind distribution
> >
>
> Great!
>
> >
> > So, from this it looks like numpy.random does not have a Pareto
> > distribution, only Lomax, and the confusion came maybe because
> > somewhere in the history the (II) (second kind) got dropped in the
> > explanations.
> >
> > and actually it is in scipy.stats.distributions, but without rvs
> >
> > # LOMAX (Pareto of the second kind.)
> > #  Special case of Pareto of the first kind (location=-1.0)
> >
>
> I understand the point with this last comment, but I think it can be
> confusing in that the Pareto (of the first kind) has no "location"
> parameter and people might think you are referring to the Generalized
> Pareto distribution.  I think its much clearer to say:
>
>  # Special case of the Pareto of the first kind, but shifted to the
> left by 1.    x --> x + 1
>
> >
> >>
> >>  2) Modify numpy/random/mtrand/distributions.c in the following way:
> >>
> >> double rk_pareto(rk_state *state, double a)
> >> {
> >>   //return exp(rk_standard_exponential(state)/a) - 1;
> >>   return 1.0 / rk_double(state)**(1.0 / a);
> >> }
> >
> > I'm not an expert on random number generator, but using the uniform
> distribution
> > as in
> >
> http://en.wikipedia.org/wiki/Pareto_distribution#Generating_a_random_sample_from_Pareto_distribution
> > and your Devroy reference seems better, than based on the relationship to
> > the exponential distribution
> >
> http://en.wikipedia.org/wiki/Pareto_distribution#Relation_to_the_exponential_distribution
> >
> >
>
> Correct.  The exp relationship was for the existing implementation
> (which corresponds to the Lomax).  I commented that line out and just
> used 1/U^(1/a).
>
>
> > I think without changing the source we can rewrite the docstring that
> > this is Lomax (or
> > Pareto of the Second Kind), so that at least the documentation is less
> > misleading.
> >
> > But I find calling it Pareto very confusing, and I'm not the only one
> anymore,
> > (and I don't know if anyone has used it assuming it is classical Pareto),
> > so my preferred solution would be
> >
> > * rename numpy.random.pareto to numpy.random.lomax
> > * and create a real (classical, first kind) pareto distribution (even
> > though it's just
> >  adding or subtracting 1, ones we know it)
> >
>
> I personally have used numpy.random.pareto thinking it was the Pareto
> distribution of the first kind---which led to this post in the first
> place.  So, I'm in strong agreement.  While doing this, perhaps we
> should increase functionality and allow users the ability to specify
> the scale of the distribution (instead of just the shape)?
>
> I can make a ticket for this and give a stab at creating the necessary
> patch.
>
>
> >
> > What's the backwards compatibility policy with very confusing names in
> numpy?
> >
>
> It seems reasonable that we might have to follow the deprecation
> route, but I'd be happier with a "faster" fix.
>
> 1.5
>  - Provide numpy.random.lomax.  Make numpy.random.pareto raise a
> DeprecationWarning and then call lomax.
> 2.0 (if there is no 1.6)
>  - Make numpy.random.pareto behave as Pareto distribution of 1st kind.
>
> Immediately though, we can modify the docstring that is currently in
> there to make the situation clear, instructing users how they can
> generate samples from the "standard" Pareto distribution.  This is the
> first patch I'll submit.  Perhaps it is better to only change the
> docstring and then save all changes in functionality for 2.0.
> Deferring to others on this one...
>

Elsewhere in the mailing list, it has been stated that our "policy" is to
document desired/intended behavior, when such differs from actual (current)
behavior.  This can be done in advance of a code fix to implement the
desired behavior, but we have discouraged (to the point of saying "don't do
it") documenting current behavior when it is known that this should (and
presumably will) be changed.

DG

> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
Mathematician: noun, someone who disavows certainty when their uncertainty
set is non-empty, even if that set has measure zero.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20100511/6b8d90f8/attachment.html>