[Numpy-discussion] Pull request review #3770: Trapezoidal distribution
josef.pktd at gmail.com
Sun Sep 22 07:24:22 EDT 2013
On Sat, Sep 21, 2013 at 1:55 PM, Jeremy Hetzel <jthetzel at gmail.com> wrote:
> I've added a trapezoidal distribution to numpy.random for consideration,
> pull request 3770:
> https://github.com/numpy/numpy/pull/3770
>
> Similar to the triangular distribution, the trapezoidal distribution may be
> used where the underlying distribution is not known, but some knowledge of
> the limits and mode exists. The trapezoidal distribution generalizes the
> triangular distribution by allowing the modal values to be expressed as a
> range instead of a point estimate.
>
> The trapezoidal distribution implemented, known as the "generalized
> trapezoidal distribution," has three additional parameters: growth, decay,
> and boundary ratio. Adjusting these from their default values creates
> trapezoidal-like distributions with non-linear behavior. Examples can be
> seen in an R vignette (
> http://cran.r-project.org/web/packages/trapezoid/vignettes/trapezoid.pdf ),
> as well as these papers by J.R. van Dorp and colleagues:
>
> 1) van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
> distributions. Metrika. 58(1):85–97. Preprint available:
> http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf
>
> 2) van Dorp, J. R., Rambaud, S.C., Perez, J. G., and Pleguezuelo, R. H.
> (2007) An elicitation procedure for the generalized trapezoidal distribution
> with a uniform central stage. Decision Analysis Journal. 4:156–166. Preprint
> available:
> http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf
>
> The docstring for the proposed numpy.random.trapezoidal() is as follows:
>
> """
> trapezoidal(left, mode1, mode2, right, size=None, m=2, n=2, alpha=1)
>
> Draw samples from the generalized trapezoidal distribution.
>
> The trapezoidal distribution is defined by minimum (``left``), lower
> mode (``mode1``), upper mode (``mode2``), and maximum (``right``)
> parameters. The generalized trapezoidal distribution adds three more
> parameters: the growth rate (``m``), decay rate (``n``), and boundary
> ratio (``alpha``). The generalized trapezoidal distribution simplifies
> to the trapezoidal distribution when ``m = n = 2`` and ``alpha = 1``.
> It further simplifies to a triangular distribution when
> ``mode1 == mode2``.
>
> Parameters
> ----------
> left : scalar
>     Lower limit.
> mode1 : scalar
>     The value where the first peak of the distribution occurs (the
>     start of the central stage). The value should fulfill the condition
>     ``left <= mode1 <= mode2``.
> mode2 : scalar
>     The value where the second peak of the distribution occurs (the end
>     of the central stage). The value should fulfill the condition
>     ``mode1 <= mode2 <= right``.
> right : scalar
>     Upper limit, should be larger than or equal to `mode2`.
> size : int or tuple of ints, optional
>     Output shape. Default is None, in which case a single value is
>     returned.
> m : scalar, optional
>     Growth parameter, must be > 0. Default is 2.
> n : scalar, optional
>     Decay parameter, must be > 0. Default is 2.
> alpha : scalar, optional
>     Boundary ratio parameter, must be > 0. Default is 1.
>
> Returns
> -------
> samples : ndarray or scalar
>     The returned samples all lie in the interval [left, right].
>
> Notes
> -----
> With ``left``, ``mode1``, ``mode2``, ``right``, ``m``, ``n``, and
> ``alpha`` parametrized as :math:`a, b, c, d, m, n, \\text{ and } \\alpha`,
> respectively, the probability density function for the generalized
> trapezoidal distribution is
>
> .. math::
>
>     f_{X}(x \\mid \\Theta) = \\mathcal{C}(\\Theta) \\times
>     \\begin{cases}
>         \\alpha \\left(\\frac{x - a}{b - a}\\right)^{m - 1},
>             & \\text{for } a \\leq x < b \\\\
>         (1 - \\alpha) \\left(\\frac{x - b}{c - b}\\right) + \\alpha,
>             & \\text{for } b \\leq x < c \\\\
>         \\left(\\frac{d - x}{d - c}\\right)^{n - 1},
>             & \\text{for } c \\leq x \\leq d
>     \\end{cases}
>
> with the normalizing constant :math:`\\mathcal{C}(\\Theta)` defined as
>
> .. math::
>
>     \\mathcal{C}(\\Theta) =
>     \\frac{2mn}
>     {2 \\alpha \\left(b - a\\right) n +
>     \\left(\\alpha + 1 \\right) \\left(c - b \\right) m n +
>     2 \\left(d - c \\right) m}
>
> and where the parameter vector
> :math:`\\Theta = \\{a, b, c, d, m, n, \\alpha\\}`,
> :math:`a \\leq b \\leq c \\leq d`, and :math:`m, n, \\alpha > 0`.
>
> Similar to the triangular distribution, the trapezoidal distribution
> may be used where the underlying distribution is not known, but some
> knowledge of the limits and mode exists. The trapezoidal distribution
> generalizes the triangular distribution by allowing the modal values to
> be expressed as a range instead of a point estimate. The growth, decay,
> and boundary ratio parameters of the generalized trapezoidal
> distribution further allow non-linear behavior to be specified.
>
> References
> ----------
> .. [1] van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
>        distributions. Metrika. 58(1):85–97. Preprint available:
>        http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf
> .. [2] van Dorp, J. R., Rambaud, S. C., Perez, J. G., and Pleguezuelo,
>        R. H. (2007) An elicitation procedure for the generalized
>        trapezoidal distribution with a uniform central stage. Decision
>        Analysis Journal. 4:156–166. Preprint available:
>        http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf
>
> Examples
> --------
> Draw values from the distribution and plot the histogram:
>
> >>> import matplotlib.pyplot as plt
> >>> h = plt.hist(np.random.trapezoidal(0, 0.25, 0.75, 1, 100000),
> ...              bins=200, density=True)
> >>> plt.show()
>
> """
>
> I am unsure if NumPy encourages incorporation of new distributions into
> numpy.random or instead into separate modules, but found the exercise to be
> helpful regardless.
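For reference, one way to draw from this distribution without numerical root-finding is to treat the density as a three-stage mixture: pick the growth, central, or decay stage with probability proportional to its area under the unnormalized density, then invert that stage's CDF. The sketch below is plain Python using `random.random()`, derived from the density in the docstring's Notes; `gtsp_sample` is a hypothetical name, and this is not necessarily how the pull request's implementation draws samples.

```python
import math
import random


def gtsp_sample(a, b, c, d, m=2.0, n=2.0, alpha=1.0, rng=random):
    """Draw one sample from the generalized trapezoidal distribution.

    Hypothetical helper; assumes a <= b <= c <= d, a < d, and m, n, alpha > 0.
    `rng` is anything with a random() method returning a float in [0, 1).
    """
    # Unnormalized area of each stage (growth, central, decay).
    area_growth = alpha * (b - a) / m
    area_central = (alpha + 1) * (c - b) / 2
    area_decay = (d - c) / n
    total = area_growth + area_central + area_decay

    stage = rng.random() * total
    u = rng.random()
    if stage < area_growth:
        # Growth stage: density ~ t**(m - 1) on [0, 1], inverse CDF is u**(1/m).
        return a + (b - a) * u ** (1.0 / m)
    if stage < area_growth + area_central:
        # Central stage: density ~ alpha + (1 - alpha) * t on [0, 1].
        # Inverse CDF, written in a form that stays stable as alpha -> 1
        # (it reduces to t = u when alpha == 1).
        t = (1 + alpha) * u / (alpha + math.sqrt(alpha ** 2 + (1 - alpha ** 2) * u))
        return b + (c - b) * t
    # Decay stage: density ~ (1 - t)**(n - 1) on [0, 1].
    return d - (d - c) * u ** (1.0 / n)
```

When mode1 == mode2 the central stage has zero area and the sampler reduces to an inverse-CDF triangular sampler, in line with the docstring's remark that the distribution simplifies to the triangular one.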
I don't see a reason that numpy.random shouldn't get new
distributions. It would also be useful to add the corresponding
distribution to scipy.stats.
I'm not familiar with the generalized trapezoidal distribution and
don't know where it's used, nor have I ever used triangular.
naming: n, m would indicate to me that they are integers, but they
can be floats (> 0).
alpha, beta?
about the parameterization: no problem here
Is there a standard version, e.g. left=0, right=1, mode1=?, ... ?
In scipy.stats.distributions we are required to use a location-scale
parameterization, where loc shifts the distribution and scale
stretches it.
Is there a standard parameterization for that? For example:
left = loc = 0 (default) or left = loc / scale = 0
right = scale = 1 (default)
mode1_relative = mode1 / scale
mode2_relative = mode2 / scale
n, m unchanged, no defaults
just checked:
your naming corresponds to triangular, and triang in scipy has the
corresponding loc-scale parameterization.
Josef
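The loc/scale parameterization discussed above can be made concrete with a small conversion, assuming the same convention scipy.stats.triang uses for its shape parameter c: the support is [loc, loc + scale] and a mode is expressed as (mode - loc) / scale. The helpers `to_loc_scale` and `from_loc_scale` are hypothetical; in a scipy.stats-style version the two relative modes would be shape parameters, with m, n, and alpha passed through unchanged.

```python
def to_loc_scale(left, mode1, mode2, right):
    """Map (left, mode1, mode2, right) to (loc, scale, c1, c2).

    Hypothetical helper following the scipy.stats.triang convention:
    support is [loc, loc + scale] and the modes are relative to it.
    """
    if not left <= mode1 <= mode2 <= right or left == right:
        raise ValueError("need left <= mode1 <= mode2 <= right and left < right")
    loc = left
    scale = right - left
    c1 = (mode1 - loc) / scale  # relative lower mode, in [0, 1]
    c2 = (mode2 - loc) / scale  # relative upper mode, in [c1, 1]
    return loc, scale, c1, c2


def from_loc_scale(loc, scale, c1, c2):
    """Inverse of to_loc_scale: recover (left, mode1, mode2, right)."""
    return loc, loc + c1 * scale, loc + c2 * scale, loc + scale
```

With loc = 0 and scale = 1, c1 and c2 are simply mode1 and mode2, which corresponds to the "standard version" with left=0 and right=1 asked about above.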
>
> Thanks,
> Jeremy
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>