[Numpy-discussion] Pull request review #3770: Trapezoidal distribution
Jeremy Hetzel
jthetzel at gmail.com
Sat Sep 21 13:55:36 EDT 2013
I've added a trapezoidal distribution to numpy.random for consideration,
pull request 3770:
https://github.com/numpy/numpy/pull/3770
Similar to the triangular distribution, the trapezoidal distribution may be
used where the underlying distribution is not known, but some knowledge of
the limits and mode exists. The trapezoidal distribution generalizes the
triangular distribution by allowing the modal values to be expressed as a
range instead of a point estimate.
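To make the relationship to the triangular distribution concrete, here is a minimal sketch of inverse-transform sampling for the plain trapezoid (the m = n = 2, alpha = 1 case). It is not the code from the pull request: the function name `sample_trapezoid` and its `rng` argument are hypothetical, and it assumes `left <= mode1 <= mode2 <= right` with `left < right` and an int or tuple `size`.

```python
import numpy as np


def sample_trapezoid(left, mode1, mode2, right, size, rng=None):
    """Sample the plain trapezoidal distribution by inverting its CDF.

    Hypothetical helper for illustration only; not the PR implementation.
    """
    rng = np.random.default_rng() if rng is None else rng
    a, b, c, d = float(left), float(mode1), float(mode2), float(right)

    # Height of the flat top, chosen so the trapezoid has unit area.
    h = 2.0 / (d + c - a - b)

    u = rng.random(size)
    cdf_b = h * (b - a) / 2.0        # CDF value at mode1
    cdf_c = 1.0 - h * (d - c) / 2.0  # CDF value at mode2

    rise = u < cdf_b
    fall = u >= cdf_c
    flat = ~(rise | fall)

    x = np.empty_like(u)
    # Rising edge: the CDF is quadratic, so the inverse is a square root.
    x[rise] = a + np.sqrt(2.0 * u[rise] * (b - a) / h)
    # Flat top: the CDF is linear.
    x[flat] = b + (u[flat] - cdf_b) / h
    # Falling edge: mirror image of the rising edge.
    x[fall] = d - np.sqrt(2.0 * (1.0 - u[fall]) * (d - c) / h)
    return x


rng = np.random.default_rng(0)
trap = sample_trapezoid(0.0, 0.25, 0.75, 1.0, 100000, rng=rng)
# With mode1 == mode2 the flat stage is empty and the draw is triangular.
tri = sample_trapezoid(0.0, 0.2, 0.2, 1.0, 100000, rng=rng)
```

When `mode1 == mode2` the two CDF breakpoints coincide, so every sample comes from one of the two square-root branches, which is the usual inverse CDF of the triangular distribution.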
The trapezoidal distribution implemented, known as the "generalized
trapezoidal distribution," has three additional parameters: growth, decay,
and boundary ratio. Adjusting these from the default values creates
trapezoidal-like distributions with non-linear behavior. Examples can be
seen in an R vignette (
http://cran.r-project.org/web/packages/trapezoid/vignettes/trapezoid.pdf ),
as well as these papers by J.R. van Dorp and colleagues:
1) van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
distributions. Metrika. 58(1):85–97. Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf
2) van Dorp, J. R., Rambaud, S.C., Perez, J. G., and Pleguezuelo, R. H.
(2007) An elicitation procedure for the generalized trapezoidal
distribution with a uniform central stage. Decision Analysis Journal.
4:156–166. Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf
The docstring for the proposed numpy.random.trapezoidal() is as follows:
"""
trapezoidal(left, mode1, mode2, right, size=None, m=2, n=2, alpha=1)
Draw samples from the generalized trapezoidal distribution.
The trapezoidal distribution is defined by minimum (``left``),
lower mode (``mode1``), upper mode (``mode2``), and maximum
(``right``) parameters. The generalized trapezoidal distribution
adds three more parameters: the growth rate (``m``), decay rate
(``n``), and boundary ratio (``alpha``). The generalized
trapezoidal distribution simplifies to the trapezoidal
distribution when ``m = n = 2`` and ``alpha = 1``. It further
simplifies to a triangular distribution when ``mode1 == mode2``.
Parameters
----------
left : scalar
Lower limit.
mode1 : scalar
The value where the first peak of the distribution occurs.
The value should fulfill the condition
``left <= mode1 <= mode2``.
mode2 : scalar
The value where the second peak of the distribution occurs.
The value should fulfill the condition
``mode1 <= mode2 <= right``.
right : scalar
Upper limit, should be larger than or equal to `mode2`.
size : int or tuple of ints, optional
Output shape. Default is None, in which case a single value is
returned.
m : scalar, optional
Growth parameter.
n : scalar, optional
Decay parameter.
alpha : scalar, optional
Boundary ratio parameter.
Returns
-------
samples : ndarray or scalar
The returned samples all lie in the interval [left, right].
Notes
-----
With ``left``, ``mode1``, ``mode2``, ``right``, ``m``, ``n``, and
``alpha`` parametrized as
:math:`a, b, c, d, m, n, \\text{ and } \\alpha`, respectively,
the probability density function for the generalized trapezoidal
distribution is
.. math::
f_{\\scriptscriptstyle X}(x \\mid \\Theta) =
\\mathcal{C}(\\Theta) \\times
\\begin{cases}
\\alpha \\left(\\frac{x - a}{b - a}
\\right)^{m - 1}, & \\text{for } a \\leq x < b \\\\
(1 - \\alpha) \\left(\\frac{x - b}{c - b} \\right)
+ \\alpha, & \\text{for } b \\leq x < c \\\\
\\left(\\frac{d - x}{d - c} \\right)^{n-1}, &
\\text{for } c \\leq x \\leq d
\\end{cases}
with the normalizing constant :math:`\\mathcal{C}(\\Theta)` defined
as
.. math::
\\mathcal{C}(\\Theta) =
\\frac{2mn}
{2 \\alpha \\left(b - a\\right) n +
\\left(\\alpha + 1 \\right) \\left(c - b \\right)mn
+
2 \\left(d - c \\right)m}
and where the parameter vector :math:`\\Theta = \\{a, b, c, d, m,
n, \\alpha \\}, \\text{ } a \\leq b \\leq c \\leq d, \\text{ and } m, n,
\\alpha >0`.
Similar to the triangular distribution, the trapezoidal distribution
may be used where the underlying distribution is not known, but some
knowledge of the limits and mode exists. The trapezoidal distribution
generalizes the triangular distribution by allowing the modal values
to be expressed as a range instead of a point estimate. The growth,
decay, and boundary ratio parameters of the generalized trapezoidal
distribution further allow for non-linear behavior to be specified.
References
----------
.. [1] van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
distributions.
Metrika. 58(1):85–97.
Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf
.. [2] van Dorp, J. R., Rambaud, S.C., Perez, J. G., and
Pleguezuelo, R. H. (2007)
An elicitation procedure for the generalized trapezoidal
distribution with a uniform central stage.
Decision Analysis Journal. 4:156–166.
Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf
Examples
--------
Draw values from the distribution and plot the histogram:
>>> import matplotlib.pyplot as plt
>>> h = plt.hist(np.random.trapezoidal(0, 0.25, 0.75, 1, 100000),
...              bins=200,
...              normed=True)
>>> plt.show()
"""
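As a cross-check on the Notes section, the density and normalizing constant can be written out directly. The following is only a sketch under stated assumptions: the name `gen_trapezoid_pdf` is hypothetical (it is not part of the pull request or of NumPy), and it assumes strict ordering `a < b < c < d` so that no stage has zero width.

```python
import numpy as np


def gen_trapezoid_pdf(x, a, b, c, d, m=2.0, n=2.0, alpha=1.0):
    """Evaluate the generalized trapezoidal density at x.

    Hypothetical helper for illustration; assumes a < b < c < d and
    m, n, alpha > 0. Always returns a 1-D (or higher) array.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))

    # Normalizing constant C(Theta) as given in the docstring.
    norm = (2.0 * m * n) / (
        2.0 * alpha * (b - a) * n
        + (alpha + 1.0) * (c - b) * m * n
        + 2.0 * (d - c) * m
    )

    out = np.zeros_like(x)  # density is zero outside [a, d]

    grow = (x >= a) & (x < b)    # growth stage
    out[grow] = alpha * ((x[grow] - a) / (b - a)) ** (m - 1.0)

    mid = (x >= b) & (x < c)     # central stage, linear from alpha to 1
    out[mid] = (1.0 - alpha) * (x[mid] - b) / (c - b) + alpha

    decay = (x >= c) & (x <= d)  # decay stage
    out[decay] = ((d - x[decay]) / (d - c)) ** (n - 1.0)

    return norm * out


# Numerical sanity check: the density should integrate to about 1.
xs = np.linspace(0.0, 1.0, 200001)
ys = gen_trapezoid_pdf(xs, 0.0, 0.25, 0.75, 1.0, m=3.0, n=1.5, alpha=0.5)
area = np.sum((ys[1:] + ys[:-1]) * np.diff(xs)) / 2.0
```

Integrating the three stages by hand gives `alpha*(b - a)/m + (1 + alpha)*(c - b)/2 + (d - c)/n`, which is the reciprocal of the normalizing constant above, so the numerical area should land very close to 1.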
I am unsure if NumPy encourages incorporation of new distributions into
numpy.random or instead into separate modules, but found the exercise to be
helpful regardless.
Thanks,
Jeremy