[New-bugs-announce] [issue36169] Add overlap() method to statistics.NormalDist()

Raymond Hettinger report at bugs.python.org
Sat Mar 2 18:04:32 EST 2019


New submission from Raymond Hettinger <raymond.hettinger at gmail.com>:

------ How to use it ------

How much do the height distributions of men and women overlap, given two normally distributed populations with known means and standard deviations?

    # http://www.usablestats.com/lessons/normal
    >>> from statistics import NormalDist
    >>> men = NormalDist(70, 4)
    >>> women = NormalDist(65, 3.5)
    >>> men.overlap(women)
    0.5028719270195425

The result can be confirmed empirically with a Monte Carlo simulation:

    >>> from collections import Counter
    >>> n = 100_000
    >>> overlap = Counter(map(round, men.samples(n))) & Counter(map(round, women.samples(n)))
    >>> sum(overlap.values()) / n
    0.50349

The result can also be confirmed by numeric integration of the probability density function:

    >>> dx = 0.10
    >>> heights = [h * dx for h in range(500, 860)]
    >>> sum(min(men.pdf(h), women.pdf(h)) for h in heights) * dx
    0.5028920586287203

------ Code ------

    def overlap(self, other):
        '''Compute the overlap coefficient (OVL) between two normal distributions.

        Measures the agreement between two normal probability distributions.
        Returns a value between 0.0 and 1.0 giving the overlapping area in
        the two underlying probability density functions.

        '''

        # See: "The overlapping coefficient as a measure of agreement between
        # probability distributions and point estimation of the overlap of two
        # normal densities" -- Henry F. Inman and Edwin L. Bradley Jr
        # http://dx.doi.org/10.1080/03610928908830127

        # Also see:
        # http://www.iceaaonline.com/ready/wp-content/uploads/2014/06/MM-9-Presentation-Meet-the-Overlapping-Coefficient-A-Measure-for-Elevator-Speeches.pdf

        if not isinstance(other, NormalDist):
            return NotImplemented
        X, Y = self, other
        X_var, Y_var = X.variance, Y.variance
        if not X_var or not Y_var:
            raise StatisticsError('overlap() not defined when sigma is zero')
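        dv = Y_var - X_var
        if not dv:
            # Equal variances: the two densities cross exactly once,
            # midway between the means.
            return 2.0 * NormalDist(fabs(Y.mu - X.mu), 2.0 * X.sigma).cdf(0)
        # Unequal variances: the densities cross at two points, x1 and x2,
        # which are the roots of a quadratic in x.
        a = X.mu * Y_var - Y.mu * X_var
        b = X.sigma * Y.sigma * sqrt((X.mu - Y.mu)**2 + dv * log(Y_var / X_var))
        x1 = (a + b) / dv
        x2 = (a - b) / dv
        return 1.0 - (fabs(Y.cdf(x1) - X.cdf(x1)) + fabs(Y.cdf(x2) - X.cdf(x2)))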
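
The method above relies on names from the surrounding module (fabs, sqrt, log, StatisticsError). For readers who want to try the formula outside the class, here is a minimal standalone sketch. It assumes Python 3.8 or later and uses the public mean, stdev and variance properties of the released NormalDist instead of the mu and sigma attributes shown above. The function name normal_overlap is hypothetical and not part of the proposal.

```python
from math import fabs, log, sqrt
from statistics import NormalDist, StatisticsError


def normal_overlap(X, Y):
    """Overlap coefficient of two NormalDist instances (standalone sketch)."""
    X_var, Y_var = X.variance, Y.variance
    if not X_var or not Y_var:
        raise StatisticsError('overlap() not defined when sigma is zero')
    dv = Y_var - X_var
    dm = fabs(Y.mean - X.mean)
    if not dv:
        # Equal variances: the densities cross once, midway between the means.
        return 2.0 * NormalDist(dm, 2.0 * X.stdev).cdf(0)
    # Unequal variances: the densities cross at the two roots of a quadratic.
    a = X.mean * Y_var - Y.mean * X_var
    b = X.stdev * Y.stdev * sqrt(dm * dm + dv * log(Y_var / X_var))
    x1 = (a + b) / dv
    x2 = (a - b) / dv
    return 1.0 - (fabs(Y.cdf(x1) - X.cdf(x1)) + fabs(Y.cdf(x2) - X.cdf(x2)))


men = NormalDist(70, 4)
women = NormalDist(65, 3.5)
print(normal_overlap(men, women))
```

The result should agree with the men.overlap(women) value in the usage section, and swapping the arguments should not change it.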

------ Future ------

The concept of an overlap coefficient (OVL) is not specific to normal distributions, so it is possible to extend this idea to work with other distributions if needed.
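
As a rough illustration of that generality, the sketch below approximates the OVL of any two density functions by integrating min(f, g) numerically, in the same spirit as the numeric check in the usage section. The names numeric_overlap and expon_pdf are hypothetical helpers, not part of the proposal, and the midpoint rule with a fixed step count is just one simple choice of integration scheme.

```python
from math import exp


def numeric_overlap(pdf1, pdf2, lo, hi, steps=100_000):
    """Approximate the integral of min(pdf1, pdf2) over [lo, hi] (midpoint rule)."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += min(pdf1(x), pdf2(x))
    return total * dx


def expon_pdf(rate):
    """Density of an exponential distribution with the given rate, for x >= 0."""
    return lambda x: rate * exp(-rate * x) if x >= 0 else 0.0


# Two exponential distributions with rates 1 and 2; the densities cross at
# x = ln(2), and the overlapping area works out analytically to 0.75.
ovl = numeric_overlap(expon_pdf(1.0), expon_pdf(2.0), 0.0, 50.0)
print(ovl)
```

The integration limits have to cover essentially all of the probability mass of both distributions; here the upper limit of 50 is an assumption that is more than sufficient for these rates.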

----------
components: Library (Lib)
messages: 337020
nosy: davin, mark.dickinson, rhettinger, steven.daprano, tim.peters
priority: normal
severity: normal
status: open
title: Add overlap() method to statistics.NormalDist()
versions: Python 3.8

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36169>
_______________________________________

