[SciPy-User] Scipy's probplot compared to R's qqplot

Wed Mar 3 15:15:05 EST 2010

> On Wed, Mar 3, 2010 at 13:09,  <PHobson at geosyntec.com> wrote:
> > 2) R and Scipy compute the quantiles of a dataset in slightly different
> > manners (??)
> >
> > Any clue as to why the discrepancy in #2 occurs?

> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org]
> On Behalf Of Robert Kern
> There are several, slightly different but mostly reasonable ways of
> computing quantiles.

Good to know. Thanks.

> > Would you consider it a big deal?

> Probably not, but I'm happy to entertain arguments to the contrary if
> you would care to explain how R is computing the quantiles.

After reading Josef's response, I'm only concerned with the min/max values. R's documentation, like you, notes that several methods are available for computing quantiles. By default, R uses what it calls Type 7:
""" 
     All sample quantiles are defined as weighted averages of
     consecutive order statistics. Sample quantiles of type i are
     defined by:

                 Q[i](p) = (1 - gamma) x[j] + gamma x[j+1],             

     where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order
     statistic, n is the sample size, the value of gamma is a function
     of j = floor(np + m) and g = np + m - j, and m is a constant
     determined by the sample quantile type.
[snip]
     *Continuous sample quantile types 4 through 9*

     For types 4 through 9, Q[i](p) is a continuous function of p, with
     gamma = g and m given below. The sample quantiles can be obtained
     equivalently by linear interpolation between the points
     (p[k],x[k]) where x[k] is the kth order statistic.  Specific
     expressions for p[k] are given below.
[snip]
     Type 7 m = 1-p.  p[k] = (k - 1) / (n - 1).  In this case, p[k] =
          mode[F(x[k])].  This is used by S.
"""
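
To make that definition concrete, here is a rough Python sketch of how I read the Type 7 rule quoted above. It is not R's actual implementation, and the function names (quantile_type7, type7_positions) are my own, made up for illustration. With m = 1 - p, the 1-based position n*p + m simplifies to (n - 1)*p + 1, so p = 0 lands exactly on the sample minimum and p = 1 exactly on the sample maximum.

```python
import math


def quantile_type7(data, p):
    """Type 7 sample quantile, following the definition quoted from R's docs.

    Hypothetical helper for illustration only; not R's or SciPy's code.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    x = sorted(data)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")

    # With m = 1 - p, n*p + m = (n - 1)*p + 1 (a 1-based position).
    h = (n - 1) * p + 1
    j = math.floor(h)   # j = floor(np + m)
    g = h - j           # g = np + m - j, and gamma = g for types 4 through 9

    if j >= n:          # p == 1: there is no x[j+1], so return the maximum
        return x[n - 1]

    # Q(p) = (1 - gamma) * x[j] + gamma * x[j+1], shifted to 0-based indexing.
    return (1 - g) * x[j - 1] + g * x[j]


def type7_positions(n):
    """Probabilities p[k] = (k - 1) / (n - 1) for k = 1..n (needs n >= 2)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [(k - 1) / (n - 1) for k in range(1, n + 1)]


if __name__ == "__main__":
    sample = [4, 1, 3, 2]
    for p in (0.0, 0.25, 0.5, 1.0):
        print(p, quantile_type7(sample, p))
    print(type7_positions(5))
```

The point relevant to the min/max question: under Type 7 the smallest and largest order statistics are assigned probabilities of exactly 0 and 1, so any method that assigns them different probabilities will disagree most visibly at the two ends.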

I haven't really had time to look into Scipy's method yet. I'll dig a little deeper as soon as I get the chance.

Thanks for the insight,
-Paul H.