[SciPy-User] Scipy's probplot compared to R's qqplot
PHobson at Geosyntec.com
Wed Mar 3 15:15:05 EST 2010
> On Wed, Mar 3, 2010 at 13:09, <PHobson at geosyntec.com> wrote:
> > 2) R and Scipy compute the quantiles of a dataset in slightly different
> > manners (??)
> >
> > Any clue as to why the discrepancy in #2 occurs?
> From: scipy-user-bounces at scipy.org [mailto:scipy-user-bounces at scipy.org]
> On Behalf Of Robert Kern
> There are several, slightly different but mostly reasonable ways of
> computing quantiles.
Good to know. Thanks.
> > Would you consider it a big deal?
> Probably not, but I'm happy to entertain arguments to the contrary if
> you would care to explain how R is computing the quantiles.
After reading Josef's response, I'm only concerned with the min/max values. R's documentation, like you, notes that several methods are available to compute quantiles. By default, R uses what it calls Type 7:
"""
All sample quantiles are defined as weighted averages of
consecutive order statistics. Sample quantiles of type i are
defined by:
Q[i](p) = (1 - gamma) x[j] + gamma x[j+1],
where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order
statistic, n is the sample size, the value of gamma is a function
of j = floor(np + m) and g = np + m - j, and m is a constant
determined by the sample quantile type.
[snip]
*Continuous sample quantile types 4 through 9*
For types 4 through 9, Q[i](p) is a continuous function of p, with
gamma = g and m given below. The sample quantiles can be obtained
equivalently by linear interpolation between the points
(p[k],x[k]) where x[k] is the kth order statistic. Specific
expressions for p[k] are given below.
[snip]
Type 7 m = 1-p. p[k] = (k - 1) / (n - 1). In this case, p[k] =
mode[F(x[k])]. This is used by S.
"""
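To make the quoted definition concrete, here is a minimal sketch of Type 7 in plain Python. It follows the quoted formulas directly (m = 1 - p, j = floor(np + m), g = np + m - j) and is not R's actual implementation. The function name `quantile_type7` is made up for illustration, and clamping the upper index at p = 1 is an assumption added so the lookup stays in bounds.

```python
import math

def quantile_type7(data, p):
    """Type 7 sample quantile, following the R documentation quoted above.

    Hypothetical helper for illustration only; not R's or SciPy's code.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    x = sorted(data)              # order statistics x[1..n] (1-based in the docs)
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    m = 1.0 - p                   # Type 7 constant
    h = n * p + m                 # equals (n - 1) * p + 1
    j = int(math.floor(h))        # index of the lower order statistic (1-based)
    g = h - j                     # gamma: interpolation weight
    lower = x[j - 1]              # x[j], shifted to 0-based indexing
    upper = x[min(j, n - 1)]      # x[j+1]; clamped (assumption) for p == 1
    return (1.0 - g) * lower + g * upper

sample = [3.0, 1.0, 4.0, 2.0]
print(quantile_type7(sample, 0.0))   # smallest observation
print(quantile_type7(sample, 0.5))   # midpoint of the two middle values
print(quantile_type7(sample, 1.0))   # largest observation
```

With this definition p = 0 and p = 1 map exactly onto the sample minimum and maximum, which is the behaviour to check against when comparing the extreme points of the two plots.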
I haven't really had time to look into Scipy's method yet. I'll dig a little deeper as soon as I get the chance.
Thanks for the insight,
-Paul H.