[Numpy-discussion] scipy.stats.qqplot and scipy.stats.probplot axis labeling

Ralf Gommers ralf.gommers at gmail.com
Sat Jun 11 08:53:13 EDT 2016


Hi Mark,

Note that the scipy-dev or scipy-user mailing list would have been more
appropriate for this question.


On Fri, Jun 10, 2016 at 9:06 AM, Mark Gawron <gawron at mail.sdsu.edu> wrote:

>
>
> The scipy.stats.qqplot and scipy.stats.probplot functions plot expected
> values versus actual data values to visualize fit to a distribution.
> First a 1-D array of expected percentiles is generated for a sample of
> size N; that array is then passed to dist.ppf, the percent point function
> of the chosen distribution, to return an array of expected values.  The
> plotted points are pairs of expected and actual values, and a linear
> regression on these pairs produces the line that data points from this
> distribution should lie on.
>
> Where x is the input data array and dist the chosen distribution we have:
>
> osr = np.sort(x)
> osm_uniform = _calc_uniform_order_statistic_medians(len(x))
> osm = dist.ppf(osm_uniform)
> slope, intercept, r, prob, sterrest = stats.linregress(osm, osr)
>
>
> My question concerns the plot display.
>
> ax.plot(osm, osr, 'bo', osm, slope*osm + intercept, 'r-')
>
>
> The x-axis of the resulting plot is labeled quantiles, but the xticks and
> xticklabels produced by qqplot and probplot do not seem correct for their
> intended interpretation.  First, the numbers on the x-axis do not
> represent quantiles; the intervals between them do not in general contain
> equal numbers of points.  For a normal distribution with sigma=1, they
> represent standard deviations.  Changing the label on the x-axis does not
> seem like a very good solution, because the interpretation of the values
> on the x-axis will be different for different distributions.  Rather, the
> right solution seems to be to actually show quantiles on the x-axis.  The
> numbers on the x-axis can stay as they are, representing quantile indexes,
> but they need to be spaced so as to show the actual division points that
> carve the population up into groups of the same size.  This can be done in
> something like the following way.
>

I think the ticks are correct: they're theoretical quantiles, not sample
quantiles. This was discussed in [1] and is consistent with R [2] and
statsmodels [3]. I see that we just forgot to add "theoretical" to the
x-axis label (mea culpa). Does adding that resolve your concern?

[1] https://github.com/scipy/scipy/issues/1821
[2] http://data.library.virginia.edu/understanding-q-q-plots/
[3] http://statsmodels.sourceforge.net/devel/generated/statsmodels.graphics.gofplots.qqplot.html?highlight=qqplot#statsmodels.graphics.gofplots.qqplot
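
To make the theoretical-versus-sample distinction concrete, here is a
minimal sketch (not the actual scipy source). It assumes the uniform order
statistic medians follow Filliben's estimate, which is my reading of what
the private helper quoted above computes; the sample x, the seed, and the
variable names p and theoretical are made up for illustration.

```python
import numpy as np
from scipy import stats

# Illustrative sample; the location, scale, and seed are arbitrary.
rng = np.random.RandomState(0)
x = rng.normal(loc=5.0, scale=2.0, size=50)

# With plot=None (the default) probplot only returns the plotted points
# and the least-squares fit; nothing is drawn.
(osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")

# Assumption: uniform order statistic medians via Filliben's estimate.
n = len(x)
p = np.empty(n)
p[-1] = 0.5 ** (1.0 / n)
p[0] = 1.0 - p[-1]
p[1:-1] = (np.arange(2, n) - 0.3175) / (n + 0.365)

# Theoretical quantiles: the distribution's ppf evaluated at those
# probabilities. These are the x-values of the plot.
theoretical = stats.norm.ppf(p)

# The y-values are the sample quantiles, i.e. the sorted data.
print(np.allclose(osm, theoretical))
print(np.allclose(osr, np.sort(x)))
```

The x-values depend only on the sample size and the chosen distribution,
not on the data, which is why "theoretical quantiles" is the right label.
When plotting on a matplotlib Axes, something like
ax.set_xlabel("Theoretical quantiles") would make that explicit.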

Ralf