basic statistics in python

Sun Mar 17 04:38:56 EST 2002

Tim Churches wrote:

> > delivers:
> >
> >    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
> >   0.230   1.226   7.300  18.960  31.680  78.900
> >
> > Everything is correct, except the 1st quantile and 3rd quantile.
> 
> You mean 1st quartile and 3rd quartile, not quantile. And the values
> calculated by R are not wrong, just different (see below).

The book "Statistical Methods in the Atmospheric Sciences", by D.S.
Wilks, does not really distinguish between "quantiles" and
"quartiles". From the book I got the impression that quartiles
are simply a special case of quantiles (e.g. page 24: "Example 3.1.
Computation of Common Quantiles").

But you are right that I should be more precise in order to avoid
confusion.

> There are a number of methods for calculating quantiles. In R, the
> summary() function calls the quantile() function to calculate the 1st
> and 3rd quartiles and the median. The quantile() function uses linear
> interpolation to calculate the sample quantile for the probabilities of
> 0.25 and 0.75, whereas XLispStat is just taking the arithmetic mean of
> the 2nd and 3rd, and 6th and 7th values respectively (using zero-based
> indexing/counting, since this is the Python list).

My first guess was also that R just calculates the quantiles in a
different fashion; but I could not find any hints in the documentation.
According to the aforementioned book (page 23):

"Almost as commonly used as the median are the quartiles, q0.25 and
q0.75. Usually these are called the lower and upper quartiles,
respectively. They are located halfway between the median, q0.5, and the
extremes, x(1) and x(n). In typically colorful terminology, Tukey (1977)
calls q0.25 and q0.75 the 'hinges', imagining that the data set has been
folded first at the median, and the quartiles."

I simply thought (and note the word "halfway" in the citation) that
XLispStat is/was correct.
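
To make the difference concrete, here is a minimal Python sketch of the
two conventions as Tim describes them above. The function names and the
ten sample values are made up for illustration and are not the data
from this thread. The position (n - 1) * p is what I understand R's
quantile() to interpolate at, and the midpoint rule is only my reading
of the XLispStat behaviour, not taken from its source.

```python
import math


def quantile_linear(values, p):
    """Linear interpolation between the two nearest order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p  # fractional zero-based position (assumed rule)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def quantile_midpoint(values, p):
    """Arithmetic mean of the two nearest order statistics (assumed rule)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p
    lo, hi = math.floor(pos), math.ceil(pos)
    return (xs[lo] + xs[hi]) / 2.0


# Ten made-up values, NOT the data discussed in this thread.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]

for p in (0.25, 0.75):
    print(p, quantile_linear(sample, p), quantile_midpoint(sample, p))
```

With ten values the positions are 2.25 and 6.75, so both rules look at
elements 2 and 3, and 6 and 7 (zero-based). They differ only in how the
two neighbours are weighted.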

> The methods used by R are fully described in the R manual (see
> help(quantile)), but a commonsense explanation of the R approach is as
> follows (again using zero-based indexing/counting). 

Maybe I looked too superficially into the method of calculation.

Regards and especially thank you for your insight,
S. Gonzi