[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Wed Mar 18 09:29:17 EDT 2009

Almer S. Tigelaar wrote:
> Hello,
>
> On Wed, 2009-03-18 at 13:11 +0100, Sturla Molden wrote:
>   
>> So it seems Hollander & Wolfe and Minitab say 0.67, whereas Numerical
>> Recipes says 1.0. Intuitively a vector should be exactly
>> correlated with itself, but I am inclined to trust Hollander & Wolfe
>> more than Numerical Recipes.
>>     
>
> Ah, I was under the impression you already checked Hollander & Wolfe.
> Anyway, it seems my initial interpretation was right then. Repeating the
> formula here (augmented) for future reference:
>
> Kendall's tau-b (tie handling):
> -------------------------------
> Given two rankings R1 and R2, Kendall's tau-b is calculated by:
>         t = (P - Q) / SQRT((P + Q + T) * (P + Q + U))
> where P is the number of concordant pairs, Q the number of discordant
> pairs, T the number of ties in R1 and U the number of ties in R2.
> [Ties are always counted regardless of whether they occur for the same
> pair in R1 and R2 or different pairs]
> -------------------------------
>
> Some tests I ran today with the R implementation of Kendall's Tau(-a)
> and the original implementation in SciPy.stats.stats (Kendall's Tau-b)
> seem to suggest that if we do NOT count ties on the same pair (the
> current situation in SciPy.stats.stats) effectively Kendall's Tau-b
> gives the same outcomes as Kendall's Tau-a for about 36 test cases.
>
> This seems to suggest that Kendall's Tau-b (tie correction) in SciPy as
> it is behaves like Kendall's Tau-a (no tie correction), possibly because
> of leaving out ties on identical pairs in T and U above.
>
> I unfortunately do not have the time to mathematically prove (or
> disprove) the equivalence of Kendall's Tau-a and the current SciPy
> implementation right now, but I thought it'd be useful to mention these
> test results.
>
>   
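
The tau-b formula quoted above, and the two ways of counting ties that
the thread is arguing about, can be sketched in a few lines of plain
Python. This is only an illustration under stated assumptions, not the
SciPy code: the names pair_counts, tau_b and the count_joint_ties flag
are hypothetical, and the O(n^2) pair loop is chosen for clarity rather
than speed.

```python
from itertools import combinations
from math import sqrt


def pair_counts(x, y):
    """Classify every pair of observations (hypothetical helper).

    Returns (P, Q, t_only, u_only, joint): concordant pairs, discordant
    pairs, pairs tied only in x, pairs tied only in y, and pairs tied
    in both x and y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    p = q = t_only = u_only = joint = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx = x1 - x2
        dy = y1 - y2
        if dx == 0 and dy == 0:
            joint += 1
        elif dx == 0:
            t_only += 1
        elif dy == 0:
            u_only += 1
        elif dx * dy > 0:
            p += 1
        else:
            q += 1
    return p, q, t_only, u_only, joint


def tau_b(x, y, count_joint_ties=True):
    """t = (P - Q) / sqrt((P + Q + T) * (P + Q + U)).

    count_joint_ties is an assumed switch: if True, a pair tied in both
    rankings is added to T and to U; if False, such pairs are left out.
    """
    p, q, t, u, joint = pair_counts(x, y)
    if count_joint_ties:
        t += joint
        u += joint
    denom = sqrt((p + q + t) * (p + q + u))
    if denom == 0:
        return float("nan")  # every pair is tied; tau is undefined
    return (p - q) / denom
```

For a ranking with one tie compared against itself, e.g.
tau_b([1, 1, 2], [1, 1, 2]), the two conventions differ: counting the
joint tie gives 2/3, leaving it out gives 1.0.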

Hi,
This link might be useful as it has worked examples:
http://faculty.chass.ncsu.edu/garson/PA765/assocordinal.htm

I find the SAS documentation for Proc Freq very useful:
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/freq_index.htm
http://support.sas.com/onlinedoc/913/getDoc/en/statug.hlp/freq_sect20.htm

Also, do these implementations depend on the type of array?
It would be great to have a single function that accepts an array, 
masked array or an object that can be converted into an array.
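
As a rough sketch of what such a front end might look like (an
assumption on my part, not anything that exists in SciPy), the inputs
could be passed through numpy.ma and any pair with a masked value
dropped before the statistic is computed. The helper name
paired_valid_values is hypothetical.

```python
import numpy as np


def paired_valid_values(x, y):
    """Coerce x and y to flat arrays and keep only pairs where neither
    value is masked (hypothetical helper, for illustration only)."""
    # np.ma.asarray accepts lists, ndarrays and masked arrays alike.
    xm = np.ma.asarray(x).ravel()
    ym = np.ma.asarray(y).ravel()
    if xm.shape != ym.shape:
        raise ValueError("x and y must have the same number of elements")
    # getmaskarray always returns a full boolean array, even if unmasked.
    keep = ~(np.ma.getmaskarray(xm) | np.ma.getmaskarray(ym))
    return np.asarray(xm.data)[keep], np.asarray(ym.data)[keep]
```

The cleaned pair of arrays could then be handed to whichever tau
implementation is settled on, so that the statistic itself does not need
to know about masks.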

Finally, this measure assumes ordinal data, but there is no type checking
done in the SciPy function.

Bruce