[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
Sturla Molden
sturla at molden.no
Thu Mar 19 09:30:25 EDT 2009
So, it seems this version is faster and more memory efficient:
def kendalltau_fromct(table):
'''return tau-a, tau-b and tau-c from contingency table data
example for contingency table:
count = np.array([[10, 5, 2],
[ 9, 12, 16]])
'''
def _table(j,i):
return {
#'above' : table[:j,i],
'below' : table[j+1:,i],
#'left' : table[j,:i],
'right' : table[j,i+1:],
#'upper-left' : table[:j,:i],
#'upper-right' : table[:j,i+1:],
'lower-left' : table[j+1:,:i],
'lower-right' : table[j+1:,i+1:] }
D = table.shape[0]
C = table.shape[1]
P = 0
Q = 0
Tx = 0
Ty = 0
for i in range(C):
for j in range(D):
pivot = table[j,i] # use this as pivot
ct = _table(j,i) # split remainder into 8 sections
# count concordant pairs
# -- multiply pivot with 'lower-right' and summate
P += np.sum(ct['lower-right'] * pivot)
# count disconcordant pairs
# -- multiply pivot with 'lower-left' and summate
Q += np.sum(ct['lower-left'] * pivot)
# count pairs tied in y
# -- multiply pivot with 'right' and summate
Ty += np.sum(ct['right'] * pivot)
# count pairs tied in x
# -- multiply pivot with 'below' and summate
Tx += np.sum(ct['below'] * pivot)
n = np.sum(table)
tau_a = 2*float(P-Q)/(n*(n-1))
tau_b = float(P-Q)/(sqrt((P+Q+Tx)*(P+Q+Ty)))
m = C if C < D else D
tau_c = (P-Q)*(2*m/float((m-1)*n**2))
return tau_a, tau_b, tau_c
If Fortran 95 is difficult to build and maintain, we can do this in
Cython as well. There is no need for function calls with array arguments
here, so Fortan has no advantage over Cython in this case. Something
like this:
import numpy as np
fom math import sqrt
cimport numpy as np
ctypedef np.int_t int_t
cimport cython
@cython.boundscheck(False)
def kendalltau_fromct(np.ndarray[int_t, ndim=2] table):
'''return tau-a, tau-b and tau-c from contingency table data
example for contingency table:
count = np.array([[10, 5, 2],
[ 9, 12, 16]])
'''
cdef int C, D, P, Q, Tx, Ty
cdef int i, j, ii, jj, m, n
D = <int> table.shape[0]
C = <int> table.shape[1]
P = 0
Q = 0
Tx = 0
Ty = 0
n = 0
for j in range(D):
for i in range(C):
pivot = table[j,i] # use this as pivot
n = n + pivot
# count concordant pairs
# -- multiply pivot with 'lower-right' and summate
for jj in range(j+1,D):
for ii in range(i+1,C):
P = P + table[jj,ii] * pivot
# count disconcordant pairs
# -- multiply pivot with 'lower-left' and summate
for jj in range(j+1,D):
for ii in range(i):
Q = Q + table[jj,ii] * pivot
# count pairs tied in y
# -- multiply pivot with 'right' and summate
for ii in range(i+1,C):
Ty = Ty + table[j,ii] * pivot
# count pairs tied in x
# -- multiply pivot with 'below' and summate
for jj in range(j+1,D):
Tx = Tx + table[jj,i] * pivot
tau_a = 2*float(P-Q)/(n*(n-1))
tau_b = float(P-Q)/(sqrt(float((P+Q+Tx)*(P+Q+Ty))))
m = C if C < D else D
tau_c = (P-Q)*(2*m/float((m-1)*n**2))
return tau_a, tau_b, tau_c
Now I am done Kendall's tau for sure. Pick whichever you like.
Sturla Molden
More information about the SciPy-Dev
mailing list