[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)

Thu Mar 19 09:30:25 EDT 2009

So, it seems this version is faster and more memory efficient:

def kendalltau_fromct(table):
     '''return tau-a, tau-b and tau-c from contingency table data

     example for contingency table:

     count = np.array([[10,  5,  2],
                        [ 9, 12, 16]])

     '''

     def _table(j,i):
       return {
          #'above' :       table[:j,i],
          'below' :       table[j+1:,i],
          #'left' :        table[j,:i],
          'right' :       table[j,i+1:],
          #'upper-left' :  table[:j,:i],
          #'upper-right' : table[:j,i+1:],
          'lower-left' :  table[j+1:,:i],
          'lower-right' : table[j+1:,i+1:] }

     D = table.shape[0]
     C = table.shape[1]
     P = 0
     Q = 0
     Tx = 0
     Ty = 0
     for i in range(C):
         for j in range(D):

             pivot = table[j,i] # use this as pivot
             ct = _table(j,i)   # split remainder into 8 sections

             # count concordant pairs
             # -- multiply pivot with 'lower-right' and summate
             P += np.sum(ct['lower-right'] * pivot)

             # count disconcordant pairs
             # -- multiply pivot with 'lower-left' and summate

             Q += np.sum(ct['lower-left'] * pivot)

             # count pairs tied in y
             # -- multiply pivot with 'right' and summate

             Ty += np.sum(ct['right'] * pivot)

             # count pairs tied in x
             # -- multiply pivot with 'below' and summate

             Tx += np.sum(ct['below'] * pivot)

     n = np.sum(table)
     tau_a = 2*float(P-Q)/(n*(n-1))
     tau_b = float(P-Q)/(sqrt((P+Q+Tx)*(P+Q+Ty)))
     m = C if C < D else D
     tau_c = (P-Q)*(2*m/float((m-1)*n**2))
     return tau_a, tau_b, tau_c

If Fortran 95 is difficult to build and maintain, we can do this in 
Cython as well. There is no need for function calls with array arguments 
here, so Fortan has no advantage over Cython in this case. Something 
like this:

import numpy as np
fom math import sqrt
cimport numpy as np
ctypedef np.int_t int_t
cimport cython

@cython.boundscheck(False)
def kendalltau_fromct(np.ndarray[int_t, ndim=2] table):
     '''return tau-a, tau-b and tau-c from contingency table data

     example for contingency table:

     count = np.array([[10,  5,  2],
                        [ 9, 12, 16]])

     '''

     cdef int C, D, P, Q, Tx, Ty
     cdef int i, j, ii, jj, m, n

     D = <int> table.shape[0]
     C = <int> table.shape[1]
     P = 0
     Q = 0
     Tx = 0
     Ty = 0
     n = 0
     for j in range(D):
         for i in range(C):

             pivot = table[j,i] # use this as pivot
             n = n + pivot

             # count concordant pairs
             # -- multiply pivot with 'lower-right' and summate
             for jj in range(j+1,D):
                 for ii in range(i+1,C):
                     P = P + table[jj,ii] * pivot

             # count disconcordant pairs
             # -- multiply pivot with 'lower-left' and summate
             for jj in range(j+1,D):
                 for ii in range(i):
                     Q = Q + table[jj,ii] * pivot

             # count pairs tied in y
             # -- multiply pivot with 'right' and summate
             for ii in range(i+1,C):
                 Ty = Ty + table[j,ii] * pivot

             # count pairs tied in x
             # -- multiply pivot with 'below' and summate
             for jj in range(j+1,D):
                 Tx = Tx + table[jj,i] * pivot

     tau_a = 2*float(P-Q)/(n*(n-1))
     tau_b = float(P-Q)/(sqrt(float((P+Q+Tx)*(P+Q+Ty))))
     m = C if C < D else D
     tau_c = (P-Q)*(2*m/float((m-1)*n**2))
     return tau_a, tau_b, tau_c

Now I am done Kendall's tau for sure. Pick whichever you like.

Sturla Molden