[SciPy-dev] Possible Error in Kendall's Tau (scipy.stats.stats.kendalltau)
josef.pktd at gmail.com
josef.pktd at gmail.com
Thu Mar 19 14:04:17 EDT 2009
On Thu, Mar 19, 2009 at 1:13 PM, Sturla Molden <sturla at molden.no> wrote:
> On 3/19/2009 6:03 PM, josef.pktd at gmail.com wrote:
>
>> I only checked your python version, and all numbers agree with mine
>> and so can be considered as verified with spss and R (for tau-b).
>
> > I will open an enhancement ticket for your contingency table version
> > and the extension of the current stats.kendalltau.
>
> You don't have to. I just did (ticket #893).
>
> I have added Cython versions of kendalltau and kendalltau_fromct. They
> are more than ten times faster than the Python version I posted.
>
> It needs review and decision. I get numbers that look correct to me.
>
> http://projects.scipy.org/scipy/ticket/893
>
Thank you, I will look at it. Since we needed the conversion between
flat and 2d contingency table, I cleaned it up a bit. If you find it
useful, we could also include it, it handles only 2d tables, not
higher dimensional, but allows for arbitrary category labels. (I
reversed x and y from your original version of table2flat.
Josef
def flat2table(x, y, count):
'''convert flat contingency table to 2d
Parameters
----------
x, y: 1d arrays
flattened categories of observation
x is row variable
y is column variable
count: 1d array of cells content
Returns
-------
---------
ctable: 2d array,
contingency table
xcat, ycat: 1d arrays
labels of x and y categories
Examples
--------
>>> x = np.array([1,1,1,2,2,2])
>>> y = np.array([1,2,3,1,2,3])
>>> count = np.array([10, 5, 2, 9, 12, 16])
>>> flat2table(x, y, count)
[0 0 0 1 1 1]
[0 1 2 0 1 2]
(array([[ 10., 5., 2.],
[ 9., 12., 16.]]), array([1, 2]), array([1, 2, 3]))
>>> flat2table(x.astype(str), y.astype(str), count)
[0 0 0 1 1 1]
[0 1 2 0 1 2]
(array([[ 10., 5., 2.],
[ 9., 12., 16.]]), array(['1', '2'],
dtype='|S1'), array(['1', '2', '3'],
dtype='|S1'))
'''
catx, xrinv = np.unique1d(x, return_inverse=True)
caty, yrinv = np.unique1d(y, return_inverse=True)
ncatx = len(catx)
ncaty = len(caty)
tab = np.zeros((ncatx, ncaty))
tab[xrinv,yrinv] = count
return tab, catx, caty
def table2flat(ctable, xcat=None, ycat=None):
'''convert contingency table to flat format
Parmeters
---------
ctable: 2d array,
contingency table
xcat, ycat: 1d arrays
labels of x and y categories
Returns
-------
x, y: 1d arrays
flattened categories of observation
x is row variable
y is column variable
count: 1d array of cells content
Examples
--------
>>> tab = np.array([[ 10., 5., 2.], [ 9., 12., 16.]])
>>> table2flat(tab)
(array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]), array([
10., 5., 2., 9., 12., 16.]))
>>> table2flat(tab,np.arange(1,nx+1).astype(str),
np.arange(1,ny+1).astype(str))
(array(['1', '1', '1', '2', '2', '2'],
dtype='|S1'), array(['1', '2', '3', '1', '2', '3'],
dtype='|S1'), array([ 10., 5., 2., 9., 12., 16.]))
'''
assert(ctable.ndim == 2)
nx = ctable.shape[0]
ny = ctable.shape[1]
if xcat is None:
xcat = range(nx)
if ycat is None:
ycat = range(ny)
y, x = np.meshgrid(ycat, xcat)
x = x.flatten()
y = y.flatten()
count = ctable.flatten()
return x, y, count
# example and round trip test
x = np.array([1,1,1,2,2,2])
y = np.array([1,2,3,1,2,3])
count = np.array([10, 5, 2, 9, 12, 16])
tab = flat2table(x, y, count)[0]
nx, ny = tab.shape
assert_array_equal((x-1,y-1,count),table2flat(flat2table(x, y, count)[0]))
assert_array_equal((x,y,count),
table2flat(flat2table(x, y, count)[0],range(1,nx+1), range(1,ny+1)))
More information about the SciPy-Dev
mailing list