I ported an algorithm originally written in MatLab to Python for generating
a random contingency table (when one exists) for a statistical test I am
writing. I know something like this doesn't exist in
scipy.stats.contingency and was wondering if it would be a good addition? I
haven't developed any tests for the function yet but can if needed.
I have just started developing in Python as an undergraduate student
studying computational biology. This would be my first time contributing to
open source and I am looking forward to the the experience and all the
advice you are willing to offer. Thanks for your time.
-Clint
def rcont(ncol, nrow, ncolt, nrowt):
"""
Return a random contingency table based on the two-dimensional shape
and marginal totals of a given contingency table
There must be 1 or many tables with nonnegative, integral entries
that have the given shape and marginal sums as input. The function
will generate, at random, one of the tables and return it. Repeated
calls to the function will return new random selections.
Parameters:
-----------
ncol : integer
The vector length of the columns in the contingency table
nrow : integer
The vector length of the rows in the contingency table
ncolt : array_like
The vector of the marginal column totals in the contingency table
nrowt : array_like
The vector of the marginal row totals in the contingency table
Returns
-------
imatrix : ndarray of integers
The randomly generated two-dimensional ndarray preserving the
same shape and marginal totals provided if exists
Reference
---------
Algorithm AS144: Appl. Statist. (1979) v.28, pp.329-332
Licensing:
GNU LGPL license
Author:
Original FORTRAN77 version by James Boyett.
MATLAB version by John Burkardt.
Python version by Clint Valentine
Reference:
James Boyett,
Algorithm AS 144:
Random R x C Tables with Given Row and Column Totals,
Applied Statistics,
Volume 28, Number 3, pages 329-332, 1979.
"""
# Initialize and permute vector
nvect = np.array(xrange(sum(ncolt)), dtype=np.int_)
np.random.shuffle(nvect)
# Construct random matrix and partial column sums
imatrix = np.zeros([nrow, ncol], dtype=np.int_)
nsubt = np.cumsum(ncolt)
# Generate random RxC contingency tables preserving shape and
# marginal column and row sums and total sum.
count = 0
for i in xrange(nrow):
limit = nrowt[i]
for _ in xrange(limit):
for j in xrange(ncol):
if nvect[count] <= nsubt[j]:
count += 1
imatrix[i][j] += 1
break
return imatrix