# [Numpy-discussion] New function `count_unique` to generate contingency tables.

Warren Weckesser warren.weckesser at gmail.com
Tue Aug 12 11:35:43 EDT 2014

```I created a pull request (https://github.com/numpy/numpy/pull/4958) that
defines the function `count_unique`.  `count_unique` generates a
contingency table from a collection of sequences.  For example,

In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2]

In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5]

In [9]: (xvals, yvals), counts = count_unique(x, y)

In [10]: xvals
Out[10]: array([1, 2])

In [11]: yvals
Out[11]: array([3, 4, 5])

In [12]: counts
Out[12]:
array([[3, 1, 0],
       [1, 1, 3]])

It can be interpreted as a multi-argument generalization of `np.unique(x,
return_counts=True)`.

It overlaps with Pandas' `crosstab`, but I think this is a pretty
fundamental counting operation that fits in numpy.

Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html) and
R's `table` perform the same calculation (with a few more bells and
whistles).

For comparison, here's Pandas' `crosstab` (same `x` and `y` as above):

In [28]: import pandas as pd

In [29]: xs = pd.Series(x)

In [30]: ys = pd.Series(y)

In [31]: pd.crosstab(xs, ys)
Out[31]:
col_0  3  4  5
row_0
1      3  1  0
2      1  1  3

And here is R's `table`:

> x <- c(1,1,1,1,2,2,2,2,2)
> y <- c(3,4,3,3,3,4,5,5,5)
> table(x, y)
   y
x   3 4 5
  1 3 1 0
  2 1 1 3

Is there any interest in adding this (or some variation of it) to numpy?

Warren
```
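For readers who want to try the idea without the pull request, the following is a minimal sketch of how such a contingency table could be built from existing NumPy calls. It is not the implementation from the pull request: the name `count_unique_sketch` is hypothetical, and the approach (encode each sequence with `np.unique(..., return_inverse=True)`, then count the joint codes with `np.bincount`) is only one plausible way to do it. It assumes all inputs are one-dimensional and of equal length.

```python
import numpy as np


def count_unique_sketch(*sequences):
    """Hypothetical stand-in for the proposed `count_unique`.

    Returns (tuple_of_unique_values, counts), where counts[i, j, ...] is the
    number of positions at which the first sequence equals its i-th unique
    value, the second equals its j-th unique value, and so on.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    arrays = [np.asarray(seq).ravel() for seq in sequences]
    if len({a.size for a in arrays}) != 1:
        raise ValueError("all sequences must have the same length")

    values = []
    codes = []
    for a in arrays:
        # `inverse` maps every element to the index of its unique value.
        vals, inverse = np.unique(a, return_inverse=True)
        values.append(vals)
        codes.append(inverse.ravel())

    shape = tuple(len(v) for v in values)
    # Combine the per-sequence codes into one flat cell index, then count.
    flat = np.ravel_multi_index(tuple(codes), shape)
    counts = np.bincount(flat, minlength=int(np.prod(shape))).reshape(shape)
    return tuple(values), counts


# Same data as in the message above.
x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
(xvals, yvals), counts = count_unique_sketch(x, y)
print(xvals, yvals)
print(counts)
```

With a single argument the sketch reduces to the counts returned by `np.unique(x, return_counts=True)`, which matches the "multi-argument generalization" reading given in the message.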