[Numpy-discussion] New function `count_unique` to generate contingency tables.

Warren Weckesser warren.weckesser at gmail.com
Tue Aug 12 11:57:57 EDT 2014


On Tue, Aug 12, 2014 at 11:35 AM, Warren Weckesser <warren.weckesser at gmail.com> wrote:

> I created a pull request (https://github.com/numpy/numpy/pull/4958) that
> defines the function `count_unique`.  `count_unique` generates a
> contingency table from a collection of sequences.  For example,
>
> In [7]: x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
>
> In [8]: y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
>
> In [9]: (xvals, yvals), counts = count_unique(x, y)
>
> In [10]: xvals
> Out[10]: array([1, 2])
>
> In [11]: yvals
> Out[11]: array([3, 4, 5])
>
> In [12]: counts
> Out[12]:
> array([[3, 1, 0],
>        [1, 1, 3]])
>
>
> It can be interpreted as a multi-argument generalization of `np.unique(x,
> return_counts=True)`.
>
> It overlaps with Pandas' `crosstab`, but I think this is a pretty
> fundamental counting operation that fits in numpy.
>
> Matlab's `crosstab` (http://www.mathworks.com/help/stats/crosstab.html)
> and R's `table` perform the same calculation (with a few more bells and
> whistles).
>
>
> For comparison, here's Pandas' `crosstab` (same `x` and `y` as above):
>
> In [28]: import pandas as pd
>
> In [29]: xs = pd.Series(x)
>
> In [30]: ys = pd.Series(y)
>
> In [31]: pd.crosstab(xs, ys)
> Out[31]:
> col_0  3  4  5
> row_0
> 1      3  1  0
> 2      1  1  3
>
>
> And here is R's `table`:
>
> > x <- c(1,1,1,1,2,2,2,2,2)
> > y <- c(3,4,3,3,3,4,5,5,5)
> > table(x, y)
>    y
> x   3 4 5
>   1 3 1 0
>   2 1 1 3
>
>
> Is there any interest in adding this (or some variation of it) to numpy?
>
>
> Warren
>
>

While searching StackOverflow in the numpy tag for "count unique", I just
discovered that I basically reinvented Eelco Hoogendoorn's code in his
answer to
http://stackoverflow.com/questions/10741346/numpy-frequency-counts-for-unique-values-in-an-array.
Nice one, Eelco!
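
For reference, the counting described in the quoted message can be sketched
with existing numpy functions. This is only an illustration under stated
assumptions: it is not the code from the pull request, nor the code from the
StackOverflow answer, and the name `count_unique_sketch` is hypothetical. It
assumes all input sequences are one-dimensional and of equal length.

```python
import numpy as np


def count_unique_sketch(*seqs):
    """Hypothetical helper: contingency table for equal-length 1-D sequences.

    Returns a tuple of sorted unique-value arrays (one per input) and an
    integer array of counts with one axis per input.
    """
    uniques = []
    inverses = []
    for seq in seqs:
        # `inv` maps each element of `seq` to its position in `vals`.
        vals, inv = np.unique(np.asarray(seq), return_inverse=True)
        uniques.append(vals)
        inverses.append(inv.ravel())

    # One axis per input sequence, sized by its number of distinct values.
    counts = np.zeros(tuple(len(v) for v in uniques), dtype=np.intp)

    # Unbuffered add, so repeated index tuples are each counted.
    np.add.at(counts, tuple(inverses), 1)
    return tuple(uniques), counts


x = [1, 1, 1, 1, 2, 2, 2, 2, 2]
y = [3, 4, 3, 3, 3, 4, 5, 5, 5]
(xvals, yvals), counts = count_unique_sketch(x, y)
print(xvals)
print(yvals)
print(counts)
```

With the `x` and `y` from the quoted example, `counts` matches the table shown
there. `np.add.at` is used instead of `counts[idx] += 1` because plain fancy
indexing applies the increment only once per repeated index.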

Warren