Pairwise count of frequency from an incidence matrix of group membership

Peter Otten __peter__ at web.de
Wed Apr 20 10:30:28 CEST 2011


Shafique, M. (UNU-MERIT) wrote:

> Hi,
> I have a number of different groups g1, g2, … g100 in my data. Each group
> is comprised of a known but different set of members from the population
> m1, m2, … m1000. The data has been organized in an incidence matrix:
>      g1  g2  g3  g4  g5
> m1    1   1   1   0   1
> m2    1   0   0   1   0
> m3    0   1   1   0   0
> m4    1   1   0   1   1
> m5    0   0   1   1   0
> 
> I need to count how many groups each possible pair of members shares
> (i.e., both are members of).
> I would prefer the result as a pairwise edge list with weight/frequency
> in a format like the following:
> m1, m1, 4
> m1, m2, 1
> m1, m3, 2
> m1, m4, 3
> m1, m5, 1
> m2, m2, 2
> ... and so on.
> 
> I would highly appreciate it if anybody could suggest/share some
> code/tool/module which could help do this.

Homework? What have you tried?

One strategy is to convert each row of the initial matrix into a set 
holding the indices of the groups that member belongs to:

matrix = [
[1, 1, 1, 0, 1],
[1, 0, 0, 1, 0],
]

sets = [ # zero-based indices
   set([0,1,2,4]),
   set([0,3]),
   ...
]

The enumerate() builtin may help you with the conversion. You can then find 
the shared groups with set arithmetic:

sets[0] & sets[1] #m1/m2
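
For illustration, the two steps could be combined roughly as follows. This 
is only a sketch: the three extra matrix rows are copied from the table in 
the question, the names "edges" and "weight" are made up here, and 
itertools.combinations_with_replacement() needs Python 2.7 or 3.1 and later. 
The number of shared groups is simply the len() of the intersection.

```python
from itertools import combinations_with_replacement

# Incidence matrix from the question: rows are members m1..m5,
# columns are groups g1..g5.
matrix = [
    [1, 1, 1, 0, 1],
    [1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 1, 0],
]

# One set of zero-based group indices per member.
sets = [
    set(index for index, flag in enumerate(row) if flag)
    for row in matrix
]

# Pair every member with itself and with each later member, and
# count the groups the two sets have in common.
edges = []
for i, j in combinations_with_replacement(range(len(sets)), 2):
    weight = len(sets[i] & sets[j])
    edges.append(("m%d" % (i + 1), "m%d" % (j + 1), weight))

for a, b, weight in edges:
    print("%s, %s, %d" % (a, b, weight))
```

Pairs that share no group (m2/m3, for example) show up with weight 0; skip 
them in the loop if the edge list should only contain actual links.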




