Pairwise frequency count from an incidence matrix of group membership

Fri Apr 22 10:43:28 EDT 2011

On Apr 22, 6:57 am, Jean-Michel Pichavant <jeanmic... at sequans.com>
wrote:
> Shafique, M. (UNU-MERIT) wrote:
> > Hi,
> > I have a number of different groups g1, g2, … g100 in my data. Each
> > group is comprised of a known but different set of members (m1, m2,
> > …m1000) from the population. The data has been organized in an
> > incidence matrix:
> >     g1 g2 g3 g4 g5
> > m1   1  1  1  0  1
> > m2   1  0  0  1  0
> > m3   0  1  1  0  0
> > m4   1  1  0  1  1
> > m5   0  0  1  1  0
>
> > I need to count how many groups each possible pair of members share
> > (i.e., both are members of).
> > I shall prefer the result in a pairwise edgelist with weight/frequency
> > in a format like the following:
> > m1, m1, 4
> > m1, m2, 1
> > m1, m3, 2
> > m1, m4, 3
> > m1, m5, 1
> > m2, m2, 2
> > ... and so on.
>
> > I shall highly appreciate if anybody could suggest/share some
> > code/tool/module which could help do this.
>
> > Best regards,
> > Muhammad
>
> Here are some clues
>
> m1 = [1,1,1,0,1]
> m2 = [1,0,0,1,0]
>
> def foo(list1, list2):
>       return len([index for index, val in enumerate(list1)
>                   if val and list2[index]])
>
>  > foo(m1, m1)
> < 4
>
>  > foo(m1, m2)
> < 1
>
> JM

He seems to have variables named m1, m2, etc. It would make more sense to
have an array m, but given the variables:

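def count_matches(i, j):
    # Look up the module-level lists named m<i> and m<j> without eval.
    pairs = zip(globals()["m" + str(i)], globals()["m" + str(j)])
    # With 0/1 entries, x*y is 1 only where both members are in the group.
    return sum(x * y for x, y in pairs)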

Then:

>>> m3 = [1,0,1,1]
>>> m4 = [1,0,0,1]
>>> count_matches(3,4)
2
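
If the rows are kept in one dictionary instead of separate variables, the
whole edgelist can be produced in one pass. Below is a minimal sketch under
that assumption, using the five-member matrix from the original question.
The names members, shared_groups and pairwise_edgelist are made up for
illustration, and the code relies on dictionaries preserving insertion
order (Python 3.7 or later).

```python
from itertools import combinations_with_replacement

# Incidence matrix from the original question: one row per member,
# one column per group (g1..g5). The dict layout is an assumption.
members = {
    "m1": [1, 1, 1, 0, 1],
    "m2": [1, 0, 0, 1, 0],
    "m3": [0, 1, 1, 0, 0],
    "m4": [1, 1, 0, 1, 1],
    "m5": [0, 0, 1, 1, 0],
}


def shared_groups(row_a, row_b):
    """Count the groups in which both 0/1 membership rows have a 1."""
    return sum(a * b for a, b in zip(row_a, row_b))


def pairwise_edgelist(rows):
    """Return (member, member, count) for every unordered pair,
    including each member paired with itself."""
    names = list(rows)
    return [(a, b, shared_groups(rows[a], rows[b]))
            for a, b in combinations_with_replacement(names, 2)]


edges = pairwise_edgelist(members)
for a, b, n in edges:
    print("%s, %s, %d" % (a, b, n))
```

The first lines printed should match the format asked for (m1, m1, 4;
m1, m2, 1; and so on). With 1000 members this visits roughly half a
million pairs, so for larger data a matrix product of the incidence
matrix with its transpose may be the faster route.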