How to populate all possible hierarchical clusterings from a set of elements?
Alain Ketterlin
alain at dpt-info.u-strasbg.fr
Thu Jan 13 10:59:06 EST 2011
Richard Thomas <chardster at gmail.com> writes:
> On Jan 13, 10:02 am, Alain Ketterlin <al... at dpt-info.u-strasbg.fr>
>> def clusterings(l):
>> if len(l) == 1:
>> print repr(l)
>> else:
>> n = len(l)
>> for i in xrange(n):
>> for j in xrange(i+1,n):
>> clusterings(l[:i]+l[i+1:j]+l[j+1:]+[[l[i],l[j]]])
>> [...] there are *many* such clusterings? (the exact number should be
>> (n!)*((n-1)!)/2^(n-1) given the code above, if I'm not mistaken.)
> Actually the number of such "clusterings" is the (n-1)th Catalan
> number.
>
> http://en.wikipedia.org/wiki/Catalan_numbers
I don't think Catalan numbers exactly captures this number. As far as I
remember (and wikipedia seems to confirm this), Cn is the number of ways
you can repeatedly apply a binary operator to a sequence of objects,
where sequence means that the objects are ordered, which is not the case
here. To use wikipedia's example, C3 is 5 because you can do:
((ab)c)d (a(bc))d (ab)(cd) a((bc)d) a(b(cd))
If we list clusterings we can also have:
((ac)b)d ((ac)d)b (ac)(bd) ...
Actually, for each of the 5 "catalan expressions" above, you have 4!
valid permutations of the objects (leading to a complete count of
n!*C(n-1)). But this leads to many "duplicates", because (ab)(cd) and
(cd)(ab) are considered the same.
I just realize that the code I've given above also produces duplicates
(in particular, the example I've just used). At least, my counting was
correct w.r.t. the code :-) The plot thickens...
-- Alain.
More information about the Python-list
mailing list