Algorithm for Creating Supersets of Smaller Sets Based on Common Elements
duncan smith
buzzard at invalid.invalid
Sun Feb 22 12:02:10 EST 2015
On 21/02/15 19:46, TommyVee wrote:
> Start off with sets of elements as follows:
>
> 1. A,B,E,F
> 2. G,H,L,P,Q
> 3. C,D,E,F
> 4. E,X,Z
> 5. L,M,R
> 6. O,M,Y
>
> Note that sets 1, 3 and 4 all have the element 'E' in common, therefore
> they are "related" and form the following superset:
>
> A,B,C,D,E,F,X,Z
>
> Likewise, sets 2 and 5 have the element 'L' in common, then set 5 and 6
> have element 'M' in common, therefore they form the following superset:
>
> G,H,L,M,O,P,Q,R,Y
>
> I think you get the point. As long as sets have at least 1 common
> element, they combine to form a superset. Also "links" (common
> elements) between sets may go down multiple levels, as described in the
> second case above (2->5->6). Cycles thankfully, are not possible.
>
> BTW, the number of individual sets (and resultant supersets) will be
> very large.
>
> I don't know where to start with this. I thought about some type of
> recursive algorithm, but I'm not sure. I could figure out the Python
> implementation easy enough, I'm just stumped on the algorithm itself.
>
> Anybody have an idea?
>
> Thanks, Tom
Tom,
You could construct a graph G such that there exists an edge {v,w}
for each pair of items that appear in a common set. Each connected
component will contain the nodes of one of the supersets you're looking
for. That's unnecessarily expensive, so you can adopt a similar approach
using trees.
http://en.wikipedia.org/wiki/Disjoint-set_data_structure
Construct a forest of trees (initially one tree for each item) and join
pairs of trees until you have one tree for each of your supersets. For
each set of size n you only need to consider n-1 joins. That will ensure
that any pair of items that are in one of the sets are contained in a
single tree. The find and union operations are amortized constant, so
this should be efficient for your large numbers of sets.
The union_find module can be found at
https://github.com/DuncanSmith147/KVMS. Cheers.
Duncan
>>> import union_find
>>> sets = (['A','B','E','F'],
['G','H','L','P','Q'],
['C','D','E','F'],
['E','X','Z'],
['L','M','R'],
['O','M','Y'])
>>> uf = union_find.UnionFindTree()
>>> for a_set in sets:
for item in a_set:
try:
uf.add(item)
except ValueError:
pass
n = a_set[0]
for item in a_set[1:]:
is_merged = uf.union(n, item)
>>> from collections import defaultdict
>>> res = defaultdict(list)
>>> for item in uf:
res[uf.find(item)].append(item)
>>> res.values()
[['G', 'H', 'M', 'L', 'O', 'Q', 'P', 'R', 'Y'], ['A', 'C', 'B', 'E',
'D', 'F', 'X', 'Z']]
>>>
More information about the Python-list
mailing list