Hi Sergey, I thought that set partitioning might be a worthy candidate for implementation in C if an algorithm that cut down on the set traversals could be found. Yep, I didn't think such an algorithm in Python would likely be any quicker. I guess there are two parts to my question: 1. Does an algorithm using less passes exist? Yes, thanks. 2. Would a C implementation be a worthwhile addition to Python? - Ongoing... - Paddy. On Thursday, 4 July 2013 13:33:35 UTC+1, Sergey wrote:
On Jul 4, 2013 Oscar Benjamin wrote:
I usually use something like the set equations in the title to do this but I recognise that this requires both sets to be traversed at least three times which seems wasteful.
I wondered if their was am algorithm to partition the two sets of data into three as above, but cutting down on the number of set traversals?
You can do it in one traversal of each set:
def partition(setx, sety): xonly, xandy, yonly = set(), set(), set() for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]: for val in set1: if val in set2: xandy.add(val) else: setn.add(val) return xonly, xandy, yonly
JFYI, despite using two passes this long partition() is twice slower than simple: def partition(set1, set2): common = set1 & set2 return set1 - common, common, set2 - common
That's because both functions are O(n) but the short one is just a few native calls, while long one uses lots of small calls, and thus overhead of so many function calls *significantly* exceeds time of one additional pass.
Simple is better than complex. Readability counts. :)