[Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1

Sergey sergemp at mail.ru
Thu Jul 4 14:33:35 CEST 2013


On Jul 4, 2013 Oscar Benjamin wrote:

>> I usually use something like the set equations in the title to do this but I
>> recognise that this requires both sets to be traversed at least three times
>> which seems wasteful.
>>
>> I wondered if their was am algorithm to partition the two sets of data into
>> three as above, but cutting down on the number of set traversals?
>
> You can do it in one traversal of each set:
>
> def partition(setx, sety):
>     xonly, xandy, yonly = set(), set(), set()
>     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
>         for val in set1:
>             if val in set2:
>                 xandy.add(val)
>             else:
>                 setn.add(val)
>     return xonly, xandy, yonly

JFYI, despite using two passes this long partition() is twice slower
than simple:
    def partition(set1, set2):
        common = set1 & set2
        return set1 - common, common, set2 - common

That's because both functions are O(n) but the short one is just
a few native calls, while long one uses lots of small calls, and
thus overhead of so many function calls *significantly* exceeds
time of one additional pass.

Simple is better than complex. Readability counts. :)

-- 

$ ./partition-test.py
2-lines partition: 0.5874 seconds
8-lines partition: 1.4138 seconds
9-lines partition: 0.8485 seconds


#partition-test.py
from time import time

def partition1(set1, set2):
    common = set1 & set2
    return set1 - common, common, set2 - common

def partition2(setx, sety):
    xonly, xandy, yonly = set(), set(), set()
    for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
        for val in set1:
            if val in set2:
                xandy.add(val)
            else:
                setn.add(val)
    return xonly, xandy, yonly

def partition3(old, new):
    removed = old - new
    common = set()
    added = set()
    for item in new:
        if item in old:
            common.add(item)
        else:
            added.add(item)
    return removed, common, added

if __name__ == "__main__":
    set1 = set(range(0,      3000000))
    set2 = set(range(2000000,4000000))
    t = time()
    partition1(set1, set2)
    print "2-lines partition:", time()-t, "seconds"
    t = time()
    partition2(set1, set2)
    print "8-lines partition:", time()-t, "seconds"
    t = time()
    partition3(set1, set2)
    print "9-lines partition:", time()-t, "seconds"


More information about the Python-ideas mailing list