[Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1

Thu Jul 4 16:15:28 CEST 2013

On 4 July 2013 13:33, Sergey <sergemp at mail.ru> wrote:
> On Jul 4, 2013 Oscar Benjamin wrote:

You've missed the attribution line for the OP here so I'll add it back:

>> Paddy3118 wrote:
>>> I usually use something like the set equations in the title to do this but I
>>> recognise that this requires both sets to be traversed at least three times
>>> which seems wasteful.
>>>

Note the question I'm responding to below.

>>> I wondered if their was am algorithm to partition the two sets of data into
>>> three as above, but cutting down on the number of set traversals?
>>
>> You can do it in one traversal of each set:
>>
>> def partition(setx, sety):
>>     xonly, xandy, yonly = set(), set(), set()
>>     for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
>>         for val in set1:
>>             if val in set2:
>>                 xandy.add(val)
>>             else:
>>                 setn.add(val)
>>     return xonly, xandy, yonly
>
> JFYI, despite using two passes this long partition() is twice slower
> than simple:
>     def partition(set1, set2):
>         common = set1 & set2
>         return set1 - common, common, set2 - common
>
> That's because both functions are O(n) but the short one is just
> a few native calls, while long one uses lots of small calls, and
> thus overhead of so many function calls *significantly* exceeds
> time of one additional pass.
>
> Simple is better than complex. Readability counts. :)

Since two or three people seem to have misunderstood what I wrote I'll
clarify that the intention was to demonstrate the existence of a
single pass algorithm. I was not trying to show the fastest CPython
code for this problem.

As Peter Otten has pointed out the algorithm is a little redundant so
I'll improve it:

def partition(setx, sety):
    xonly, xandy, yonly = set(), set(), set()
    for x in setx:
        if x in sety:
            xandy.add(x)
        else:
            xonly.add(x)
    for y in sety:
        if y not in setx:
            yonly.add(y)
    return xonly, xandy, yonly

If you're worried about optimal branch flow for large sets then you
might consider swapping so that the first loop is over the smaller of
the two sets (assuming that the for/if branch is slightly faster than
the for/else branch). However the swapping difference is not really an
algorithmic difference in the sense that it would be for computing
only the intersection.

Oscar