[Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1
Sergey
sergemp at mail.ru
Thu Jul 4 14:33:35 CEST 2013
On Jul 4, 2013 Oscar Benjamin wrote:
>> I usually use something like the set equations in the title to do this but I
>> recognise that this requires both sets to be traversed at least three times
>> which seems wasteful.
>>
>> I wondered if their was am algorithm to partition the two sets of data into
>> three as above, but cutting down on the number of set traversals?
>
> You can do it in one traversal of each set:
>
> def partition(setx, sety):
> xonly, xandy, yonly = set(), set(), set()
> for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
> for val in set1:
> if val in set2:
> xandy.add(val)
> else:
> setn.add(val)
> return xonly, xandy, yonly
JFYI, despite using two passes this long partition() is twice slower
than simple:
def partition(set1, set2):
common = set1 & set2
return set1 - common, common, set2 - common
That's because both functions are O(n) but the short one is just
a few native calls, while long one uses lots of small calls, and
thus overhead of so many function calls *significantly* exceeds
time of one additional pass.
Simple is better than complex. Readability counts. :)
--
$ ./partition-test.py
2-lines partition: 0.5874 seconds
8-lines partition: 1.4138 seconds
9-lines partition: 0.8485 seconds
#partition-test.py
from time import time
def partition1(set1, set2):
common = set1 & set2
return set1 - common, common, set2 - common
def partition2(setx, sety):
xonly, xandy, yonly = set(), set(), set()
for set1, set2, setn in [(setx, sety, xonly), (sety, setx, yonly)]:
for val in set1:
if val in set2:
xandy.add(val)
else:
setn.add(val)
return xonly, xandy, yonly
def partition3(old, new):
removed = old - new
common = set()
added = set()
for item in new:
if item in old:
common.add(item)
else:
added.add(item)
return removed, common, added
if __name__ == "__main__":
set1 = set(range(0, 3000000))
set2 = set(range(2000000,4000000))
t = time()
partition1(set1, set2)
print "2-lines partition:", time()-t, "seconds"
t = time()
partition2(set1, set2)
print "8-lines partition:", time()-t, "seconds"
t = time()
partition3(set1, set2)
print "9-lines partition:", time()-t, "seconds"
More information about the Python-ideas
mailing list