[Python-ideas] exclusively1, common, exclusively2 = set1 - set2, set1 & set2, set2 - set1

Thu Jul 4 07:22:36 CEST 2013

(Thanks Steven for the welcome back. I've been advocating Python through 
the Rosetta Code site).

I couldn't think of Oscar and your solution but now that we have a Python 
algorithm, I understand that it could only be really tested if there were 
an implementation in C to test against the incumbent method.

To give a bit more info on how much it is used: I have even incorporated 
the partition-type functionality in Python in throw-away bash  shell 
scripts of the form:

munge.sh data1 | sort > results1.txt
munge.sh data2 | sort > results2 .txt
python -c '

set1 = set(file("results1.txt"))

set2 = set(file("results2.txt"))

print([len(p) for p in (set1 - set2, set1 & set2, set2 - set1)])

'

I have used the above several times when my boss touches base with me and 
asks the informal "hi, and how are you getting on"?

The above isn't my only use. I have used the construct enough times for me 
to think of it as a "design pattern".

Before I posted, I looked up set partition on Wikipedia. It has a different 
meaning for a partition of a set, but doesn't apply any meaning to the 
partition of two sets leaving the dyadic meaning of partition vacant :-)

- Paddy.

On Thursday, 4 July 2013 03:10:28 UTC+1, Steven D'Aprano wrote:
>
> Hi Paddy, long time since I've read something from you, welcome back. 
>
> On 04/07/13 06:50, Paddy3118 wrote: 
> > I found myself repeating something that I know I have used before, 
> several 
> > times: I get two sets of results, may be sets of the passing tests when 
> a 
> > design has changed, and I need to work out what has changed so work out 
> > 
> > 1. What passed first time round 
> > 2. What passed both times. 
> > 3. What passed only the second time round. 
> > 
> > I usually use something like the set equations in the title to do this 
> but 
> > I recognise that this requires both sets to be traversed at least three 
> > times which seems wasteful. 
> > 
> > I wondered if their was am algorithm to partition the two sets of data 
> into 
> > three as above, but cutting down on the number of set traversals? 
>
> You could go from: 
>
> exclusively1 = set1 - set2 
> common = set1 & set2 
> exclusively2 = set2 - set1 
>
> (which I expect ends up traversing each set twice) to this: 
>
>
> common, exclusively1, exclusively2 = set(), set(), set() 
> for item in set1: 
>      if item in set2: common.add(item) 
>      else: exclusively1.add(item) 
> for item in set2: 
>      if item in set1: common.add(item) 
>      else: exclusively2.add(item) 
>
>
> which only does two set traversals. But I would expect that using the set 
> operators would be faster. (Iterating over the sets 3 or 4 times in C is 
> likely to be faster than iterating over them 2 times in pure Python.) 
>
>
> > I also wondered that if such an algorithm existed, would it be useful 
> > enough to be worth incorporating into the Python library? 
> > 
> > Maybe defined as: 
> > 
> > exclusively1, common, exclusively2 = set1.partition(set2) 
>
> [bikeshed] 
> I would expect common, only1, only2 in that order. 
>
>
> Does any other language with sets offer this as a set primitive? I too 
> have needed it often enough that I'd be +1 on adding it. 
>
>
>
> -- 
> Steven 
> _______________________________________________ 
> Python-ideas mailing list 
> Python... at python.org <javascript:> 
> http://mail.python.org/mailman/listinfo/python-ideas 
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20130703/a815bd7f/attachment-0001.html>