<br><div class="gmail_quote">On Mon, Jun 13, 2011 at 8:09 AM, Chris Angelico <span dir="ltr"><<a href="mailto:rosuav@gmail.com">rosuav@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div class="im">On Tue, Jun 14, 2011 at 12:58 AM, Zachary Dziura <<a href="mailto:zcdziura@gmail.com">zcdziura@gmail.com</a>> wrote:<br>
> if set(source_headers) == set(target_headers):<br>
> similar_headers = len(source_headers)<br>
<br>
</div>Since you're making sets already, I'd recommend using set operations -<br>
same_headers is the (length of the) intersection of those two sets,<br>
and different_headers is the XOR.<br>
<br>
# If you need the lists afterwards, use different variable names<br>
source_headers = set(source_headers)<br>
target_headers = set(target_headers)<br>
similar_headers = len(source_headers & target_headers)<br>
different_headers = len(source_headers ^ target_headers)<font color="#888888"><br></font></blockquote><div><br>This is a beautiful solution, and yet I feel compelled to mention that it disregards duplicates within a given list. If you need duplicate detection/differencing, it's better to sort each list and then use an algorithm similar to the merge step of mergesort.<br>
<br>Using sets as above is O(n) on average, while the sorting version is usually O(n log n), so the set approach scales better when duplicates don't matter.<br><br>And of course, the version based on sorting assumes order doesn't matter.<br><br></div></div>
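<div>For what it's worth, here is a rough sketch of the sort-and-merge idea, in case duplicates do matter. The function name count_similar_and_different is made up for illustration, and it assumes the headers are mutually comparable (e.g. all strings). Each matched pair counts once as similar; anything left without a partner, including an extra copy of a duplicate, counts as different.<br></div>

```python
def count_similar_and_different(source_headers, target_headers):
    """Count matched and unmatched items, keeping duplicates distinct.

    Hypothetical helper: sorts copies of both lists, then walks them
    in lockstep like the merge step of mergesort.
    """
    a = sorted(source_headers)
    b = sorted(target_headers)
    i = j = 0
    similar = different = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same item on both sides: pair them up and advance both.
            similar += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a[i] cannot appear later in b, so it has no partner.
            different += 1
            i += 1
        else:
            # b[j] cannot appear later in a, so it has no partner.
            different += 1
            j += 1
    # Whatever remains in either list is unmatched.
    different += (len(a) - i) + (len(b) - j)
    return similar, different


source = ["id", "name", "name", "date"]
target = ["name", "id", "email"]
print(count_similar_and_different(source, target))
```

<div>With these sample lists the set version would report 2 similar and 2 different, since it collapses the repeated "name"; the merge version counts the second "name" as an unmatched item.<br></div>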