Could you use a set of tuples? <br><br>&gt;&gt;&gt; set([(1,2),(1,3),(1,2),(2,3)])<br>set([(1, 2), (1, 3), (2, 3)])<br><br>Matt<br><br><div><span class="gmail_quote">On 7/19/07, <b class="gmail_sendername">Alex Mont</b> &lt;<a href="mailto:t-alexm@windows.microsoft.com">t-alexm@windows.microsoft.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p>I have a 2-dimensional Numeric array with the shape (N,2)
and I want to remove all duplicate rows from the array. For example if I start
out with:</p>
<p>[[1,2],</p>
<p>[1,3],</p>
<p>[1,2],</p>
<p>[2,3]]</p>
<p> </p>
<p>I want to end up with</p>
<p>[[1,2],</p>
<p>[1,3],</p>
<p>[2,3]].</p>
<p> </p>
<p>(Order of the rows doesn't matter, although order of
the two elements in each row does.)</p>
<p> </p>
<p>The problem is that I can't find any way of doing this
that is efficient with large data sets (in the data set I am using, N > 1,000,000).</p>
<p>The normal method of removing duplicates, putting the
elements into a dictionary and then reading off the keys, doesn't work
directly because the keys (rows of a Numeric array) aren't
hashable.</p>
<p>The best I have been able to do so far is:</p>
<p> </p>
<pre>def remove_duplicates(x):
    d = {}
    for (a, b) in x:
        d[(a, b)] = (a, b)
    return array(d.values())</pre>
<p> </p>
<p>According to the profiler the loop takes about 7 seconds and
the call to array() 10 seconds with N=1,700,000.</p>
<p> </p>
<p>Is there a faster way to do this using Numeric?</p>
<p> </p>
<p>-Alex Mont</p>
</div>
</div>
<br>--<br><a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br></blockquote></div>
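<br>Matt's set-of-tuples suggestion can be applied to the whole array in one pass. A minimal sketch, using modern NumPy in place of the old Numeric module; the function name and sample data are illustrative:<br>

```python
import numpy as np

def remove_duplicate_rows(x):
    # Each row becomes a hashable tuple; a set keeps one copy of each.
    unique_rows = set(map(tuple, x))
    # Rebuild an array (sorted only to make the output deterministic;
    # the original question says row order doesn't matter).
    return np.array(sorted(unique_rows))

x = np.array([[1, 2], [1, 3], [1, 2], [2, 3]])
print(remove_duplicate_rows(x))
```

This also fixes the dictionary version's quirk of storing each row as both key and value: a set stores each row once.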
<br>
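<br>For very large N, the Python-level loop can be avoided entirely. Modern NumPy (the successor to Numeric) provides np.unique with an axis argument that deduplicates whole rows in compiled code; this did not exist in Numeric, so it is offered here as a hedged alternative rather than a fix within Numeric itself:<br>

```python
import numpy as np

x = np.array([[1, 2], [1, 3], [1, 2], [2, 3]])
# axis=0 treats each row as a single element to compare;
# the result comes back with duplicate rows removed, sorted lexicographically.
unique = np.unique(x, axis=0)
print(unique)
```

Since row order doesn't matter here, the lexicographic sorting that np.unique performs is harmless.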