Could you use a set of tuples? <br><br>&gt;&gt;&gt; set([(1,2),(1,3),(1,2),(2,3)])<br>set([(1, 2), (1, 3), (2, 3)])<br><br>Matt<br><br><div><span class="gmail_quote">On 7/19/07, <b class="gmail_sendername">Alex Mont</b> &lt;<a href="mailto:t-alexm@windows.microsoft.com">t-alexm@windows.microsoft.com</a>&gt; wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p>I have a 2-dimensional Numeric array with the shape (N,2)
and I want to remove all duplicate rows from the array. For example if I start
out with:</p>
<p>[[1,2],</p>
<p>[1,3],</p>
<p>[1,2],</p>
<p>[2,3]]</p>
<p> </p>
<p>I want to end up with</p>
<p>[[1,2],</p>
<p>[1,3],</p>
<p>[2,3]].</p>
<p> </p>
<p>(Order of the rows doesn't matter, although order of
the two elements in each row does.)</p>
<p> </p>
<p>The problem is that I can't find any way of doing this
that is efficient with large data sets (in the data set I am using, N > 1,000,000).</p>
<p>The normal method of removing duplicates, putting the
elements into a dictionary and then reading off the keys, doesn't work
directly because the keys (rows of a Numeric array) aren't
hashable.</p>
<p>The best I have been able to do so far is:</p>
<p> </p>
<pre>def remove_duplicates(x):
    d = {}
    for (a, b) in x:
        d[(a, b)] = (a, b)
    return array(d.values())</pre>
<p> </p>
<p>According to the profiler the loop takes about 7 seconds and
the call to array() 10 seconds with N=1,700,000.</p>
<p> </p>
<p>Is there a faster way to do this using Numeric?</p>
<p> </p>
<p>-Alex Mont</p>
</div>
</div>
<br>--<br><a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a><br></blockquote></div>
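<br>Matt's set-of-tuples suggestion can be applied to the whole array in one pass. A minimal sketch, using modern NumPy in place of the old Numeric module; the function name and sample data are illustrative:<br>

```python
import numpy as np

def remove_duplicate_rows(x):
    # Each row becomes a hashable tuple; a set keeps one copy of each.
    unique_rows = set(map(tuple, x))
    # Rebuild an array (sorted only to make the output deterministic;
    # the original question says row order doesn't matter).
    return np.array(sorted(unique_rows))

x = np.array([[1, 2], [1, 3], [1, 2], [2, 3]])
print(remove_duplicate_rows(x))
```

This also fixes the dictionary version's quirk of storing each row as both key and value: a set stores each row once.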
<br>
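<br>For very large N, the Python-level loop can be avoided entirely. Modern NumPy (the successor to Numeric) provides np.unique with an axis argument that deduplicates whole rows in compiled code; this did not exist in Numeric, so it is offered here as a hedged alternative rather than a fix within Numeric itself:<br>

```python
import numpy as np

x = np.array([[1, 2], [1, 3], [1, 2], [2, 3]])
# axis=0 treats each row as a single element to compare;
# the result comes back with duplicate rows removed, sorted lexicographically.
unique = np.unique(x, axis=0)
print(unique)
```

Since row order doesn't matter here, the lexicographic sorting that np.unique performs is harmless.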