Thanks again. Your explanation makes sense. I tried the second method to deduplicate the list. Python just runs and runs without returning a result, probably because my example iterates over a very long list and is slow. <div>
<br></div><div><div>>>> def average_polysemy(pos):</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>synset_list = list(wn.all_synsets(pos))</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>sense_number = 0</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>lemma_list = []</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>for synset in synset_list:</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>lemma_list.extend(synset.lemma_names)</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span><font color="#ff0000">unique_lemma_list = []</font></div><div><font color="#ff0000"><span class="Apple-tab-span" style="white-space:pre"> </span>for w in lemma_list:</font></div>
<div><font color="#ff0000"><span class="Apple-tab-span" style="white-space:pre"> </span>if not w in unique_lemma_list:</font></div><div><font color="#ff0000"><span class="Apple-tab-span" style="white-space:pre"> </span>unique_lemma_list.append(w)</font></div>
<div><font color="#ff0000"><span class="Apple-tab-span" style="white-space:pre"> </span>return unique_lemma_list</font></div><div><span class="Apple-tab-span" style="white-space:pre"> </span>for lemma in unique_lemma_list:</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>sense_number_new = len(wn.synsets(lemma, pos))</div><div><span class="Apple-tab-span" style="white-space:pre"> </span>sense_number = sense_number + sense_number_new</div>
<div><span class="Apple-tab-span" style="white-space:pre"> </span>return sense_number/len(unique_lemma_list)</div><div><br></div><div>>>> average_polysemy('n')</div><br><div class="gmail_quote">On Sun, Sep 9, 2012 at 2:36 PM, Donald Stufft <span dir="ltr"><<a href="mailto:donald.stufft@gmail.com" target="_blank">donald.stufft@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
For a short list the difference is going to be negligible.
</div><div><br></div><div>For a long list the difference is that checking whether an item is in a list requires iterating over the list internally to find it, whereas checking whether an item is in a set uses a faster, hash-based lookup that doesn't require iterating over every element. This doesn't matter if you have 20 or 30 items, but imagine that you have 50 million items instead. You're going to be iterating over the list a lot, and that can introduce a significant slowdown.</div>
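To make the list-versus-set difference concrete, here is a rough timing sketch. The helper name `time_membership`, the list size, and the repeat count are arbitrary choices for illustration, not anything from the thread, and the absolute numbers will vary by machine.

```python
import timeit


def time_membership(container, probe, repeats=200):
    """Return the seconds taken to evaluate `probe in container` `repeats` times."""
    return timeit.timeit(lambda: probe in container, number=repeats)


n = 100000  # illustrative size, chosen arbitrarily
as_list = list(range(n))
as_set = set(as_list)
probe = -1  # absent value, so the list has to be scanned end to end

list_seconds = time_membership(as_list, probe)
set_seconds = time_membership(as_set, probe)
print("list: %.6f s, set: %.6f s" % (list_seconds, set_seconds))
```

The probe is deliberately a value that is not present, which is the worst case for the list because every element has to be compared before the answer is known.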
<div><br></div><div>On the other hand, using a set is faster in that case, but because you are storing an additional copy of the data, you use more memory.</div><div class="HOEnZb">
<div class="h5">
<div></div>
<p style="color:#a0a0a8">On Sunday, September 9, 2012 at 2:31 AM, John H. Li wrote:</p>
<blockquote type="cite" style="border-left-style:solid;border-width:1px;margin-left:0px;padding-left:10px">
<span><div><div>Thanks. I can follow the second approach easily, but the first one is a bit puzzling. Why are seen = set() and seen.add(x) still necessary there if uniqued.append(x) alone would do? Thanks for enlightening me.<br>
<br><div>On Sun, Sep 9, 2012 at 1:59 PM, Donald Stufft <span dir="ltr"><<a href="mailto:donald.stufft@gmail.com" target="_blank">donald.stufft@gmail.com</a>></span> wrote:<br><blockquote type="cite"><div>
<div>
seen = set()
</div><div>uniqued = []</div><div>for x in original:</div><div>    if not x in seen:</div><div>        seen.add(x)</div><div>        uniqued.append(x)</div><div><br></div><div>or</div><div><br></div><div>
uniqued = []</div><div>for x in original:</div><div>    if not x in uniqued:</div><div>        uniqued.append(x)</div><div><br></div><div>The difference is that option #1 is faster but uses more memory (an extra set hanging around), whereas option #2 is slower (``in`` is slower on lists than on sets) but uses less memory.</div>
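As a sketch, option #1 can be wrapped in a small reusable function. The name `unique_in_order` and the sample data are made up for illustration, and the approach assumes the items are hashable (strings and numbers are).

```python
def unique_in_order(items):
    """Return the distinct items of `items`, keeping first-seen order."""
    seen = set()   # fast membership checks
    uniqued = []   # keeps the original order
    for x in items:
        if x not in seen:
            seen.add(x)
            uniqued.append(x)
    return uniqued


original = [3, 1, 3, 2, 1, 4]  # illustrative data
print(unique_in_order(original))

# On Python 3.7 and later, dicts keep insertion order, so this one-liner
# gives the same result for hashable items:
print(list(dict.fromkeys(original)))
```

The set is there only to answer "have I seen this already?" quickly; the list is what carries the order, which is why both are kept.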
<div><div>
<p style="color:#a0a0a8">On Sunday, September 9, 2012 at 1:56 AM, John H. Li wrote:</p><blockquote type="cite"><div>
<span><div><div>Many thanks. If I want to keep the order, how should I do it?<div>Or can we use list(set([1, 1, 2, 3, 4])) == [1, 2, 3, 4]?<br><div><br><br><div>On Sun, Sep 9, 2012 at 1:47 PM, Donald Stufft <span dir="ltr"><<a href="mailto:donald.stufft@gmail.com" target="_blank">donald.stufft@gmail.com</a>></span> wrote:<br>
<blockquote type="cite"><div>
<div>
If you don't need to retain order you can just use a set:</div><div><br></div><div>set([1, 1, 2, 3, 4]) == set([1, 2, 3, 4])</div><div><br></div><div>But sets don't retain order.</div>
<p style="color:#a0a0a8">On Sunday, September 9, 2012 at 1:43 AM, Token Type wrote:</p><blockquote type="cite"><div>
<span><div><div><div>Is there a method in Python to remove duplicates from a list? Thanks.</div><span><font color="#888888"><div>-- </div><div><a href="http://mail.python.org/mailman/listinfo/python-list" target="_blank">http://mail.python.org/mailman/listinfo/python-list</a></div>
</font></span></div></div></span>
</div></blockquote><div>
<br>
</div>
</div></blockquote></div><br></div></div>
</div></div></span>
</div></blockquote><div>
<br>
</div>
</div></div></div></blockquote></div><br>
</div></div></span>
</blockquote>
<div>
<br>
</div>
</div></div></blockquote></div><br></div>