Is there a unique method in python to unique a list?
John H. Li
typetoken at
Sun Sep 9 03:29:45 EDT 2012
One more test result to add, if I use your first method to unique:
seen = set()
uniqued = []
for x in original:
    if x not in seen:
        seen.add(x)
        uniqued.append(x)
The result pops up in a few seconds. It makes a dramatic difference.
Thanks. See the following faster code:
>>> import nltk
>>> from nltk.corpus import wordnet as wn
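>>> def average_polysemy(pos):
        synset_list = list(wn.all_synsets(pos))
        sense_number = 0
        lemma_list = []
        for synset in synset_list:
            lemma_list.extend(synset.lemma_names)
        # Remove duplicate lemmas, using a set for fast membership tests.
        unique_lemma_list = []
        seen = set()
        for w in lemma_list:
            if w not in seen:
                seen.add(w)
                unique_lemma_list.append(w)
        # Sum the number of senses over all distinct lemmas.
        for lemma in unique_lemma_list:
            sense_number_new = len(wn.synsets(lemma, pos))
            sense_number = sense_number + sense_number_new
        return sense_number/len(unique_lemma_list)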
>>> average_polysemy('n')
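The shape of this computation can be tried without WordNet installed. The following is only a minimal sketch: `count_senses` is a hypothetical stand-in for `len(wn.synsets(lemma, pos))`, and the toy data is invented for illustration, not taken from WordNet.

```python
def average_polysemy(lemma_lists, count_senses):
    """Average number of senses per distinct lemma.

    lemma_lists: iterable of lemma-name lists, one list per synset.
    count_senses: hypothetical callable mapping a lemma to its sense count.
    """
    seen = set()
    unique_lemmas = []
    for lemmas in lemma_lists:
        for lemma in lemmas:
            if lemma not in seen:  # hash lookup, not a scan of a list
                seen.add(lemma)
                unique_lemmas.append(lemma)
    total = sum(count_senses(lemma) for lemma in unique_lemmas)
    # float() avoids integer truncation on Python 2.
    return float(total) / len(unique_lemmas)


# Toy data, invented for illustration only.
toy_synsets = [["dog", "domestic_dog"], ["dog", "frump"], ["cat"]]
toy_senses = {"dog": 7, "domestic_dog": 1, "frump": 1, "cat": 8}
print(average_polysemy(toy_synsets, toy_senses.get))  # 4.25
```

Note that on Python 2 the plain `sense_number/len(unique_lemma_list)` performs integer division, so converting to float first may be what is wanted here.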
On Sun, Sep 9, 2012 at 3:18 PM, John H. Li <typetoken at> wrote:
> Thanks again. What you explain is reasonable. I tried the second method
> to unique the list. It does turn out that Python just runs and runs
> without producing a result, probably because it iterates over a long list
> in my example and is slow.
> >>> def average_polysemy(pos):
>         synset_list = list(wn.all_synsets(pos))
>         sense_number = 0
>         lemma_list = []
>         for synset in synset_list:
>             lemma_list.extend(synset.lemma_names)
>         unique_lemma_list = []
>         for w in lemma_list:
>             if w not in unique_lemma_list:
>                 unique_lemma_list.append(w)
>         for lemma in unique_lemma_list:
>             sense_number_new = len(wn.synsets(lemma, pos))
>             sense_number = sense_number + sense_number_new
>         return sense_number/len(unique_lemma_list)
> >>> average_polysemy('n')
> On Sun, Sep 9, 2012 at 2:36 PM, Donald Stufft <donald.stufft at> wrote:
>> For a short list the difference is going to be negligible.
>> For a long list the difference is that checking whether an item is in a
>> list requires iterating over the list internally to find it, but checking
>> whether an item is in a set uses a hash lookup that doesn't require
>> iterating over anything. This doesn't matter if you have 20 or 30 items,
>> but imagine if instead you have 50 million items. You're going to be
>> iterating over the list a lot, and that can introduce a significant
>> slowdown.
>> On the other hand, using a set is faster in that case, but because you are
>> keeping an additional collection alongside the list, you use more memory.
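The difference described above can be measured directly. This is a rough sketch using the standard `timeit` module; the helper name `time_membership` and the sizes are arbitrary choices for illustration, and the actual numbers will vary by machine.

```python
import timeit


def time_membership(n, repeats=200):
    """Time a worst-case `in` test on a list and on a set of n integers."""
    as_list = list(range(n))
    as_set = set(as_list)
    missing = -1  # not present, so the list has to be scanned to the end
    list_time = timeit.timeit(lambda: missing in as_list, number=repeats)
    set_time = timeit.timeit(lambda: missing in as_set, number=repeats)
    return list_time, set_time


for n in (1000, 100000):
    list_time, set_time = time_membership(n)
    print("n=%d  list: %.6fs  set: %.6fs" % (n, list_time, set_time))
```

The list time should grow roughly in proportion to `n`, while the set time should stay about the same.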
>> On Sunday, September 9, 2012 at 2:31 AM, John H. Li wrote:
>> Thanks first. I could understand the second approach easily. The first
>> approach is a bit puzzling. Why are seen = set() and seen.add(x) still
>> necessary there if we can use uniqued.append(x) alone? Thanks for your
>> enlightenment.
>> On Sun, Sep 9, 2012 at 1:59 PM, Donald Stufft <donald.stufft at> wrote:
>> seen = set()
>> uniqued = []
>> for x in original:
>>     if x not in seen:
>>         seen.add(x)
>>         uniqued.append(x)
>> or
>> uniqued = []
>> for x in original:
>>     if x not in uniqued:
>>         uniqued.append(x)
>> The difference between them is that option #1 is more efficient
>> speed-wise but uses more memory (an extra set hanging around), whereas
>> option #2 is slower (``in`` is slower on lists than on sets) but uses
>> less memory.
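For comparison, here are the two options wrapped as functions; the function names are made up for this sketch. Both keep the first occurrence of each item in its original position.

```python
def unique_with_set(original):
    """Option #1: track seen items in a set (items must be hashable)."""
    seen = set()
    uniqued = []
    for x in original:
        if x not in seen:
            seen.add(x)
            uniqued.append(x)
    return uniqued


def unique_with_list(original):
    """Option #2: search the result list itself (works for any items)."""
    uniqued = []
    for x in original:
        if x not in uniqued:
            uniqued.append(x)
    return uniqued


words = ["b", "a", "b", "c", "a"]
print(unique_with_set(words))   # ['b', 'a', 'c']
print(unique_with_list(words))  # ['b', 'a', 'c']

# On Python 3.7 and later, dicts keep insertion order, so this also works
# for hashable items:
print(list(dict.fromkeys(words)))
```

One practical difference besides speed: option #1 requires hashable items, while option #2 also handles unhashable ones such as lists, because it only compares with `==`.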
>> On Sunday, September 9, 2012 at 1:56 AM, John H. Li wrote:
>> Many thanks. If I want to keep the order, how can I deal with it?
>> Or we can use list(set([1, 1, 2, 3, 4])), which gives [1, 2, 3, 4].
>> On Sun, Sep 9, 2012 at 1:47 PM, Donald Stufft <donald.stufft at> wrote:
>> If you don't need to retain order you can just use a set:
>> set([1, 1, 2, 3, 4]) == set([1, 2, 3, 4])
>> But sets don't retain order.
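A short illustration of the set approach, with made-up sample data. Because the iteration order of a set is an implementation detail, the sketch sorts the result where a predictable order is wanted.

```python
numbers = [1, 1, 2, 3, 4]
unique_numbers = set(numbers)
print(unique_numbers == set([1, 2, 3, 4]))  # True

words = ["pear", "apple", "pear", "fig"]
# The order of set(words) is not guaranteed; sort for a stable result.
print(sorted(set(words)))  # ['apple', 'fig', 'pear']
```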
>> On Sunday, September 9, 2012 at 1:43 AM, Token Type wrote:
>> Is there a unique method in python to unique a list? thanks