delete duplicates in list

Alex Martelli aleax at aleax.it
Thu Oct 30 05:20:37 EST 2003


Hannu Kankaanp?? wrote:
   ...
>> So - he'll have a list anyway.
> 
> So I can't agree with this. You don't know if his Python virtual machine
> is a group of monkeys. Python is supposed to be a high level language.

So, in that case those monkeys would surely be perched up on tall trees.


>> Which is of course reasonable, as the check for existence in the set
>> might be performed in O(ln n) instead of O(n).
> 
> Actually the Set in sets module has an average lookup of O(1), worst
> case O(n) (not 100% sure of worst case, but 99% sure). It's been

Hmmm -- could you give an example of that worstcase...?  a _full_
hashtable would give such behavior, but Python's dicts always ensure
the underlying hashtables aren't too full...

> implemented with dictionaries, which in turn are hash tables.

Yep - check it out...:

[alex at lancelot bo]$ timeit.py -c -s'import sets' -s'x=sets.Set(xrange(100))'
'7 in x'
100000 loops, best of 3: 2.3 usec per loop
[alex at lancelot bo]$ timeit.py -c -s'import sets'
-s'x=sets.Set(xrange(1000))' '7 in x'
100000 loops, best of 3: 2.3 usec per loop
[alex at lancelot bo]$ timeit.py -c -s'import sets'
-s'x=sets.Set(xrange(10000))' '7 in x'
100000 loops, best of 3: 2.2 usec per loop
[alex at lancelot bo]$ timeit.py -c -s'import sets'
-s'x=sets.Set(xrange(100000))' '7 in x'
100000 loops, best of 3: 2.2 usec per loop
[alex at lancelot bo]$ timeit.py -c -s'import sets'
-s'x=sets.Set(xrange(1000000))' '7 in x'
100000 loops, best of 3: 2.3 usec per loop

see the pattern...?-)


Same when something is NOT there, BTW:

[alex at lancelot bo]$ timeit.py -c -s'import sets'
-s'x=sets.Set(xrange(1000000))' 'None in x'
100000 loops, best of 3: 2.3 usec per loop

etc, etc.






More information about the Python-list mailing list