[Tutor] Testing if a number occurs more than once
Alfred Milgrom
fredm@smartypantsco.com
Tue Dec 3 19:09:01 2002
Although I cannot answer Michael's New Scientist problem, here is some more
on the fastest solution for finding duplicates.
The function given by Danny Yoo appears to have been the fastest test
proposed so far:
def morethanone2(l):
l.sort()
for i in range(1, len(l)):
if l[i-1] == l[i]:
return 1
return 0
>>> list = range(100000)
>>> profile.run('morethanone2(list)')
3 function calls in 0.173 CPU seconds
However, if pure speed is the issue, the following is faster :
def morethanone3(l):
d={}
for x in l : d[x] = 1
return (len(d) != len(l))
>>> list = range(100000)
>>> profile.run('morethanone3(list)')
3 function calls in 0.097 CPU seconds
The result is even more impressive if one uses a shuffled list, since the
example above using a sorted list means that sort() does next to no work:
>>> list = range(100000)
>>> random.shuffle(list)
>>> profile.run('morethanone2(list)')
3 function calls in 0.532 CPU seconds
>>> list = range(100000)
>>> random.shuffle(list)
>>> profile.run('morethanone3(list)')
3 function calls in 0.162 CPU seconds
This new function of course does not tell you where the duplication occurs.
If this is required, you can use the following function, which is a bit
slower but still faster than Danny's original function:
def morethanone4(l):
d={}
for x in l : d[x] = d.setdefault(x,0)+1
return (len(d) != len(l))
>>> list = range(100000)
>>> random.shuffle(list)
>>> profile.run('morethanone4(list)')
3 function calls in 0.293 CPU seconds
Finally, if speed of finding what or where the duplicate is an issue, here
is a variation on Danny's function using List Comprehension which is about
the same speed as the original function, but potentially gives more
information:
def morethanone5(l) :
l.sort()
x=[l[i] for i in range(len(l)-1) if l[i]==l[i+1]]
return len(x)
>>> list = range(100000)
>>> random.shuffle(list)
>>> profile.run('morethanone5(list)')
3 function calls in 0.594 CPU seconds
This last function gives you access to a list with all the duplicates in it
(if x is made a global variable).
Best regards,
Fred Milgrom