Find duplicates in a list and count them ...

Albert Hopkins marduk at letterboxes.org
Thu Mar 26 15:54:22 EDT 2009


On Thu, 2009-03-26 at 12:22 -0700, Paul.Scipione at aps.com wrote:
> Hello,
>  
> I'm a newbie to Python.  I have a list which contains integers (about
> 80,000).  I want to find a quick way to get the numbers that occur in
> the list more than once, and how many times that number is duplicated
> in the list.  I've done this right now by looping through the list,
> getting a number, querying the list to find out how many times the
> number exists, then writing it to a new list.  On this many records it
> takes a couple of minutes.  What I am looking for is something in
> python that can grab this info without looping through a list.
>  


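Why not build a histogram?  A dict keyed on the number counts every
value in a single pass over the list, so you never have to query the
list again for each number.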

$ cat test.py 
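from random import randint

# Build a test list of 80,000 random integers.
# Note: randint(0, 10) includes both endpoints, so values run from 0 to 10.
l = list()
for i in xrange(80000):
    l.append(randint(0,10))

# Count occurrences in a single pass: hist maps each value to its count.
hist = dict()
for i in l:
    hist[i] = hist.get(i, 0) + 1

# Print the counts for 0 through 9 (the count for 10 is not shown).
for i in range(10):
    print "%s: %s" % (i, hist.get(i, 0))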



$ time python test.py 
0: 7275
1: 7339
2: 7303
3: 7348
4: 7206
5: 7323
6: 7230
7: 7348
8: 7166
9: 7180

real	0m0.533s
user	0m0.518s
sys	0m0.011s
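
To get only the numbers that occur more than once, which is what the
original question asked for, the histogram can be filtered on its
counts.  What follows is a rough sketch rather than part of the script
above: the function name find_duplicates and the sample list are made
up for illustration, and it assumes collections.Counter is available
(Python 2.7 / 3.1 or later).

```python
from collections import Counter


def find_duplicates(values):
    """Return {value: count} for every value that occurs more than once.

    find_duplicates is a hypothetical helper name used for illustration.
    """
    # Counter builds the same kind of histogram as the dict loop,
    # in a single pass over the input.
    counts = Counter(values)
    # Keep only the entries whose count is greater than one.
    return {value: n for value, n in counts.items() if n > 1}


# Made-up sample data: 1 and 3 occur twice, 5 occurs three times.
sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
duplicates = find_duplicates(sample)
for value in sorted(duplicates):
    print("%s: %s" % (value, duplicates[value]))
```

On an older Python without Counter, the same filter should work on the
hist dict built with hist.get(i, 0) + 1.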




