[Tutor] Looking for duplicates within a list

Steven D'Aprano steve at pearwood.info
Fri Jun 11 19:37:05 CEST 2010


On Fri, 11 Jun 2010 11:57:34 pm Ken G. wrote:
> I have been working on this problem for several days and I am not
> making any progress.  I have a group of 18 number, in ascending
> order, within a list.  They ranged from 1 to 39.  Some numbers are
> duplicated as much as three times or as few as none.


To find out how many times each number is in the list:

counts = {}
for i in mylist:
    # Don't do the work if you've already done it.
    if i not in counts:  
        counts[i] = mylist.count(i)


Can we do better? For very small lists, probably not, because the 
count() method is implemented in fast C, and anything we write will be 
in slow-ish Python. But for larger lists, count() is wasteful, because 
every time you call it, it walks the entire length of the string from 
start to finish. Since we know the list is sorted, that's a rubbish 
strategy. Let's write a quick helper function:

# Untested
def count_equals(alist, start):
    """Count the number of consecutive items of alist equal 
    to the item at start. Returns the count and the next 
    place to start."""
    item = alist[start]
    count = 1
    for i in xrange(start+1, len(alist)):
        x = alist[i]
        if x == item: 
            count += 1
        else:
            return (count, i)
    return (count, len(alist))


Now, with this we can avoid wastefully starting from the beginning of 
the list each time. (But remember, this will only be worthwhile for 
sufficiently large lists. My guess, and this is only a guess, is that 
it won't be worthwhile for lists smaller than perhaps 1,000 items.)

counts = {}
i = 0
while i < len(mylist):
    count, i = count_equals(mylist, i)
    counts[i] = count


> I started with one list containing the numbers.  For example, they
> are listed as like below:
>
> a = [1, 2, 3, 3, 4]
>
> I started off with using a loop:
>
>     for j in range (0, 5):
>     x = a[0] # for example, 1
>
> How would I compare '1' with 2, 3, 3, 4?

# Untested
counts = {}
for j in range(len(a)):
    x = a[j]
    count = 1
    for k in range(j+1, len(a)):
        if x == a[k]:
            count += 1
        else:
            counts[x] = count
            break


> Do I need another duplicated list such as b = a and compare a[0] with
> either b[0], b[1], b[2], b[3], b[4]?

b = a doesn't create a duplicated list, it gives the same list a second 
name. Watch this:

>>> a = [1, 2, 3]
>>> b = a
>>> b.append("Surprise!")
>>> a
[1, 2, 3, 'Surprise!']

b and a both refer to the same object, the list [1,2,3,'Surprise!']. To 
make a copy of the list, you need:

b = a[:]

But why bother, if all you're doing is comparisons and not modifying it?



-- 
Steven D'Aprano


More information about the Tutor mailing list