[Tutor] Looking for duplicates within a list
Steven D'Aprano
steve at pearwood.info
Fri Jun 11 19:37:05 CEST 2010
On Fri, 11 Jun 2010 11:57:34 pm Ken G. wrote:
> I have been working on this problem for several days and I am not
> making any progress. I have a group of 18 number, in ascending
> order, within a list. They ranged from 1 to 39. Some numbers are
> duplicated as much as three times or as few as none.
To find out how many times each number is in the list:
counts = {}
for i in mylist:
# Don't do the work if you've already done it.
if i not in counts:
counts[i] = mylist.count(i)
Can we do better? For very small lists, probably not, because the
count() method is implemented in fast C, and anything we write will be
in slow-ish Python. But for larger lists, count() is wasteful, because
every time you call it, it walks the entire length of the string from
start to finish. Since we know the list is sorted, that's a rubbish
strategy. Let's write a quick helper function:
# Untested
def count_equals(alist, start):
"""Count the number of consecutive items of alist equal
to the item at start. Returns the count and the next
place to start."""
item = alist[start]
count = 1
for i in xrange(start+1, len(alist)):
x = alist[i]
if x == item:
count += 1
else:
return (count, i)
return (count, len(alist))
Now, with this we can avoid wastefully starting from the beginning of
the list each time. (But remember, this will only be worthwhile for
sufficiently large lists. My guess, and this is only a guess, is that
it won't be worthwhile for lists smaller than perhaps 1,000 items.)
counts = {}
i = 0
while i < len(mylist):
count, i = count_equals(mylist, i)
counts[i] = count
> I started with one list containing the numbers. For example, they
> are listed as like below:
>
> a = [1, 2, 3, 3, 4]
>
> I started off with using a loop:
>
> for j in range (0, 5):
> x = a[0] # for example, 1
>
> How would I compare '1' with 2, 3, 3, 4?
# Untested
counts = {}
for j in range(len(a)):
x = a[j]
count = 1
for k in range(j+1, len(a)):
if x == a[k]:
count += 1
else:
counts[x] = count
break
> Do I need another duplicated list such as b = a and compare a[0] with
> either b[0], b[1], b[2], b[3], b[4]?
b = a doesn't create a duplicated list, it gives the same list a second
name. Watch this:
>>> a = [1, 2, 3]
>>> b = a
>>> b.append("Surprise!")
>>> a
[1, 2, 3, 'Surprise!']
b and a both refer to the same object, the list [1,2,3,'Surprise!']. To
make a copy of the list, you need:
b = a[:]
But why bother, if all you're doing is comparisons and not modifying it?
--
Steven D'Aprano
More information about the Tutor
mailing list