[Tutor] Testing if a number occurs more than once

Jeff Shannon jeff@ccvcorp.com
Mon Dec 2 14:30:01 2002


Michael Williams wrote:

> Hi,
>
> I think I may be missing out on something really obvious here (such as
> a built-in), but I'm trying to ascertain whether a given number occurs
> more than once in a list.

I'd take a different approach, myself.  Since you're only looking for a
true/false for the entire list, I'd code it like this:

def morethanone(L):
    # If L has less than 2 elements, it can't have duplicates...
    if len(L) <= 1:
        return 0    #false
    # make a copy to avoid changing original
    mycopy = L[:]
    # now sort the copy...
    mycopy.sort()
    # Once sorted, duplicate elements will be next to each
    # other, so we can simply loop through and compare
    # adjacent elements.
    for n in range(1, len(mycopy)):
        if mycopy[n] == mycopy[n-1]:
            return 1   #true
    # If we have looped through the  entire list without
    # returning,  then there must be no duplicates
    return 0   #false

I don't know whether this will be significantly faster than your method,
however -- that copy and sort could take some time, especially on a long
list.  It could be more efficient if you know that your original list
doesn't rely on a particular order -- then you can sort the original list,
and spare the copying overhead.

Another approach, more like yours but with the potential to shortcut as
soon as a duplicate is found:

def morethanone(L):
    res = {}
    for item in L:
        # if item isn't already set in res
        if res.get(item, 0):
            # then set it to 1
            res[item] = 1
        else:
            # otherwise, it's a duplicate
            return 1
    # if we've found no duplicates, then return false
    return 0

The performance of this can vary greatly depending on the list you feed
it, and its worst-case performance will always be for a list with no
duplicates.  But, that worst-case performance is exactly the same as what
you're getting in your original version, and if a duplicate is found
quickly, the best-case performance on this can be very good.

I haven't done any timings (which are the only way to be sure), but I'm
pretty sure that this second version would be faster than my first
version, since it avoids the need to sort() the list.

Jeff Shannon
Technician/Programmer
Credit International