testing for uniquness in a large list

Alex Martelli aleaxit at yahoo.com
Wed Oct 20 07:37:32 EDT 2004


Lol McBride <newspost at lolmc.com> wrote:

> I'm looking for some help in developing a way to test for the uniqueness
> of an item in a large list.To illustrate what I mean the code below is an
> indicator of what I want to do but I'm hoping for a faster way than the
> one shown.Basically,I have a list of 20 integers and I want to create
> another list of 200000 unique subsets of 12 integers from this list.WhatI
> have done here is to use the sample()function from the random module
> and then compare the result to every item in the ints list to check for
> uniqueness - as you can guess this takes an interminable amount of time to
> grind through.Can someone please enlighten me as to how I can do this and
> keep the amount of time to do it to a minimum?

One word: dictionaries!

Untested, but should work...:


import random         # dont't use from...import *, on general grounds

def picks(seq=xrange(1, 21), rlen=200000, picklen=12):
    results = {}
    while len(results) < rlen:
        pick = random.sample(seq, picklen)
        pick.sort()
        pick = tuple(pick)
        results[pick] = 1
    return results.keys()

this returns a list of tuples; if you need a list of lists,

    return [list(pick) for pick in results]

In 2.4, you could use "result=set()" instead of {}, results.add(pick) to
maybe add a new pick, and possibly "return list(result)" if you need a
list of tuples as the function's overall return value (the list
comprehension for the case in which you need a list lists stays OK).
But that's just an issue of readability, not speed.  Slight speedups can
be had by hoisting some lookups out of the while loop, but nothing
major, I think -- eg. sample=random.sample just before the while, and
use sample(seq, picklen) within the while.  Putting all the code in a
function and using local variables IS a major speed win -- local
variable setting and access is WAY faster than globals'.


Alex



More information about the Python-list mailing list