testing for uniquness in a large list
Alex Martelli
aleaxit at yahoo.com
Wed Oct 20 07:37:32 EDT 2004
Lol McBride <newspost at lolmc.com> wrote:
> I'm looking for some help in developing a way to test for the uniqueness
> of an item in a large list.To illustrate what I mean the code below is an
> indicator of what I want to do but I'm hoping for a faster way than the
> one shown.Basically,I have a list of 20 integers and I want to create
> another list of 200000 unique subsets of 12 integers from this list.WhatI
> have done here is to use the sample()function from the random module
> and then compare the result to every item in the ints list to check for
> uniqueness - as you can guess this takes an interminable amount of time to
> grind through.Can someone please enlighten me as to how I can do this and
> keep the amount of time to do it to a minimum?
One word: dictionaries!
Untested, but should work...:
import random # dont't use from...import *, on general grounds
def picks(seq=xrange(1, 21), rlen=200000, picklen=12):
results = {}
while len(results) < rlen:
pick = random.sample(seq, picklen)
pick.sort()
pick = tuple(pick)
results[pick] = 1
return results.keys()
this returns a list of tuples; if you need a list of lists,
return [list(pick) for pick in results]
In 2.4, you could use "result=set()" instead of {}, results.add(pick) to
maybe add a new pick, and possibly "return list(result)" if you need a
list of tuples as the function's overall return value (the list
comprehension for the case in which you need a list lists stays OK).
But that's just an issue of readability, not speed. Slight speedups can
be had by hoisting some lookups out of the while loop, but nothing
major, I think -- eg. sample=random.sample just before the while, and
use sample(seq, picklen) within the while. Putting all the code in a
function and using local variables IS a major speed win -- local
variable setting and access is WAY faster than globals'.
Alex
More information about the Python-list
mailing list