testing for uniquness in a large list

Lol McBride newspost at lolmc.com
Wed Oct 20 15:13:04 CEST 2004


On Wed, 20 Oct 2004 13:37:32 +0200, Alex Martelli wrote:

> Lol McBride <newspost at lolmc.com> wrote:
> 
>> I'm looking for some help in developing a way to test for the uniqueness
>> of an item in a large list.To illustrate what I mean the code below is an
>> indicator of what I want to do but I'm hoping for a faster way than the
>> one shown.Basically,I have a list of 20 integers and I want to create
>> another list of 200000 unique subsets of 12 integers from this list.WhatI
>> have done here is to use the sample()function from the random module
>> and then compare the result to every item in the ints list to check for
>> uniqueness - as you can guess this takes an interminable amount of time to
>> grind through.Can someone please enlighten me as to how I can do this and
>> keep the amount of time to do it to a minimum?
> 
> One word: dictionaries!
> 
> Untested, but should work...:
> 
> 
> import random         # dont't use from...import *, on general grounds
> 
> def picks(seq=xrange(1, 21), rlen=200000, picklen=12):
>     results = {}
>     while len(results) < rlen:
>         pick = random.sample(seq, picklen)
>         pick.sort()
>         pick = tuple(pick)
>         results[pick] = 1
>     return results.keys()
> 
> this returns a list of tuples; if you need a list of lists,
> 
>     return [list(pick) for pick in results]
> 
> In 2.4, you could use "result=set()" instead of {}, results.add(pick) to
> maybe add a new pick, and possibly "return list(result)" if you need a
> list of tuples as the function's overall return value (the list
> comprehension for the case in which you need a list lists stays OK).
> But that's just an issue of readability, not speed.  Slight speedups can
> be had by hoisting some lookups out of the while loop, but nothing
> major, I think -- eg. sample=random.sample just before the while, and
> use sample(seq, picklen) within the while.  Putting all the code in a
> function and using local variables IS a major speed win -- local
> variable setting and access is WAY faster than globals'.
> 
> 
> Alex
Thank you Alex - i'll try your method and see how i get on
Lol




More information about the Python-list mailing list