testing for uniquness in a large list
Cliff Wells
clifford.wells at comcast.net
Wed Oct 20 07:07:20 EDT 2004
On Wed, 2004-10-20 at 11:01 +0100, Lol McBride wrote:
> Hi All,
> I'm looking for some help in developing a way to test for the uniqueness
> of an item in a large list.To illustrate what I mean the code below is an
> indicator of what I want to do but I'm hoping for a faster way than the
> one shown.Basically,I have a list of 20 integers and I want to create
> another list of 200000 unique subsets of 12 integers from this list.WhatI
> have done here is to use the sample()function from the random module
> and then compare the result to every item in the ints list to check for
> uniqueness - as you can guess this takes an interminable amount of time to
> grind through.Can someone please enlighten me as to how I can do this and
> keep the amount of time to do it to a minimum?
>
> Thanks for taking the time to read this and doubly so if you're able to
> help.
> Cheers,
> Lol McBride
>
> #!/usr/bin/python
> from random import *
> #seq is the pool to choose from
> seq = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
> # this is how many times to choose a unique selection
> rlen = 200000
> counter = 100
> ints = [0]
> while 1:
> # if this is the last value for rlen then we stop
> if rlen == 0:
> break
> intLen = len(ints)
> cnt = 0
> testVal = sample(seq,12)
> testVal.sort()
> if len(ints)>=counter:
> print len(ints)
> counter = counter+100
> while 1:
> if ints.count(testVal)==1:
> cnt = cnt+1
> break
> intLen = intLen -1
> if intLen == 0:
> break
> if cnt == 1:
> continue
> else:
> ints.append(testVal)
> rlen = rlen -1
>
Disregarding more efficient methods of generating your list (of which
I'm sure there are many), part of the "interminable time" is probably
taken up in the infinite loop you have ;) Set rlen to a much smaller
number and you'll see that the program still never terminates.
You'll notice that:
ints = [0]
while 1:
...
intLen = len(ints) # aka 1
...
while 1:
...
intLen = intLen - 1 # aka 0
if intLen == 0: # always true
break
...
Basically this means that nothing is ever appended to ints nor is rlen
ever decremented (and these are the tests by which the loops terminate).
I hear that Python 3000 will complete infinite loops in half the time
however ;)
Regards,
Cliff
--
Cliff Wells <clifford.wells at comcast.net>
More information about the Python-list
mailing list