For loop searching takes too long!

Dave Angel davea at ieee.org
Thu Jan 28 20:15:29 EST 2010


elsa wrote:
> Hi guys,
>
> I've got a problem with my program, in that the code just takes too
> long to run. Here's what I'm doing. If anyone has any tips, they'd be
> much appreciated!
>
> So, say I have a list of lists that looks something like this (I'm
> using a list of lists, rather than a list of tuples, as I need it to
> be mutable):
>
> myList = [[786,0],[45, 1],[673,1],...................[23,46]]
>
> there are enough entries in the outer list, that the sum of myList[i]
> [0] across all i could be as high as 10^7.
>
> Now, what I need to do is randomly choose one myList[i], however the
> distribution of my random choice needs to be proportional to the
> values of myList[i][0]. So, for this list above, I'd have a much
> higher chance of choosing myList[0] than myList[1].
>
> Here is how I'm doing it at the moment:
>
> def chooseI(myList):
> 	mySum=0
> 	choice = random.choice(range(1,sum([i[0] for i in myList])+1))
> 	for i in range(len(myList)):
> 		mySum+=myList[i][0]
> 		if mySum>=choice:
> 			return i
> 			break
>
> This works just fine if sum([i[0] for i in myList]) is less than
> 10,000, say. However if its up around 10^7, the whole thing crashes.
> Is there a faster way of doing this, that doesn't involve as many
> computational steps?
>
> Thanks!
>
> elsa
>
>   
"the whole thing crashes"  -- give us the specific crash you see, 
including the stack trace.  "Crash" is far too vague.

At first glance you could be running out of memory.  But not with the 
values you're quoting.  If a typical average value for i[0] is 50, 
that's only 200k elements.  Anyway, you could print out the len(myList) 
to see how big the list is.

There are quicker ways to do the sum, but the first thing to change is 
the random.choice() stuff.  Once you have a sum, you can use 
random.randint() on it directly.  No need to make a range list.

Another approach might be to build a new list with a running sum in it.  
Then after doing the randint() on the last (highest) item, you can do a 
bisect.bisect on that list.  The index that returns will be your return 
value.

DaveA




More information about the Python-list mailing list