Random Drawing Simulation -- performance issue
Travis E. Oliphant
oliphant.travis at ieee.org
Tue Sep 12 22:43:53 EDT 2006
Brendon Towle wrote:
> I need to simulate scenarios like the following: "You have a deck of
> 3 orange cards, 5 yellow cards, and 2 blue cards. You draw a card,
> replace it, and repeat N times."
>
Thinking about the problem as drawing sample froms a discrete
distribution defined by the population might help.
For example, in SciPy you can define your own discrete random variable:
var = scipy.stats.rv_discrete(name='sample',
values=([0,1,2],[3/10.,5/10.,2/10.]))
Then
var.rvs(size=10000)
would quickly return an array of "draws"
If you didn't want to install SciPy, but were willing to install NumPy,
then the crux of the algorithm is to generate an entire array of uniform
random variables: numpy.random.rand(count) and then map them through
the inverse cumulative distribution function to generate your samples.
The inverse cumulative distribution function is just a series of steps
whose width depends on the probablity distribution.
Thus, the population defines the draw boundaries. To make it easy, just
multiply the uniform random number by the total number of cards and then
the boundaries are on the integers of the number of each kind of card.
Here is an implementation. I find this version to be 2x - 5x faster
depending on how many draws are used.
import numpy as N
def randomDrawing_N(count, population):
probs = [x[0] for x in population]
res = [[0, item[1]] for item in population]
nums = N.random.rand(count)
cprobs = N.cumsum([0]+probs)
nums *= cprobs[-1]
for k, val in enumerate(probs):
res[k][0] += ((nums > cprobs[k]) & (nums < cprobs[k+1])).sum()
return res
-Travis Oliphant
More information about the Python-list
mailing list