urn scheme
Terry Reedy
tjreedy at udel.edu
Mon Jul 28 13:05:00 EDT 2003
"Duncan Smith" <buzzard at urubu.freeserve.co.uk> wrote in message
news:bg38sa$e6r$1 at news7.svr.pol.co.uk...
> Hello,
> I'm currently doing some simulations where I need to sample
without
> replacement from a list of counts. (Actually it's a Numeric array,
but I'm
> currently converting the array to a list, using the following
function and
> converting back to an array.) I *desperately *need to speed this up
> (simulations currently take over 24 hours). This sampling is a real
> bottleneck, and accounts for about 90% of the time cost. If I can
get a
> 10-fold speed up I might be in business, and a 100-fold speed up
might even
> allow me to meet a deadline. Using psyco appears to give me no
speed up
> whatsoever (that can't be right, can it?). I am currently working
on
> reducing the number of function calls to urn(), but the biggest
speed up
> will still come from improving the performance of this function.
Any ideas
> for a significant speed up? TIA.
Do the minimum needed for each call and minimize intermediate
structures. Among other things, this means specializing the
function -- no options. Instead separate with and without replacement
functions. Consider writing urn() as a generator returning one draw
at a time instead of a list.
Assuming that 'bins' means 'colors' (number of each), I suggest
constructing an single urn list as follows:
contents = []
for i in range(len(bins)): contents.extend(bins[i]*[i])
This replaces cum_bins and the multiple comparisons and adjustments.
It can go inside or outside urn() depending upon whether a given
configuration is used just once or muptiple times. Then:
sampling one item with replacement: random.choice(contents).
sampling all without replacement: random.shuffle(contents)
sampling some without replacement: random.shuffle(contents)[:draws]
or, if draws is small enough (test timings for switch point)
def urn_swor(colors, draws):
'''Construct multicolor urn and destructively sample without
replacement
(swor) 'draws' items by successively removing random element
thereof.
Colors is array of number of each color.
'''
ncolors = len(colors)
contents = []
for i in range(ncolors):
contents.extend(colors[i]*[i])
n = len(contents)
rand = random.randrange
res = ncolors * [0]
while draws:
i = rand(n)
res[contents[i]] += 1
contents[i] = contents[-1]
n -= 1
draws -= 1
return res
>>> urn_swor((0,2,5,3,4), 10)
[0, 2, 2, 0, 6]
>>> urn_swor((0,2,5,3,4), 10)
[0, 1, 4, 1, 4]
>>> urn_swor((0,2,5,3,4), 10)
[0, 1, 3, 1, 5]
>>> urn_swor((0,2,5,3,4), 10)
[0, 2, 1, 1, 6]
> -----------------------------------------------------------
> import random
>
> def urn(bins, draws, drawn=0, others=0):
>
> """Bins is a sequence containing the numbers
> of balls in the urns. 'drawn' and 'others'
Sampling one urn with k colors (variable number of each) is the same
as a weighted sampling from k urns with 1 color each but should be
much faster when done as described above.
> define how the contents of the bins should
> be updated after each draw in draws. The
> default arguments correspond to sampling
> with replacement. e.g. drawn = -1 corresponds
> to sampling without replacement"""
'others' not defined
> len_bins = len(bins)
> res = [0] * len_bins
> cum_bins = [bins[0]] + [0] * (len_bins-1)
> for i in range(1, len_bins):
> cum_bins[i] = cum_bins[i-1] + bins[i]
> for i in range(draws):
> balls = random.random() * cum_bins[-1]
> adj = 0
> for j in range(len_bins):
> if adj == 0 and cum_bins[j] > balls:
> res[j] += 1
> cum_bins[j] += drawn> else:
> cum_bins[j] += (j - adj) * others + adj * drawn
I have no idea what this is about.
> return res
> ----------------------------------------------------------------
> >>> urn((0,2,5,3,4), 10, -1)
> [0, 1, 4, 1, 4]
Terry J. Reedy
More information about the Python-list
mailing list