urn scheme

Duncan Smith buzzard at urubu.freeserve.co.uk
Mon Jul 28 15:38:21 CEST 2003

         I'm currently doing some simulations where I need to sample without
replacement from a list of counts.  (Actually it's a Numeric array, but I'm
currently converting the array to a list, using the following function and
converting back to an array.)  I *desperately *need to speed this up
(simulations currently take over 24 hours).  This sampling is a real
bottleneck, and accounts for about 90% of the time cost.  If I can get a
10-fold speed up I might be in business, and a 100-fold speed up might even
allow me to meet a deadline.  Using psyco appears to give me no speed up
whatsoever (that can't be right, can it?).  I am currently working on
reducing the number of function calls to urn(), but the biggest speed up
will still come from improving the performance of this function.  Any ideas
for a significant speed up?  TIA.


import random

def urn(bins, draws, drawn=0, others=0):

    """Bins is a sequence containing the numbers
    of balls in the urns.  'drawn' and 'others'
    define how the contents of the bins should
    be updated after each draw in draws.  The
    default arguments correspond to sampling
    with replacement.  e.g. drawn = -1 corresponds
    to sampling without replacement"""

    len_bins = len(bins)
    res = [0] * len_bins
    cum_bins = [bins[0]] + [0] * (len_bins-1)
    for i in range(1, len_bins):
        cum_bins[i] = cum_bins[i-1] + bins[i]
    for i in range(draws):
        balls = random.random() * cum_bins[-1]
        adj = 0
        for j in range(len_bins):
            if adj == 0 and cum_bins[j] > balls:
                res[j] += 1
                cum_bins[j] += drawn
                adj = 1
                cum_bins[j] += (j - adj) * others + adj * drawn

    return res

>>> urn((0,2,5,3,4), 10, -1)
[0, 1, 4, 1, 4]

More information about the Python-list mailing list