urn scheme

Terry Reedy tjreedy at udel.edu
Mon Jul 28 13:05:00 EDT 2003


"Duncan Smith" <buzzard at urubu.freeserve.co.uk> wrote in message
news:bg38sa$e6r$1 at news7.svr.pol.co.uk...
> Hello,
>          I'm currently doing some simulations where I need to sample
without
> replacement from a list of counts.  (Actually it's a Numeric array,
but I'm
> currently converting the array to a list, using the following
function and
> converting back to an array.)  I *desperately *need to speed this up
> (simulations currently take over 24 hours).  This sampling is a real
> bottleneck, and accounts for about 90% of the time cost.  If I can
get a
> 10-fold speed up I might be in business, and a 100-fold speed up
might even
> allow me to meet a deadline.  Using psyco appears to give me no
speed up
> whatsoever (that can't be right, can it?).  I am currently working
on
> reducing the number of function calls to urn(), but the biggest
speed up
> will still come from improving the performance of this function.
Any ideas
> for a significant speed up?  TIA.

Do the minimum needed for each call and minimize intermediate
structures.  Among other things, this means specializing the
function -- no options.  Instead separate with and without replacement
functions.  Consider writing urn() as a generator returning one draw
at a time instead of a list.

Assuming that 'bins' means 'colors' (number of each), I suggest
constructing an single urn list as follows:

  contents = []
  for i in range(len(bins)): contents.extend(bins[i]*[i])

This replaces cum_bins and the multiple comparisons and adjustments.
It can go inside or outside urn() depending upon whether a given
configuration is used just once or muptiple times.  Then:

  sampling one item with replacement: random.choice(contents).
  sampling all without replacement:     random.shuffle(contents)
  sampling some without replacement: random.shuffle(contents)[:draws]
   or, if draws is small enough (test timings for switch point)

def urn_swor(colors, draws):
    '''Construct multicolor urn and destructively sample without
replacement
    (swor) 'draws' items by successively removing random element
thereof.
    Colors is array of number of each color.
    '''
    ncolors = len(colors)
    contents = []
    for i in range(ncolors):
       contents.extend(colors[i]*[i])
    n = len(contents)
    rand = random.randrange
    res = ncolors * [0]
    while draws:
        i = rand(n)
        res[contents[i]] += 1
        contents[i] = contents[-1]
        n -= 1
        draws -= 1
    return res

>>> urn_swor((0,2,5,3,4), 10)
[0, 2, 2, 0, 6]
>>> urn_swor((0,2,5,3,4), 10)
[0, 1, 4, 1, 4]
>>> urn_swor((0,2,5,3,4), 10)
[0, 1, 3, 1, 5]
>>> urn_swor((0,2,5,3,4), 10)
[0, 2, 1, 1, 6]

> -----------------------------------------------------------
> import random
>
> def urn(bins, draws, drawn=0, others=0):
>
>     """Bins is a sequence containing the numbers
>     of balls in the urns.  'drawn' and 'others'

Sampling one urn with k colors (variable number of each) is the same
as a weighted sampling from k urns with 1 color each but should be
much faster when done as described above.

>     define how the contents of the bins should
>     be updated after each draw in draws.  The
>     default arguments correspond to sampling
>     with replacement.  e.g. drawn = -1 corresponds
>     to sampling without replacement"""

'others' not defined

>     len_bins = len(bins)
>     res = [0] * len_bins
>     cum_bins = [bins[0]] + [0] * (len_bins-1)
>     for i in range(1, len_bins):
>         cum_bins[i] = cum_bins[i-1] + bins[i]
>     for i in range(draws):
>         balls = random.random() * cum_bins[-1]
>         adj = 0
>         for j in range(len_bins):
>             if adj == 0 and cum_bins[j] > balls:
>                 res[j] += 1
>                 cum_bins[j] += drawn>             else:
>                 cum_bins[j] += (j - adj) * others + adj * drawn

I have no idea what this is about.

>     return res
> ----------------------------------------------------------------
> >>> urn((0,2,5,3,4), 10, -1)
> [0, 1, 4, 1, 4]

Terry J. Reedy






More information about the Python-list mailing list