[Tutor] random number equations . . .

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Thu Jun 3 18:39:42 EDT 2004



On Fri, 4 Jun 2004, Glen Wheeler wrote:

>
>   Well, as a suggestion, instead of using the global declaration you could
> write a little function which returned a random number in some range on
> demand.
>
> >>> import random
> >>> def gimmeRandomInRange(rng):
> ...     return int(round(random.random()*rng))
> ...
> >>> gimmeRandomInRange(100)
> 99
> >>> for i in range(100):
> ...     print(gimmeRandomInRange(1000))
> ...
> 910
> 664
> 466
> 186
> 600
> 930
> 528
> 65
> 392
> ..
>
>   Of course you could just use the random.randint() function, but none
> of these are as fun as writing your own ;).



Hi Glen,

But whenever we're dealing with probability, it's often a good idea to
reuse what other folks have already done.  *grin*


This kind of selection function already exists in random.randrange():

    http://www.python.org/doc/lib/module-random.html#l2h-1147


The problem with gimmeRandomInRange() is that it's biased.  It's easier to
see what this means if we use a small range.  Let's see what happens when
we use it for a range between 0 and 2, inclusive:

###
>>> def distribution(numbers):
...     """Calculates a distribution of the numbers."""
...     counts = {}
...     for n in numbers:
...         counts[n] = counts.get(n, 0) + 1
...     return counts
...
>>> distribution([gimmeRandomInRange(2) for i in range(1000)])
{0: 238, 1: 523, 2: 239}
###


There's a big hump near one!


Why is that?  Why are the numbers biased toward 1?  If we draw
things out:

                A            B           C
            |--------(---------------)-------|
            0       0.5      1      1.5      2


our number line splits into three regions A, B, and C.  round() sends
everything in A to 0, everything in B to 1, and everything in C to 2.
But B is twice as wide as A or C, so 1 comes up about twice as often as
0 or 2.
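
We can check the picture with a little arithmetic.  Here's a rough sketch
that computes how much of the number line rounds to each result.  The name
rounding_probabilities() is made up for this example, and it assumes
random.random() is perfectly uniform on [0, 1) and ignores the exact
halfway points, which have essentially no chance of coming up:

```python
def rounding_probabilities(rng):
    """Chance of each result of int(round(u * rng)), for u uniform on [0, 1)."""
    probs = {}
    for k in range(rng + 1):
        # The values that round to k lie between k - 0.5 and k + 0.5,
        # clipped to the interval [0, rng] that u * rng can actually reach.
        low = max(k - 0.5, 0.0)
        high = min(k + 0.5, float(rng))
        probs[k] = (high - low) / rng
    return probs

print(rounding_probabilities(2))   # {0: 0.25, 1: 0.5, 2: 0.25}
```

Those numbers line up with the counts we saw: roughly a quarter, a half,
and a quarter of the 1000 samples.  The same thing happens for any range:
the two endpoints only get half a bin each.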


But random.randrange() doesn't suffer this defect:

###
>>> distribution([random.randrange(0, 3) for i in range(1000)])
{0: 343, 1: 344, 2: 313}
###


The 'random' module has many helper functions that we should use.
They're there because it's all too easy not to take into consideration
some of the subtle problems with random number generation.
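
If you still want to write your own for fun, one way to avoid the hump is
to chop off the fraction instead of rounding, so that every result gets a
bin of the same width.  This is only a sketch, not a replacement for the
library: floor_random_below() and its 'rand' parameter are names invented
for this example, and for very large n the floating-point arithmetic can
bring back a small bias, which is one more reason to prefer
random.randrange():

```python
import random

def floor_random_below(n, rand=random.random):
    """Return an integer in 0..n-1 by truncating rand() * n."""
    # rand() is in [0, 1), so rand() * n is in [0, n); int() drops the
    # fraction, which gives each of the n results a bin of width 1.
    return int(rand() * n)

def distribution(numbers):
    """Calculates a distribution of the numbers."""
    counts = {}
    for n in numbers:
        counts[n] = counts.get(n, 0) + 1
    return counts

# The three counts should come out roughly equal.
print(distribution([floor_random_below(3) for i in range(1000)]))
```

Note that the argument now means "how many different results" rather than
"the largest result", which matches how random.randrange(0, 3) behaves.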



Hope this helps!



