[Python-Dev] Re: [Python-checkins] python/dist/src/Lib random.py, 1.62, 1.63

Mon Aug 30 21:45:17 CEST 2004

[rhettinger at users.sourceforge.net]
> Modified Files:
>        random.py
> Log Message:
> Teach the random module about os.urandom().
> 
...
> * Provide an alternate generator based on it.
...
> +    _tofloat = 2.0 ** (-7*8)    # converts 7 byte integers to floats
...
> +class HardwareRandom(Random):
...
> +    def random(self):
...
> +        return long(_hexlify(_urandom(7)), 16) * _tofloat

Feeding in more bits than actually fit in a float leads to bias due to
rounding:  7 bytes is 56 bits, but a C double normally holds only 53.
Here's a test of the last bit:

"""
import random
import sys
from math import ldexp

def main(n, useHR):
    if useHR:
        get = random.HardwareRandom().random
    else:
        get = random.random
    counts = [0, 0]
    for i in xrange(n):
        # Scale to a 53-bit integer and keep its last bit (the 2**-53 bit).
        x = long(ldexp(get(), 53)) & 1
        counts[x] += 1
    print counts
    expected = n / 2.0
    chisq = (counts[0] - expected)**2 / expected + \
            (counts[1] - expected)**2 / expected
    print "chi square statistic, 1 df, =", chisq

n, useHR = map(int, sys.argv[1:])
main(n, useHR)
"""

Running with the Mersenne Twister (second argument 0) gives comfortable
chi-squared values for the distribution of bit 2**-53:

C:\Code\python\PCbuild>python temp.py 100000 0
[50082, 49918]
chi square statistic, 1 df, = 0.26896

C:\Code\python\PCbuild>python temp.py 100000 0
[49913, 50087]
chi square statistic, 1 df, = 0.30276

C:\Code\python\PCbuild>python temp.py 100000 0
[50254, 49746]
chi square statistic, 1 df, = 2.58064

Running with HardwareRandom instead gives astronomically unlikely values:

C:\Code\python\PCbuild>python temp.py 100000 1
[52994, 47006]
chi square statistic, 1 df, = 358.56144

C:\Code\python\PCbuild>python temp.py 100000 1
[53097, 46903]
chi square statistic, 1 df, = 383.65636

C:\Code\python\PCbuild>python temp.py 100000 1
[53118, 46882]
chi square statistic, 1 df, = 388.87696
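
The size of that skew can be estimated without sampling.  The following
is only a sketch, written in Python 3 syntax rather than the Python 2 of
the script above, and it rests on an assumption of mine:  that for a
56-bit integer n, only n's bit length and its low four bits decide what
ends up in bit 2**-53 after rounding.  The names low_bit, zeros_of_16
and prob_zero are made up for illustration; TOFLOAT mirrors _tofloat
from the checkin.

```python
from math import ldexp

TOFLOAT = 2.0 ** (-7 * 8)   # same scale factor as _tofloat in the checkin

def low_bit(n):
    """Return bit 2**-53 of the float built from the 56-bit integer n."""
    return int(ldexp(n * TOFLOAT, 53)) & 1

def zeros_of_16(length):
    """Count the low-4-bit patterns that yield a 0 bit, for n of this bit length."""
    top = 1 << (length - 1)          # leading bit fixes the bit length
    return sum(1 for low in range(16) if low_bit(top | low) == 0)

def prob_zero():
    """Approximate P(low_bit(n) == 0) for a uniformly chosen 56-bit n."""
    total = 0.0
    for length in range(5, 57):
        weight = 2.0 ** (length - 57)    # P(n has exactly this bit length)
        total += weight * zeros_of_16(length) / 16.0
    return total                         # ignores n < 16 (probability 2**-52)

print(zeros_of_16(56), zeros_of_16(55))
print(prob_zero())
```

Under those assumptions, only the integers with the top bit set (half of
them) are skewed, at 9 zeros to 7 ones, and the overall probability of a
0 bit comes out very close to 0.53125.  That is in line with the counts
of roughly 53,000 out of 100,000 shown above.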

One way to repair that is to replace the computation with

        return _ldexp(long(_hexlify(_urandom(7)), 16) >> 3, -BPF)

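where _ldexp is math.ldexp (and BPF is already a module constant).  The
shift throws away the 3 excess bits, leaving exactly BPF (53) random
bits, so scaling by 2**-BPF is exact and no rounding occurs.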
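
For anyone trying this on a current interpreter, here is a rough Python 3
rendering of that repaired computation.  It is a sketch, not the code
that was checked in:  float_from_56_bits and repaired_random are
hypothetical names, int.from_bytes stands in for
long(_hexlify(...), 16), and BPF is redefined locally to mirror the
module constant.

```python
import os
from math import ldexp

BPF = 53   # bits in a float; mirrors the module constant named in the text

def float_from_56_bits(n):
    """Drop the 3 excess low bits, then scale the remaining 53 by 2**-BPF."""
    return ldexp(n >> 3, -BPF)

def repaired_random():
    """Return a float in [0.0, 1.0) built from 7 bytes of os.urandom()."""
    n = int.from_bytes(os.urandom(7), "big")   # 56 random bits, big-endian
    return float_from_56_bits(n)

print(repaired_random())
```

Splitting out float_from_56_bits makes the conversion testable with
fixed inputs:  feeding it the 16 integers (1 << 55) | low for low in
range(16) should give an even 8/8 split in bit 2**-53.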

Of course that would also be biased on a box where a C double had fewer
than BPF (53) bits of precision (but the Twister implementation would
show the same bias then).
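
Whether a given box is affected can be checked from Python itself on
interpreters that provide sys.float_info (it did not exist when this was
written, so this is a sketch for later versions).  The function name
double_has_bpf_bits is hypothetical, and BPF is again a local stand-in
for the module constant.

```python
import sys

BPF = 53   # local stand-in for the module constant named in the text

def double_has_bpf_bits(bpf=BPF):
    """Return True if the platform float has at least bpf mantissa bits."""
    return sys.float_info.mant_dig >= bpf

print(sys.float_info.mant_dig, double_has_bpf_bits())
```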