SHA-based subclass for random module

Tim Peters tim.one at comcast.net
Mon Mar 22 23:48:14 EST 2004


[Tim]
>> ...
>> The primary weakness of linear congruential generators (still the
>> most common kind) is dramatically illustrated all over the web;
>> e.g., look at the 3D plot at:
>>
>>    http://www.unf.edu/ccec/cis/CIShtml/RANDUinfo.html

[David MacQuigg]
> This is the first good answer I have heard, and it has changed my
> thinking on this question.

The mathematicians you talked with before may have droned on endlessly (and
probably did) about "the spectral test".  At heart that's just a way of
quantifying "how bad N-dimensional plots can look" -- definitely a case
where *seeing* a bad plot is worth more than a whole book of dimly
understood equations.

The other things they droned on about aren't really different in principle:
if a sequence is truly random, then an unbounded number of observable
effects must follow from that.  Like N-dimensional plots won't have obvious
visual patterns; the i'th element is bigger than the i+1997'th element about
half the time; the rank of random binary NxN matrices derived from the
sequence should follow a certain distribution; ... it's endless.  Not all
are easy to picture, but all are critical to the correct functioning of
*some* app.

> I understand now that the problem is not just something that might be
> found in the tenth decimal place.  The clustering of random points into
> very thin planes could radically affect a simulation depending on the
> distances between those points.

Oh yes!  OTOH, other simulations may not care at all.

> The problem I am working on (statistical simulation of electronic
> circuits) involves a large number of 1D distributions, but with
> carefully controlled correlations between these distributions.  We are
> probably just lucky that the PRNG we are using has not caused a
> noticable problem,

If you're only using one PRNG, you're living too dangerously.  Even
theoretically robust PRNGs can have implementation bugs, so it's best
practice to verify simulations by running them in their entirety with at
least two distinct PRNGs, and of different algebraic structure (for example,
a good LCG (not RANDU <wink>), and a lagged Fibonacci generator -- or the
Twister).  If results agree within expected random noise between two such
runs, that's a powerful check.

> but seeing just one clear example like the above has made me decide
> to be much more cautious in selecting a PRNG.

Or select two, and stop worrying about it.  Even if you've got a poor one,
there's a very effective way to make it better (due to George Marsaglia, a
pioneer in this field):

class Shuffle:
    def __init__(self, base_random, N=128):
        self.base_random = base_random
        self.table = [base_random() for dummy in range(N)]
        self.next = base_random()

    def random(self):
        result = self.next
        index = int(result * len(self.table))
        self.next = self.table[index]
        self.table[index] = self.base_random()
        return result

Pass your base generator to the Shuffle constructor, then use the instance's
random() method.

For example, here's a truly horrid generator:

def horrid():
    while True:
        for i in range(1, 100):
            yield i/100.0

Then

>>> bad_random = horrid().next
>>> for i in range(18):
...     print bad_random(),
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 0.12 0.13
0.14 0.15 0.16 0.17 0.18

So it's as bad as it looks <wink>.  But:

>>> better = Shuffle(horrid().next).random
>>> for i in range(18):
...     print better(),
0.3  0.39 0.5  0.65 0.84 0.09 0.12 0.16 0.21 0.27 0.35 0.45 0.58
0.75 0.97 0.26 0.34 0.44

By eyeball that's much better already, but the lengths of ascending runs are
still too long.  This is because the original generator was *so* poor, and I
chose it in a way to give Shuffle a hard time.  Even so, keep running it,
and Shuffle gets better & better at hiding the extreme regularity in
horrid()'s result sequence.

This technique should be better known:  it's simple, quick, applies to any
generator, and can make even the worst generators usable for many
applications.





More information about the Python-list mailing list