SHA-based subclass for random module

Tue Mar 23 01:39:23 EST 2004

On 22 Mar 2004 20:14:47 -0800, Paul Rubin
<http://phr.cx@NOSPAM.invalid> wrote:

>David MacQuigg <dmq at gain.com> writes:
>> I'm still curious as to the answer to my very simple and basic
>> question:  Are there any examples that show a *non-subjective*
>> difference in a physically meaningful result between one PRNG and
>> another?  By "physically meaningful" I mean to include things like
>> means, standard deviations, correlations between two distributions,
>> but rule out some measure that would only mean something to a
>> mathematician.  By "non-subjective" I mean a difference that can be
>> observed in a repeatable fashion, where the statistical significance
>> is not always at the edge when you repeat the experiment with more
>> trials.
>
>Tim gave a good answer for stuff like physics simulations.  Another
>answer might be in terms of something like a computer game: you're
>playing poker online with the cards dealt by a server programmed in
>Python using some PRNG.  Both players trust that the server isn't
>cheating, and both have a complete copy of the server source code and
>all the data that was on it at the start of the game except the PRNG's
>initial seed, and are permitted to use that info in any way they can.
>Since you're playing for a US$ 250,000 jackpot (sites like
>partypoker.com routinely run online tournaments with prizes that
>size), you must assume that your opponent has a team of mathemeticians
>analyzing the PRNG algorithm and all its previous observed output, in
>order to get some probabilistic edge in guessing what your cards are.
>If they fail to get that probabilistic edge, your chances are even,
>but if they succeed, they take you to the cleaners.  I would call that
>a non-sujective difference ;-).  
>
>Another simulation example: you're trying to program a computer to
>play poker or backgammon, using neural nets, genetic algorithms, or
>whatever the AI buzzword of the week is.  You train the AI by having
>it play a zillion simulated games against itself and eventually it's
>playing very well.  Has it really tuned itself for the game, or has it
>somehow modelled the PRNG and figured out how to guess the next card?
>Simulating systems with this kind of emergent behavior can make
>heavier demands on PRNG's than simulating random gas molecules.
>
>The point is, a pure statistical analysis of the swirling particles in
>the universe may predict the existence of stars, galaxies, planets,
>and maybe even living organisms, but normally won't treat emergent
>characteristics that lead to things like computers and trombones.  If
>you're trying to build a laser cannon to protect your house from
>meteors, it's one thing to calculate the number of stray asteroids
>whose trajectories might randomly intersect your house and build your
>laser accordingly, but a really good defense system has to allow that
>someone may be aiming stuff at you deliberately.  Similarly, a really
>good PRNG has to resist deliberate attempts to predict its output or
>even to notice that the output isn't perfectly random.
>
>The modern definition of a good PRNG basically says there should be no
>polynomial-time algorithm for distinguishing the PRNG from a true
>random source with probability better than chance.  There's a body of
>theoretical CS work that's been done on this subject.  See the
>pseudorandomness chapter of Goldreich's "Foundations of Cryptography" at
>
>   ftp://theory.lcs.mit.edu/pub/people/oded/BookFrag/part3N.ps
>
>if you want to see the details.

These are all good examples of *cryptographic* weaknesses in simple
PRNGs, which is really a different problem than weaknesses which would
lead to an incorrect result in a physical or engineering simulation.
In my earlier debate with the statistical tools salesmen, I was aware
of these cryptographic weaknesses.  I wasn't aware of the "clustering"
kind of phenomena, like the one Tim pointed out, that *could* have an
effect on our circuit simulations.

>Of course nobody has ever been able to demonstrate the existence of a
>good PRNG, since that runs smack into the P vs NP problem.  However,
>there are solid results showing how to build PRNG's from secure
>primitives (e.g. SHA) if we're willing to assume the security claims
>for those primitives.  That seems to be about what the current state
>of the art is.

Looks like the engineering problem should be simpler than the
cryptographic problem, which ultimately comes down to -- can you
generate a sequence so complex that no human mind or computational
process can ever 'decipher' it.  I'm guessing now that if the
distributions appear 'smooth' in every direction, we are probably OK
in the engineering sense.  I'm not going to rely on that guess,
however.  For now, I'll just go with an algorithm like Mersenne
Twister, that lots of smart people have looked at for a long time and
found nothing that would impact an engineering simulation.

-- Dave