SHA-based subclass for random module

Mon Mar 22 12:04:26 EST 2004

On 21 Mar 2004 15:16:02 -0800, Paul Rubin
<http://phr.cx@NOSPAM.invalid> wrote:

>python at rcn.com (Raymond Hettinger) writes:
[ snip discussion on Pseudo Random Number Generators (PRNG) and
Mersenne Twister (MT) ]
>> No corners were cut.  MT is a state-of-the-art psuedo random number
>> generator.  The problem is that you want encryption.
>
>This sha prng module was motivated partly by your own remark in sf bug
>#901285 that MT output may not be sufficiently correlation-free if you
>take pairs of 16-bit integers from it, in a context having nothing to
>do with encryption.  Anyway, despite how it may sound, I don't have
>any desire to use the random module for encryption (I'd use different
>API's for that).  I just want to be able to write applications that
>use random numbers without having any doubts about whether the random
>numbers are good enough.  Teukolsky et al in "Numerical Recipes",
>which is not a cryptography book, explain that a good way to do that
>is to use cryptographic primitives, so I'm proposing a module that
>does it that way.  
>
>The application in numerics could be something like: I run some
>half-hour simulation using MT, and get results that subjectively seem
>a bit peculiar.  So I run it again using the crypto-based generator
>and maybe that takes three hours, but if I still see similarly
>peculiar-looking results, I now can be pretty confident that it's not
>some consequence of the RNG internals.  It subjectively seems to me
>that the Gnome version of Freecell is easier to play than the Windows
>version (i.e. it deals easier hands), so that's a semi-real-world
>example of this situation.  (In this instance I suspect Freecell is
>using some pathetic linear PRNG or something like that, but I haven't
>yet bothered to look at the code).

This reminds me of a similar discussion I had with a bunch of PhD
mathematicians who wanted to sell my company a better statistics tool.
Their claim was that the tool we were using, which had some simple
PRNG, could give us bad results because of the correlations, short
cycles, etc. that were known to mathematicians.  This turned into a
heated debate, involving our R&D Director and other company
management, with me ( a dumb engineer who only understands simple
statistics ) vs a team of PhDs who could fill the a blackboard with
incomprehesible math proving I was wrong.  We ended up keeping the old
tool *and* buying theirs, when we should have made a clean decision.

I'm still curious as to the answer to my very simple and basic
question:  Are there any examples that show a *non-subjective*
difference in a physically meaningful result between one PRNG and
another?  By "physically meaningful" I mean to include things like
means, standard deviations, correlations between two distributions,
but rule out some measure that would only mean something to a
mathematician.  By "non-subjective" I mean a difference that can be
observed in a repeatable fashion, where the statistical significance
is not always at the edge when you repeat the experiment with more
trials.

The only answer I got from the earlier discussion was a reference to
some paper on a German website, but when I looked, the website was
gone.

-- Dave