[ python-Bugs-917055 ] add a stronger PRNG

SourceForge.net noreply at sourceforge.net
Thu Mar 18 01:27:50 EST 2004


Bugs item #917055, was opened at 2004-03-16 02:46
Message generated for change (Comment added) made by phr
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=917055&group_id=5470

Category: Python Library
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: paul rubin (phr)
Assigned to: Raymond Hettinger (rhettinger)
Summary: add a stronger PRNG

Initial Comment:
The default Mersenne Twister algorithm in the Random
module is very fast but makes no serious attempt to
generate output that stands up to adversarial analysis.
 Besides cryptography applications, this can be a
serious problem in areas like computer games.  Sites
like www.partypoker.com routinely run online
tournaments with prize funds of 100K USD or more. 
There's big financial incentives to find ways of
guessing your opponent's cards with better than random
chance probability.  See bug #901285 for some
discussion of possible correlations in Mersenne
Twister's output.

Teukolsky et al discuss PRNG issues at some length in
their book "Numerical Recipes".  The original edition
of Numerical Recipes had a full blown version of the
FIPS Data Encryption Standard implemented horrendously
in Fortran, as a way of making a PRNG with no easily
discoverable output correlations.  Later editions
replaced the DES routine with a more efficient one
based on similar principles.

Python already has an SHA module that makes a dandy
PRNG.  I coded a sample implementation:

http://www.nightsong.com/phr/python/sharandom.py

I'd like to ask that the Python lib include something
like this as an alternative to MT.  It would be similar
to the existing whrandom module in that it's an
alternative subclass to the regular Random class.  The
existing Random module wouldn't have to be changed.

I don't propose directly including the module above,
since I think the Random API should also be extended to
allow directly requesting pseudo-random integers from
the generator subclass, rather than making them from
floating-point output.  That would allow making the
above subclass work more cleanly.  I'll make a separate
post about this, but first will have to examine the
Random module source code.

----------------------------------------------------------------------

>Comment By: paul rubin (phr)
Date: 2004-03-18 06:27

Message:
Logged In: YES 
user_id=72053

I don't mind dropping the time() auto-seed but I think that
means eliminating any auto-seed, and requiring a
user-supplied seed.  

There is no demonstrable minimum period for the SHA-OFB and
it would be bad if there was, since it would then no longer
act like a PRF.  Note that the generator in the sample code
actually comes from two different SHA instances and thus its
expected period is about 2**160.  Anyway, adding a simple
counter (incrementing by 1 on every SHA call) to the SHA
input removes any lingering chance of a repeating sequence.
 I'll update the code to do that.  It's much less ugly than
stirring in Mersenne Twister output.

I don't have Python 2.4 yet but will look at it when I get a
chance.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-17 11:27

Message:
Logged In: YES 
user_id=80475

It has some potential for being added.  That being said, try
to avoid a religious fervor.

I would like to see considerably more discussion and
evolution first.

The auto-seeding with time *must* be dropped -- it is not
harmonious with the goal of creating a secure random
sequence.  It is okay for the a subclass to deviate in this way.

Also, I was soliciting references stronger than (I don't
know of any results ... It is generally considered ... ). 
If we put this in, people are going to rely on it.  The docs
*must* include references indicating the strengths and
weaknesses of the approach.  It should also concisely say
why it works (a summary proof that makes it clear how a
one-way digest function can be tranformed into a sequence
generator that is cryptographicly strong to both the left
and right with the latter being the one that is not obvious).

Not having a demonstrable minimum period is also bad. 
Nothing in the  discussion so far precludes the existence of
a bad seed that has a period of only 1 or 2.  See my
suggestion on comp.lang.python for a means of mitigating
this issue.

With respect to the randint question, be sure to look at the
current Py2.4 source for random.py.  The API is expanded to
include and an option method, getrandbits().  That in turn
feeds the other integer methods without the int to float to
int dance.

Upon further consideration, I think the export control
question is moot since we're using an existing library
function and not really expressing new art.

----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2004-03-17 09:05

Message:
Logged In: YES 
user_id=72053

There are no research results that I know of that can
distinguish the output of sha1-ofb from random output in any
practical way.  To find such a distinguisher would be a
significant result.  It's a safe bet that cryptographers
have searched for distinguishers, though I don't promise to
have heard of every result that's been published.  I'll ask
on sci.crypt if anyone else has heard of such a thing, if
you want.  SHA1-HMAC is generally considered to be
indistinguishable from a PRF (pseudorandom function, i.e. a
function selected at random from the space of all functions
from arbitrary strings to 160-bit outputs; this term is from
concrete security theory).  MD5 is deprecated these days and
there's no point to using it instead of sha1 for this.

I'm not sure what happens if randint is added to the API. 
If you subclass Random and don't provide a randint method,
you inherit from the base class, which can call
self.random() to get floats to make the ints from.

US export restrictions basically don't exist any more.  In
principle, if you want to export something, you're supposed
to send an email to an address at the commerce department,
saying the name of the program and the url where it can be
obtained and a few similar things.  In practice, email to
that address is ignored, they never check anything.  I heard
the address even stopped working for a while, though they
may have fixed it since then.  See
http://www.bxa.doc.gov/Encryption/ for info.  I've emailed
notices to the address a few times and never heard back
anything.  Anyway, I don't think this should count as
cryptography; it's simply using a cryptographic hash
function as an PRNG to avoid the correlations in other
PRNG's; scientific rationale for doing that is given in the
Numerical Recipes book mentioned above.

The code that I linked uses the standard API but I wonder if
the floating point output is optimally uniform, i.e. the
N/2**56 calculation may not be exactly the right thing for
an ieee float64.

Using the time of day is what the Random docs say to do by
default.  You're correct that any security application needs
to supply a higher entropy seed.

I would like it very much if the std lib included a module
that read some random bytes from the OS for OS's that
support it.  That means reading /dev/urandom on recent
Un*x-ish systems or Cygwin, or calling CryptGenRandom on
Windows.  Reading /dev/urandom is trivial, and there's a guy
on the pycrypt list who wrote a Windows function to call
CryptGenRandom and return the output through the Python API.
 I forwarded the function to Guido with the author's
permission but nothing seemed to happen with it.  However,
this gets away from this sharandom subclass.  I'd like to
make a few more improvements to the module but after that I
think it can be dropped into the lib.  Let me know what you
think.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-16 11:52

Message:
Logged In: YES 
user_id=80475

One other thought:  if cryptographic strength is a goal, then 
seeding absolutely should require a long seed (key) as an 
input and the time should *never* be used.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-16 11:49

Message:
Logged In: YES 
user_id=80475

It would have been ideal if the Random API had been 
designed with an integer generator at the core and floating 
point as a computed value, but that it the way it has been for 
a long time and efforts to switch it over would likely result in 
either incompatibility with existing subclasses or introducing 
new complexity (making it even harder to subclass).  I think 
the API should be left alone until Py3.0.

The attached module would make a good recipe on ASPN 
where improvements and critiques can be posted.  Do you 
have links to research showing that running SHA-1 in a 
cipher block feedback mode results in a cryptographically 
strong random number generator -- the result seems likely, 
but a research link would be great.  Is there a link to 
research showing the RNG properties of the resulting 
generator (period, equidistribution, passing tests for 
randomness, etc)?  Also, is there research showing the 
relative merits of this approach vs MD5, AES, or DES?

If something like this gets added to the library, I prefer it to 
be added to random.py using the existing API.  Adding yet 
another random module would likely do more harm than 
good.

One other question (I don't know the answer to this):  would 
including a cryptographically strong RNG trigger US export 
restrictions on the python distribution?

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=917055&group_id=5470



More information about the Python-bugs-list mailing list