[ python-Bugs-917055 ] add a stronger PRNG

SourceForge.net noreply at sourceforge.net
Sat Apr 10 00:01:14 EDT 2004

Bugs item #917055, was opened at 2004-03-16 02:46
Message generated for change (Comment added) made by jhylton
You can respond by visiting: 

Category: Python Library
Group: Feature Request
Status: Open
Resolution: None
Priority: 5
Submitted By: paul rubin (phr)
Assigned to: Raymond Hettinger (rhettinger)
Summary: add a stronger PRNG

Initial Comment:
The default Mersenne Twister algorithm in the Random
module is very fast but makes no serious attempt to
generate output that stands up to adversarial analysis.
Besides cryptography applications, this can be a
serious problem in areas like computer games.  Sites
like www.partypoker.com routinely run online
tournaments with prize funds of 100K USD or more.
There are big financial incentives to find ways of
guessing your opponent's cards with better-than-chance
probability.  See bug #901285 for some
discussion of possible correlations in Mersenne
Twister's output.

Teukolsky et al. discuss PRNG issues at some length in
their book "Numerical Recipes".  The original edition
of Numerical Recipes had a full-blown version of the
FIPS Data Encryption Standard, implemented horrendously
in Fortran, as a way of making a PRNG with no easily
discoverable output correlations.  Later editions
replaced the DES routine with a more efficient one
based on similar principles.

Python already has an SHA module that makes a dandy
PRNG.  I coded a sample implementation:
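The linked sample is not reproduced in this archive.  A minimal sketch of the idea (Python 3's hashlib standing in for the old sha module; the class and attribute names here are illustrative, not phr's) might look like:

```python
import hashlib
import random


class SHARandom(random.Random):
    """Illustrative SHA-1 output-feedback generator (a sketch, not the
    linked sample implementation)."""

    def __init__(self, seed_material=b"replace with a high-entropy seed"):
        self.state = hashlib.sha1(seed_material).digest()
        self.counter = 0
        super().__init__()

    def random(self):
        # Fold an ever-increasing counter into each hash so the chain
        # cannot fall into a short cycle.
        self.counter += 1
        self.state = hashlib.sha1(self.state + b"%x" % self.counter).digest()
        # Take 53 bits of the digest: the full significand of an IEEE double.
        n = int.from_bytes(self.state[:7], "big") >> 3
        return n * 2.0 ** -53
```

Because only random() is overridden, the inherited Random methods (uniform, choice, and so on) draw from the SHA-1 chain automatically.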


I'd like to ask that the Python lib include something
like this as an alternative to MT.  It would be similar
to the existing whrandom module in that it's an
alternative subclass to the regular Random class.  The
existing Random module wouldn't have to be changed.

I don't propose directly including the module above,
since I think the Random API should also be extended to
allow directly requesting pseudo-random integers from
the generator subclass, rather than making them from
floating-point output.  That would allow making the
above subclass work more cleanly.  I'll make a separate
post about this, but first will have to examine the
Random module source code.


>Comment By: Jeremy Hylton (jhylton)
Date: 2004-04-10 04:01

Logged In: YES 

Much earlier in the discussion Raymond wrote: 
"The docs *must* include references indicating the strengths and
weaknesses of the approach.  It should also concisely say
why it works (a summary proof that makes it clear how a
one-way digest function can be transformed into a sequence
generator that is cryptographically strong to both the left
and right, with the latter being the one that is not obvious)."

I don't see any documentation of this sort in the current
patch. I also think it would be helpful if the documentation
made some mention of why this generator would be useful.  In
particular, I suspect some users may be confused by the
mention of SHA and be led to believe that this is a CSPRNG,
when it is not; perhaps a mention of Yarrow and other
approaches for cryptographic applications would be enough to
avoid that confusion.

Comment By: Raymond Hettinger (rhettinger)
Date: 2004-04-10 03:24

Logged In: YES 

Thanks for the detailed comments.

1) Will add the references we have but need to make it clear
that this exact implementation is not studied and does not
guarantee cryptographic security.

2) The API is clear, seed() overwrites and jumpahead()
updates.  Besides,  the goal is to provide a good
alternative random number generator.  If someone needs real
crypto, they should use that.   Tossing in ad hoc pieces to
"make it harder" is a sure sign of venturing too far from
theoretical underpinnings.

3) Good observation.  I don't think a change is necessary.
The docs do not promise that asking for 160 bits gives the same
result as asking for 96 and 64 back to back.  The Mersenne
Twister has similar behavior.

4)  Let's not gum up the API because we have encryption and
text APIs on the brain.  The module is about random numbers,
not byte strings.


Comment By: Trevor Perrin (trevp)
Date: 2004-04-10 00:52

Logged In: YES 

Hi Raymond, here are some lengthy though not critically 
important comments (i.e., I'm okay with the latest diff).

1) With regards to this documentation: 
"No references were found which document the properties of 
SHA-1 as a random number generator".  

I can't find anything that documents exactly what we're 
doing.  This type of generator is similar to Bruce Schneier's 
Yarrow and its offspring (Fortuna and Tiny).  However, those 
use block-ciphers in counter mode, instead of SHA-1.  
According to the recent Tiny paper:

"Cryptographic hash functions can also be a good foundation 
for a PRNG.  Many constructs have used MD5 or SHA1 in this 
capacity, but the constructions are often ad-hoc.  When 
using a hash function, we would recommend HMAC in CTR 
mode (i.e., one MACs counter for each successive output 
block).  Ultimately, we prefer the use of block ciphers, as 
they are generally better-studied constructs."
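A sketch of the HMAC-in-CTR-mode construction the paper recommends (the function name and parameters are illustrative; SHA-1 appears only because it is what Python shipped at the time):

```python
import hashlib
import hmac


def hmac_ctr_stream(key, nbytes):
    """Produce nbytes of output by MACing a counter for each successive
    output block, per the Tiny paper's recommendation (sketch)."""
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha1).digest()
        counter += 1
    return out[:nbytes]
```

Because each block depends only on the key and the counter, requesting a long stream and truncating it gives the same prefix as requesting fewer bytes directly.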

Using HMAC seems like overkill to me, and would slow things 
down.  However, if there's any chance Python will add block 
ciphers in the future, it might be worth waiting for, so we 
could implement one of the well-documented block cipher 
constructions.
2) Most cryptographic PRNGs allow for mixing new entropy 
into the generator state.  The idea is you harvest entropy in 
the background, and once you've got a good chunk (say 
128+ bits) you add it in.  This makes cryptanalysis of the 
output harder, and allows you to recover even if the 
generator state is compromised.

We could change the seed() method so it updates the state 
instead of overwriting it:

def __init__(self):
	self.cnt = 0
	self.s0 = '\0' * 20
	self.gauss_next = None

def seed(self, a=None):
	if a is None:
		# Initialize from current time
		import time
		a = time.time()
	b = sha.new(repr(a)).digest()		
	self.s0 = sha.new(self.s0 + b).digest()

'b' may not be necessary, I'm not sure, though it's similar to 
how some other PRNGs handle seed inputs.  If we were using 
a block cipher PRNG, it would be more obvious how to do this.

jumpahead() could also be used instead of seed().

3) The current generators (not just SHA1Random) won't 
always return the same sequence of bits from the same 
state.  For example, if I call SHA1Random.getrandbits() asking 
for 160 bits they'll come from the same block, but if I ask for 
96 and 64 bits, they'll come from different blocks.  

I suggest buffering the output, so getting 160 bits or 96+64 
gets the same bits.  Changing getrandbits() to getrandbytes
() would avoid the need for bit-level buffering.  
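The buffering Trevor describes, under which one 160-bit request and a 96-bit-plus-64-bit pair consume the identical output stream, can be sketched like this (the hash chain here is a stand-in, not SHA1Random's actual update rule):

```python
import hashlib


class BufferedBits:
    """Illustrative bit-level output buffer over a simple SHA-1 chain."""

    def __init__(self, seed=b"demo"):
        self.state = hashlib.sha1(seed).digest()
        self.bits = 0    # buffered output bits, packed into an int
        self.nbits = 0   # number of buffered bits

    def getrandbits(self, k):
        # Refill the buffer a whole digest (160 bits) at a time.
        while self.nbits < k:
            self.state = hashlib.sha1(self.state).digest()
            self.bits = (self.bits << 160) | int.from_bytes(self.state, "big")
            self.nbits += 160
        # Hand out the oldest k bits; keep the remainder for the next call.
        self.nbits -= k
        result = self.bits >> self.nbits
        self.bits &= (1 << self.nbits) - 1
        return result
```

With this in place, getrandbits(160) equals (getrandbits(96) << 64) | getrandbits(64) from the same starting state.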

4) I still think a better interface would only require new 
generators to return a byte string.  That would be easier for 
SHA1Random, and easier for other generators based on cross-
platform entropy sources.  I.e., in place of random() and 
getrandbits(), SHA1Random would only have to implement:

def getrandbytes(self, n):
    while len(self.buffer) < n:
        self.cnt += 1
        self.s0 = sha.new(self.s0 + hex(self.cnt)).digest()
        self.buffer += sha.new(self.s0).digest()
    retVal = self.buffer[:n]
    self.buffer = self.buffer[n:]
    return retVal
The superclass would call this to get the required number of 
bytes and convert them as needed (for converting them to 
numbers it could use the 'long(s, 256)' patch I submitted).

Besides making it easier to add new generators, this would 
provide a useful function to users of these generators.  
getrandbits() is less useful, and it's harder to go from a long-
integer to a byte-string than vice versa, because you may 
have to zero-pad if the long-integer is small.
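The asymmetry is easy to see with the two conversions side by side (Python 3 spellings; in the 2.x of this thread these would be hand-rolled or the proposed 'long(s, 256)'):

```python
def bytes_to_long(s):
    # Every byte string maps to exactly one integer -- no extra
    # information needed.
    return int.from_bytes(s, "big")


def long_to_bytes(n, length):
    # Going back requires knowing the intended length, because leading
    # zero bytes vanish in the integer form.
    return n.to_bytes(length, "big")
```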


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-04-09 03:41

Logged In: YES 

Attaching a revised patch.  If there are no objections, I will 
add it to the library (after factoring out the unittests and adding 
documentation).


Comment By: Trevor Perrin (trevp)
Date: 2004-04-07 06:56

Logged In: YES 

Comments on shardh.py -

SHA1Random2 seems more complex than it needs to be.  
Comparing with figure 19 of [1], I believe that s1 does not 
need to be kept as state, so you could replace this:
  self.s1 = s1 = sha.new(s0 + self.s1).hexdigest()
with this:
  s1 = sha.new(s0).hexdigest()
If there's concern about the low Hamming distance between 
counter values, you could simply hash the counter before 
feeding it in (or use M-T instead of the counter).

Instead of updating s0 every block, you could update it every 
10th block or so.  This would slightly increase the number of 
old values an attacker could recover, upon compromising the 
generator state, but it could be a substantial speedup.

SHA1Random1 depends on M-T for some of its security 
properties.  In particular, if I discover the generator state, 
can I run it backwards and determine previous values?  I 
don't know, it depends on M-T.  Unless we know more about 
the properties of M-T, I think it would be preferable to use 
M-T only in place of the counter in the SHA1Random2 
construction (if at all), *NOT* as the sole repository of PRNG 
state.
[1] http://www.cypherpunks.to/~peter/06_random.pdf


Comment By: paul rubin (phr)
Date: 2004-04-01 02:48

Logged In: YES 

FYI, this was just announced on the python-crypto list. 
It's a Python wrapper for EGADS, a cross platform
entropy-gathering RNG.  I haven't looked at the code for it
and have no opinion about it.



Comment By: paul rubin (phr)
Date: 2004-03-26 00:36

Logged In: YES 

1. OK, though the docs should mention this.

2. One-way just means collision resistant, and we're trying
to make something without a distinguisher, which is a
stronger criterion.  I'm not saying there's any problem with
a single hash; I just don't see an obvious proof that
there's not a problem.  Also it makes me cringe slightly to
keep the seed around as plaintext even temporarily.  The PC
that I'm using (midrange Athlon) can do about a million sha1
hashes per second, so an extra hash per reseed "multiple"
times per second shouldn't be noticeable, for reasonable
values of "multiple".

3. Since most of the real computation of this function is in
the sha1 compression, the timing indicates that evaluation
is dominated by interpreter overhead rather than by hashing.
I presume you're using CPython.  The results may be
different in Jython or with Psyco and different again under
PyPy once that becomes real.  I think we should take the
view that we're designing a mathematical function that
exists independently of any specific implementation, then
figure out what characteristics it should have and implement
those, rather than tailoring it to the peculiarities of
CPython.
If people are really going to be using this function in
2010, CPython will hopefully be dead and gone (replaced by
PyPy) by then, so that's all the more reason to not worry
about small CPython-specific effects since the function will
outlast CPython.  Maybe also sometime between now and then,
these libraries can be compiled with psyco.

4. OK

5. OK.  Would be good to also change %s for cnt in setstate
to %x.

6. Synchronization can be avoided by hashing different fixed
strings into s0 and s1 at each rehash (I did that in my
version).  I think it's worth doing that just to kick the
hash function away from standard sha.  I actually don't see
much need for the counter in either hash, but you were
concerned about hitting possible short cycles in sha.
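The fixed-string trick phr describes can be sketched like this (the tags are arbitrary; the point is only that the two chains never see identical input, kicking each away from plain SHA):

```python
import hashlib


def rehash(s0, s1):
    """Update two hash chains with distinct fixed tags (illustrative)."""
    new_s0 = hashlib.sha1(b"chain-0:" + s0 + s1).digest()
    new_s1 = hashlib.sha1(b"chain-1:" + s0 + s1).digest()
    return new_s0, new_s1
```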

7. OK.  WHrandom is already non-threadsafe, so there's
precedent.   I do have to wonder if the 160 bit arithmetic
is slowing things down.  If we don't care about non-IEEE
doubles, we're ok with 53 bits.   Hmm, I also wonder whether
the 160 bit int to float conversion is precisely specified
even for IEEE and isn't an artifact of Python's long int
implementation.  But I guess it doesn't matter, since we're
never hashing those floats.

Re bugs til 2010: oh let's have more confidence than that
:).  I think if we're careful and get the details correct
before deployment, we shouldn't have any problems.  This is
just one screenful of code or so, not complex by most
reasonable standards.  However, we might want post the
algorithm on sci.crypt for comments, since there's some
knowledgeable people there.


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-25 09:08

Logged In: YES 

Took another look at #5 and will change str() to hex().


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-25 08:25

Logged In: YES 

1.  We don't care about non IEEE.

2.  One-way is one-way, so double hashing is unnecessary. 
Also, I've fielded bug reports from existing user apps that
re-seed very frequently (multiple times per second).

3.  The implementations reflect the results of timing
experiments which showed that the fastest intermediate
representation was hex.

4.  ['0x0'] is necessary to handle the case where n==0. 
int('', 16) raises a ValueError while int('0x0', 16) does not.

5.  random() and getrandbits() do not have to go through the
same intermediate steps (it's okay for one to use hex and
the other to use str) -- speed and space issues dominate. 
0x comes up enough in Python that there is little use in tossing
it away for an obscure, hypothetical micro-controller target.

6.  Leaving cnt out of the s1 computation guarantees that it
will never track the updates of s0 -- any synchronization
would be a disaster.  Adding a count or some variant smacks
of desperation rather than reliance on proven hash properties.

7.  The function is already 100 times slower than MT. 
Adding locks will make the situation worse.  It is better to
simply document it as being non-threadsafe.  

Look back at the mt/sha version.  Its code is much
cleaner, faster, and threadsafe.  It goes a long way toward
meeting your request and serving as an alternate generator
to validate simulation results.

If we use the sha/sha version, I'm certain that we will be
fielding bug reports on this through 2010.  It is also
sufficiently complex that it will spawn lengthy, wasteful
discussions and it will create a mine-field for future
maintainers.


Comment By: paul rubin (phr)
Date: 2004-03-25 05:15

Logged In: YES 

I'm not sure why the larger state space of mt/sha1 is
supposed to be an advantage, but the other points are
reasonable.  I like the new sha1/sha1 implementation except
for a few minor points (below).  I made the old version
hopefully threadsafe by adding a threading.Lock() object to
the instance and locking it on update. I generally like your
version better so maybe the lock can be added.  Of course
that slows it down even further, but in the context of an
interpreted Python program I feel that the generator is
still fast enough compared to the other stuff the program is
likely to be doing.

Re the new attachment, some minor issues:

1) The number 2.0**-160 is on the order of 10**-48.  This is a
valid IEEE double but on some non-IEEE machines it may be a
floating underflow or even equal to zero.  I don't know if this
matters.

2) Paranoia led me to hash the seed twice in the seed
operation in my version, to defeat unlikely
message-extension attacks against the hash function.  I
figured reseeding is infrequent enough that an extra hash
operation doesn't matter.

3) Storing s1 as a string of 40 hex digits in SHARandom2
means that s1+s2 is 60 characters, which means hashing it
will need two sha1 compression operations, slowing it down some.

4) the initialization of ciphertxt to ["0x0"] instead of []
doesn't seem to do anything useful.  int('123abc', 16) is
valid without the 0x prefix.

5) random() uses hex(cnt) while getrandbits uses str(cnt)
(converting to decimal instead of hex).  I think it's better
to use hex and remove the 0x prefix from the output, which
is cleanest, and simpler to implement on some targets
(embedded microcontroller).  The same goes for the %s
conversion in jumpahead (my version also uses %s there).

6) It may be worthwhile to include cnt in both the s0 and s1
hash updates.  That guarantees the s1 hash never gets the
same input twice.

7) The variable "s1" in getrandbits (instead of self.s1) is
set but never used.

Note in my version of sharandom.py, I didn't put a thread
lock around the tuple assignment in setstate().  I'm not
sure if that's safe or not.  But it looks to me like
random.py in the CVS does the same thing, so maybe both are
okay.


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-24 11:31

Logged In: YES 

I attached a new version of the mt/sha1 combination.

Here are the relative merits as compared to the sha1/sha1 approach:
* simpler to implement/maintain since state tracking is built in
* larger state space (2**19937 vs 2**160)
* faster
* threadsafe

Favoring sha1/sha1:
* uses only one primitive
* easier to replace in situations where MT is not available


Comment By: paul rubin (phr)
Date: 2004-03-22 00:56

Logged In: YES 

Two small corrections to below: 1) "in favor of an entropy"
is an editing error--the intended meaning should be obvious.
 2) I meant Bryan Mongeau, not Eric Mongeau.  Bryan's lib is
at <http://eevolved.com/cryptkit/>.  Sorry for any confusion.


Comment By: paul rubin (phr)
Date: 2004-03-22 00:49

Logged In: YES 

I'm very much in favor of an entropy and Bram's suggested
interface of entropy(x) supplying x bytes of randomness is
fine.  Perhaps it should really live in a cryptography API
rather than in the Random API, but either way is ok.  Mark
Moraes wrote a Windows module that gets random numbers from
the Windows CAPI; I put up a copy at


For Linux, Cygwin, and *BSD (including MacOS 10, I think),
just read /dev/urandom for random bytes.  However, various
other systems (I think including Solaris) don't include
anything like this.  OpenSSL has an entropy gathering daemon
that might be of some use in that situation.  There's also
the Yarrow generator (http://www.schneier.com/yarrow.html)
and Eric Mongeau(?) wrote  a pure-Python generator a while
back that tried to gather entropy from thread racing,
similar to Java's default SecureRandom class (I consider
that method to be a bit bogus in both Python and Java).

I think, though, simply supporting /dev/*random and the
Windows CAPI is a pretty good start, even if other OS's
aren't supported.  Providing that function in the Python lib
will make quite a few people happy.  A single module
integrating both methods would be great.  I don't have any
Windows dev tools so can't test any wrappers for Mark
Moraes's function but maybe one of you guys can do it.

I'm not too keen on the md5random.py patch for reasons
discussed in the c.l.py thread.  It depends too deeply on
the precise characteristics of both md5 and Mersenne
Twister.  I think for a cryptography-based generator, it's
better to stick to one cryptography-based primitive, and to
use sha instead of md5.  That also helps portability since
it means other environments (maybe including Jython) can
reproduce the PRNG stream without having to re-implement MT,
as long as they have SHA (which is a US federal standard).


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-21 19:50

Logged In: YES 

Bram, if you have a patch, I would be happy to look at it. 
Please make it as platform independent as possible (it's okay
to have it conditionally compile differently so long as the
API stays the same).  
Submit it as a separate patch and assign it to me -- it doesn't
have to be original; you can google around to determine prior
art.


Comment By: Bram Cohen (bram_cohen)
Date: 2004-03-21 19:34

Logged In: YES 

The lack of a 'real' entropy source is the gap which can't
be fixed with an application-level bit of code. I think
there are simple hooks for this on all systems, such as
/dev/random on linux, but they aren't cross platform. A
unified API which always calls the native entropy hook would
be a very good thing.

An example of a reasonable API would be to have a module
named entropy, with a single function entropy(x) which
returns a random string of length x. This is a problem which
almost anyone writing a security-related application runs
into, and lots of time is wasted writing dubious hacks to
harvest entropy when a single simple library could magically
solve it the right way.
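On the platforms Python later came to support uniformly, Bram's proposed API reduces to a thin wrapper; a sketch (os.urandom, which appeared in Python 2.4, already hides the /dev/urandom vs. CryptGenRandom split):

```python
import os


def entropy(x):
    """Return a string of x bytes from the platform's native entropy
    source, per the proposed API (sketch; error handling is minimal)."""
    if x < 0:
        raise ValueError("byte count must be non-negative")
    # os.urandom raises NotImplementedError on platforms with no source.
    return os.urandom(x)
```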


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-21 17:33

Logged In: YES 

Attaching my alternative.  If it fulfills your use case, let
me know and I'll apply it.


Comment By: paul rubin (phr)
Date: 2004-03-18 08:51

Logged In: YES 

Updated version of sharandom.py is at same url.  It folds a
counter into the hash and also includes a getrandbits method.


Comment By: paul rubin (phr)
Date: 2004-03-18 06:27

Logged In: YES 

I don't mind dropping the time() auto-seed but I think that
means eliminating any auto-seed, and requiring a
user-supplied seed.  

There is no demonstrable minimum period for the SHA-OFB and
it would be bad if there was, since it would then no longer
act like a PRF.  Note that the generator in the sample code
actually comes from two different SHA instances and thus its
expected period is about 2**160.  Anyway, adding a simple
counter (incrementing by 1 on every SHA call) to the SHA
input removes any lingering chance of a repeating sequence.
 I'll update the code to do that.  It's much less ugly than
stirring in Mersenne Twister output.

I don't have Python 2.4 yet but will look at it when I get a
chance.


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-17 11:27

Logged In: YES 

It has some potential for being added.  That being said, try
to avoid a religious fervor.

I would like to see considerably more discussion and
evolution first.

The auto-seeding with time *must* be dropped -- it is not
harmonious with the goal of creating a secure random
sequence.  It is okay for a subclass to deviate in this way.

Also, I was soliciting references stronger than (I don't
know of any results ... It is generally considered ... ). 
If we put this in, people are going to rely on it.  The docs
*must* include references indicating the strengths and
weaknesses of the approach.  It should also concisely say
why it works (a summary proof that makes it clear how a
one-way digest function can be transformed into a sequence
generator that is cryptographically strong to both the left
and right, with the latter being the one that is not obvious).

Not having a demonstrable minimum period is also bad. 
Nothing in the  discussion so far precludes the existence of
a bad seed that has a period of only 1 or 2.  See my
suggestion on comp.lang.python for a means of mitigating
this issue.

With respect to the randint question, be sure to look at the
current Py2.4 source for random.py.  The API is expanded to
include an optional method, getrandbits().  That in turn
feeds the other integer methods without the int to float to
int dance.
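The direct path getrandbits() enables can be sketched with rejection sampling (the helper name is illustrative; the stdlib later grew an internal equivalent of this):

```python
import random


def randbelow(rng, n):
    """Draw uniformly from [0, n) straight from getrandbits, with no
    float intermediary (sketch)."""
    k = n.bit_length()
    r = rng.getrandbits(k)
    # Rejection sampling: redraw until the k-bit value lands below n,
    # which keeps the distribution exactly uniform.
    while r >= n:
        r = rng.getrandbits(k)
    return r
```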

Upon further consideration, I think the export control
question is moot since we're using an existing library
function and not really expressing new art.


Comment By: paul rubin (phr)
Date: 2004-03-17 09:05

Logged In: YES 

There are no research results that I know of that can
distinguish the output of sha1-ofb from random output in any
practical way.  To find such a distinguisher would be a
significant result.  It's a safe bet that cryptographers
have searched for distinguishers, though I don't promise to
have heard of every result that's been published.  I'll ask
on sci.crypt if anyone else has heard of such a thing, if
you want.  SHA1-HMAC is generally considered to be
indistinguishable from a PRF (pseudorandom function, i.e. a
function selected at random from the space of all functions
from arbitrary strings to 160-bit outputs; this term is from
concrete security theory).  MD5 is deprecated these days and
there's no point to using it instead of sha1 for this.

I'm not sure what happens if randint is added to the API. 
If you subclass Random and don't provide a randint method,
you inherit from the base class, which can call
self.random() to get floats to make the ints from.

US export restrictions basically don't exist any more.  In
principle, if you want to export something, you're supposed
to send an email to an address at the commerce department,
saying the name of the program and the url where it can be
obtained and a few similar things.  In practice, email to
that address is ignored, they never check anything.  I heard
the address even stopped working for a while, though they
may have fixed it since then.  See
http://www.bxa.doc.gov/Encryption/ for info.  I've emailed
notices to the address a few times and never heard back
anything.  Anyway, I don't think this should count as
cryptography; it's simply using a cryptographic hash
function as a PRNG to avoid the correlations in other
PRNGs; the scientific rationale for doing that is given in the
Numerical Recipes book mentioned above.

The code that I linked uses the standard API, but I wonder if
the floating point output is optimally uniform, i.e. the
N/2**56 calculation may not be exactly the right thing for
an IEEE float64.
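For comparison, a conversion that is exact for IEEE doubles uses 53 bits, the full significand width, so no rounding ever occurs (shown against the stdlib generator purely for illustration):

```python
import random


def random_float(rng):
    # 53 random bits scaled by 2**-53: every possible result k * 2**-53
    # is exactly representable as an IEEE-754 double in [0, 1).
    return rng.getrandbits(53) * 2.0 ** -53
```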

Using the time of day is what the Random docs say to do by
default.  You're correct that any security application needs
to supply a higher entropy seed.

I would like it very much if the std lib included a module
that read some random bytes from the OS for OS's that
support it.  That means reading /dev/urandom on recent
Un*x-ish systems or Cygwin, or calling CryptGenRandom on
Windows.  Reading /dev/urandom is trivial, and there's a guy
on the pycrypt list who wrote a Windows function to call
CryptGenRandom and return the output through the Python API.
 I forwarded the function to Guido with the author's
permission but nothing seemed to happen with it.  However,
this gets away from this sharandom subclass.  I'd like to
make a few more improvements to the module but after that I
think it can be dropped into the lib.  Let me know what you
think.


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-16 11:52

Logged In: YES 

One other thought:  if cryptographic strength is a goal, then 
seeding absolutely should require a long seed (key) as an 
input and the time should *never* be used.


Comment By: Raymond Hettinger (rhettinger)
Date: 2004-03-16 11:49

Logged In: YES 

It would have been ideal if the Random API had been 
designed with an integer generator at the core and floating 
point as a computed value, but that is the way it has been for 
a long time, and efforts to switch it over would likely result in 
either incompatibility with existing subclasses or new 
complexity (making it even harder to subclass).  I think 
the API should be left alone until Py3.0.

The attached module would make a good recipe on ASPN 
where improvements and critiques can be posted.  Do you 
have links to research showing that running SHA-1 in a 
cipher block feedback mode results in a cryptographically 
strong random number generator -- the result seems likely, 
but a research link would be great.  Is there a link to 
research showing the RNG properties of the resulting 
generator (period, equidistribution, passing tests for 
randomness, etc)?  Also, is there research showing the 
relative merits of this approach vs MD5, AES, or DES?

If something like this gets added to the library, I prefer it to 
be added to random.py using the existing API.  Adding yet 
another random module would likely do more harm than 
good.

One other question (I don't know the answer to this):  would 
including a cryptographically strong RNG trigger US export 
restrictions on the python distribution?



More information about the Python-bugs-list mailing list