SHA-based subclass for random module

Wed Mar 24 05:12:11 EST 2004

Josiah Carlson wrote:

> 
>> If the generator is something like /dev/urandom or Win32's 
>> CryptGenRandom, you don't want the repeated system call overhead of 
>> dipping into it 32 bits at a time.
>> [...]
> 
> Do like a standard file read does, buffer some extra bytes for the 
> future, 4k should be sufficient for most needs.

That's doable.  But then, for large strings the loop you presented would 
be slow.

The caller could read bigger chunks at a time.  But then every subclass 
has to implement getrandbits() for large integers, which is 
unnecessarily complicated and an unnecessary detour from byte strings to 
longs and back again for subclasses where the data starts as strings.

Unless you're saying some subclasses could implement getrandbits(), and 
some getrandstring(), and the base class would synthesize whichever is 
missing?

[..]
> As provided, using bits or bytes, it is easy to convert either way. 

You're right, conversion isn't hard, so this isn't a big deal.  I just 
think it's neater my way :-).

(Conversion would be even easier if Python had a built-in way to convert 
byte strings to longs.  Maybe something like:

bigNumber = long(bigNumberString, 256)

bigNumberString = bigNumber.tostring()

It would use network-byte order (big-endian).  At least, that's what's 
used for cryptography, and ASN.1, and the bignum libraries I've seen.)

> While there are sources that are better suited for generating bytes 
> (which you provide), there are entropy producers that give bits at a 
> time.  If I remember correctly, there is a serial port, USB, processor 
> and even chipset entropy generators that all produce bits (some faster 
> than others).

I bet the APIs for these sources return some number of bytes.

Having to return bits is ungainly cause what if the caller wants, say, 7 
of them?  You gotta buffer the excess bit, and then tack it on to the 
next call.

(Actually, the current getrandbits() doesn't do that, it just discards 
excess bits.  So with 2 generators in the same state, calling 
getrandbits(32) and getrandbits(16) twice won't get the same bits. 
Doesn't seem right, exactly).

>> [..]The only benefit 
>> of MT vs. a SHA-based generator is speed.
> 
> 
> I could have sworn the other was that MT had cycle that seemed to be 
> provably longer than any pure SHA based generator.

I don't think that's worth worrying about: even if you're feeding-back 
the SHA values, it shouldn't cycle till around 2^80, which is huge, and 
you should be re-seeding periodically anyways.  You can also use a 
counter instead of feeding back values.

(but I'm no expert, and I haven't been following that discussion, so 
don't take this seriously).

Trevor