SHA-based subclass for random module
Trevor Perrin
trevp_spam at trevp.net
Wed Mar 24 05:12:11 EST 2004
Josiah Carlson wrote:
>
>> If the generator is something like /dev/urandom or Win32's
>> CryptGenRandom, you don't want the repeated system call overhead of
>> dipping into it 32 bits at a time.
>> [...]
>
> Do like a standard file read does, buffer some extra bytes for the
> future, 4k should be sufficient for most needs.
That's doable. But then, for large strings the loop you presented would
be slow.
The caller could read bigger chunks at a time. But then every subclass
has to implement getrandbits() for large integers, which is
unnecessarily complicated and an unnecessary detour from byte strings to
longs and back again for subclasses where the data starts as strings.
Unless you're saying some subclasses could implement getrandbits(), and
some getrandstring(), and the base class would synthesize whichever is
missing?
[..]
> As provided, using bits or bytes, it is easy to convert either way.
You're right, conversion isn't hard, so this isn't a big deal. I just
think it's neater my way :-).
(Conversion would be even easier if Python had a built-in way to convert
byte strings to longs. Maybe something like:
bigNumber = long(bigNumberString, 256)
bigNumberString = bigNumber.tostring()
It would use network-byte order (big-endian). At least, that's what's
used for cryptography, and ASN.1, and the bignum libraries I've seen.)
> While there are sources that are better suited for generating bytes
> (which you provide), there are entropy producers that give bits at a
> time. If I remember correctly, there is a serial port, USB, processor
> and even chipset entropy generators that all produce bits (some faster
> than others).
I bet the APIs for these sources return some number of bytes.
Having to return bits is ungainly cause what if the caller wants, say, 7
of them? You gotta buffer the excess bit, and then tack it on to the
next call.
(Actually, the current getrandbits() doesn't do that, it just discards
excess bits. So with 2 generators in the same state, calling
getrandbits(32) and getrandbits(16) twice won't get the same bits.
Doesn't seem right, exactly).
>> [..]The only benefit
>> of MT vs. a SHA-based generator is speed.
>
>
> I could have sworn the other was that MT had cycle that seemed to be
> provably longer than any pure SHA based generator.
I don't think that's worth worrying about: even if you're feeding-back
the SHA values, it shouldn't cycle till around 2^80, which is huge, and
you should be re-seeding periodically anyways. You can also use a
counter instead of feeding back values.
(but I'm no expert, and I haven't been following that discussion, so
don't take this seriously).
Trevor
More information about the Python-list
mailing list