[Python-ideas] Should our default random number generator be secure?

Nathaniel Smith njs at pobox.com
Wed Sep 9 23:02:19 CEST 2015


On Sep 9, 2015 12:21 PM, "Tim Peters" <tim.peters at gmail.com> wrote:
>
> [Steven D'Aprano <steve at pearwood.info>]
> > ...
> > Question, aimed at anyone, not necessarily random832 -- one desirable
> > property of PRNGs is that you can repeat a sequence of values if you
> > re-seed with a known value. Does arc4random keep that property? I think
> > that it is important that the default RNG be deterministic when given a
> > known seed. (I'm happy for the default seed to be unpredictable.)
>
> "arc4random" is ill-defined.  From what I gathered, it's the case that
> "pure chacha" variants can be seeded to get a reproducible sequence
> "in theory", but that not all implementations support that.
>
> Specifically, the OpenBSD implementation being "sold" here does not and
> cannot:
>
>
> http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man3/arc4random.3
>
> "Does not" because there is no API to either request or set a seed.
>
> "Cannot" because:
>
>     The subsystem is re-seeded from the kernel random number
>     subsystem using getentropy(2) on a regular basis

Another reason why it is important *not* to provide a seeding api for a
crypto rng is that it lets you swap out the underlying algorithm later as
the state of the art improves. By contrast, if you have a deterministic
seeded mode, then swapping out the algorithm becomes a compatibility
break.

(You can provide a "mix this extra entropy into the pool" api, which looks
rather similar to seeding, but has fundamentally different semantics.)

The only real problem that I see with switching the random module to use a
crypto rng is exactly this backwards compatibility issue. For scientific
users, reproducibility of output streams is really important.

(Ironically, this is a variety of "important" that crypto people are very
familiar with: universally acknowledged to be the right thing by everyone
who's thought about it, practiced religiously and relied on by a minority,
and ignored by most people out of ignorance. Education is ongoing...)

OTOH python has never made strong guarantees of output stream
reproducibility -- 3.2 changed the default handling of str/bytes seeds
(you have to add 'version=1' to your seed() call to get the same results
on post-3.2 pythons -- which of course raises an error on older versions).
And 99% of the methods are documented to be unstable across versions --
the only method that's guaranteed to produce reproducible results across
versions is random.random(). In practice the other methods usually don't
change, so people get away with it, but there's no guarantee. See:

    https://docs.python.org/3/library/random.html#notes-on-reproducibility
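
To spell out what that note does and doesn't promise (gauss here is just
one example of a method that's allowed to change):

    import random

    random.seed(12345)
    print(random.random())     # guaranteed identical for this seed across
                               # python versions
    print(random.gauss(0, 1))  # documented as free to change between versions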

So in practice the stdlib random module is not super suitable for
scientific work anyway. Not that this stops anyone from using it for this
purpose... see above. (And to be fair even this limited determinism is
still enough to be somewhat useful -- not all projects require
reproducibility across years of different python versions.) Plus even a lot
of people who know about the importance of seeding don't realize that the
stdlib's support has these gotchas.

(Numpy unsurprisingly puts higher priority on these issues -- our random
module guarantees exact reproducibility of seeded outputs modulo rounding,
across versions and systems, except for bugfixes necessary for correctness.
This means that we carry around a bunch of old inefficient implementations
of the distribution methods, but so be it...)
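
For example -- and this is numpy's promise, not the stdlib's -- seeding a
RandomState gives you the same stream everywhere:

    import numpy as np

    rng = np.random.RandomState(12345)
    print(rng.standard_normal(3))   # same values on any platform and any
                                    # numpy version, modulo correctness bugfixes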

So, all that considered: I can actually see an argument for removing the
seeding methods from the stdlib entirely, and directing those who need
reproducibility to a third party library like numpy (/ pygsl / ...). This
would be pretty annoying for some cases where someone really does have
simple needs and wants just a little determinism without a binary
extension, but on net it might even be an improvement, given how easy it is
to misread the current api as guaranteeing more than it actually promises.
OTOH this would actually break the current promise, weak as it is.

Keeping that promise in mind, an alternative would be to keep both
generators around, use the cryptographically secure one by default, and
switch to MT when someone calls

  seed(1234, generator="INSECURE LEGACY MT")

But this would justifiably get us crucified by the security community,
because the above call would flip the insecure switch for your entire
program, including possibly other modules that were depending on random to
provide secure bits.

So if we were going to do this then I think it would have to be by
switching the global RNG over unconditionally, and to fulfill the promise,
provide the MT option as a separate class that the user would have to
instantiate explicitly if they wanted it for backcompat. Document that you
should replace

import random
random.seed(12345)
if random.whatever(): ...

with

from random import MTRandom
random = MTRandom(12345)
if random.whatever(): ...

As part of this transition I would also suggest making the seed method on
non-seedable RNGs raise an error when given an explicit seed, instead of
silently doing nothing like the current SystemRandom.
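
Roughly what I have in mind -- names hypothetical, the point is just that
an unseedable generator should refuse a seed instead of ignoring it:

    import random

    class StrictSystemRandom(random.SystemRandom):
        def seed(self, a=None, version=2):
            if a is not None:
                raise TypeError("this generator cannot be seeded; "
                                "instantiate MTRandom if you need a "
                                "reproducible stream")
            # otherwise nothing to do: all bits come from the OS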

-n

