[Python-ideas] Should our default random number generator be secure?
Tim Peters
tim.peters at gmail.com
Thu Sep 10 05:35:55 CEST 2015
[Nathaniel Smith <njs at vorpus.org>]
> Yeah, the independent-seed-for-each-thread approach works for any RNG, but
> just like people feel better if they have a 100% certified guarantee that
> the RNG output in a single thread will pass through every combination of
> possible values (if you wait some cosmological time), they also feel better
> if there is some 100% certified guarantee that the RNG values in two threads
> will also be uncorrelated with each other.
>
> With something like MT, if two threads did end up with nearby seeds, then
> that would be bad: each thread individually would see values that looked
> like high quality randomness, but if you compared across the two threads,
> they would be identical modulo some lag. So all the nice theoretical
> analysis of the single threaded stream falls apart.
>
> However, for two independently seeded threads to end up anywhere near each
> other in the MT state space requires that you have picked two numbers
> between 0 and 2**19937 and gotten values that were "close". Assuming your
> seeding procedure is functional at all, then this is not a thing that will
> ever actually happen in this universe.
I think it's worse than that. MT is based on a linear recurrence.
Take two streams "far apart" in MT space, and their sum also satisfies
the recurrence. So a possible worry about a third stream isn't _just_
about correlation or overlap with the first two streams, but,
depending on the app, also about correlation/overlap with the sum of
the first two streams. Move to N streams, and there are O(N**2)
direct sums to worry about, and then sums of sums, and ...
Still won't cause a problem in _my_ statistical life expectancy, but I
only have 4 cores ;-)
> So AFAICT the rise of explicitly multi-threaded RNG designs is one of
> those fake problems that exists only so people can write papers about
> solving it. (Maybe this is uncharitable.)
Uncharitable, but fair :-)
> So there exist RNG designs that handle multi-threading explicitly, and it
> shows up on feature comparison checklists. I don't think it should really
> affect Python's decisions at all though.
There are some clean and easy approaches to this based on
crypto-inspired schemes, but giving up crypto strength for speed. If
you haven't read it, this paper is delightful:
http://www.thesalmons.org/john/random123/papers/random123sc11.pdf
More information about the Python-ideas
mailing list