Re: [Python-ideas] Should our default random number generator be secure?

[Nathaniel Smith <njs@vorpus.org>]
I think it's worse than that. MT is based on a linear recurrence. Take two streams "far apart" in MT space, and their sum also satisfies the recurrence. So a possible worry about a third stream isn't _just_ about correlation or overlap with the first two streams, but, depending on the app, also about correlation/overlap with the sum of the first two streams. Move to N streams, and there are O(N**2) direct sums to worry about, and then sums of sums, and ... Still won't cause a problem in _my_ statistical life expectancy, but I only have 4 cores ;-)
Uncharitable, but fair :-)
There are some clean and easy approaches to this based on crypto-inspired schemes, but giving up crypto strength for speed. If you haven't read it, this paper is delightful: http://www.thesalmons.org/john/random123/papers/random123sc11.pdf

On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters <tim.peters@gmail.com> wrote:
It really is! As AES acceleration instructions become more common (they're now standard IIUC on x86, x86-64, and even recent ARM?), even just using AES in CTR mode becomes pretty compelling -- it's fast, deterministic, provably equidistributed, *and* cryptographically secure enough for many purposes. (Compared to a true state-of-the-art CPRNG the naive version fails due to lack of incremental mixing, and the use of a reversible transition function. But even these are mostly only important to protect against attackers who have access to your memory -- which is not trivial as heartbleed shows, but still, it's *waaay* ahead of something like MT on basically every important axis.) -n -- Nathaniel J. Smith -- http://vorpus.org

On 2015-09-10 04:56, Nathaniel Smith wrote:
I'll also recommend the PCG paper (and algorithm) as the author's cross-PRNGs comparisons have been bandied about in this thread already. The paper lays out a lot of the relevant issues and balances the various qualities that are important: statistical quality, speed, and security (of various flavors). http://www.pcg-random.org/paper.html I'm actually not that impressed with Random123. The core idea is nice and clean, but the implementation is hideously complex. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

[This is getting fairly off-topic for python-ideas (since AFAICT there is no particular reason right now to add a new deterministic generator to the stdlib), so CC'ing numpy-discussion and I'd suggest followups be directed to there alone.] On Thu, Sep 10, 2015 at 10:41 AM, Robert Kern <robert.kern@gmail.com> wrote:
I'm curious what you mean by this? In terms of the code, or...? I haven't looked at the code, but the simplest generator they recommend in the paper is literally just def rng_stream(seed): counter = 0 while True: # AES128 takes a 128 bit key and 128 bits of data and returns 128 bits of encrypted data val = AES128(key=seed, data=counter) yield low_64_bits(val) yield high_64_bits(val) counter += 1 which gives a 64-bit generator with a period of 2^129. They benchmark it as faster than the Mersenne Twister on modern CPUs (<2 cycles-per-byte on recent x86, x86-64, ARMv8), it requires less scratch space, is incredibly simple to work with -- you can parallelize it, get independent random streams, etc., in a totally trivial way -- and has a *way* stronger guarantee of random-looking-ness than merely passing TestU01. The downsides are that it does still require 176 bytes of read-only scratch storage (used to cache an expanded version of the "key"), the need for a modern CPU to get that speed, and that it doesn't play well with GPUs. So they also provide a set of three more ad hoc generators designed to solve these problems. I'm not as convinced about these, but hey, they pass TestU01... The PCG paper does a much better job of all the other stuff *around* making a good RNG -- it has the nice website, clear comparisons, nice code, etc. -- which is definitely important. But to me the increase in speed and reduction in memory use doesn't seem worth it given how fast these generators are to start with + the nice properties of counter mode + and cryptographic guarantees of randomness that you get from AES, for code that's generally targeting non-embedded non-GPU systems. -n -- Nathaniel J. Smith -- http://vorpus.org

[Robert Kern <robert.kern@gmail.com>]
[Nathaniel Smith <njs@pobox.com>]
I assume it's because if you expand what's required to _implement_ AES128() in C, it does indeed look pretty hideously complex. On HW implementing AES primitives, of course the code can be much simpler. But to be fair, if integer multiplication and/or addition weren't implemented in HW, and we had to write to C code emulating them via bit-level fiddling, the code for any of the PCG algorithms would look hideously complex too. But being fair isn't much fun ;-)

On Wed, Sep 9, 2015 at 8:35 PM, Tim Peters <tim.peters@gmail.com> wrote:
It really is! As AES acceleration instructions become more common (they're now standard IIUC on x86, x86-64, and even recent ARM?), even just using AES in CTR mode becomes pretty compelling -- it's fast, deterministic, provably equidistributed, *and* cryptographically secure enough for many purposes. (Compared to a true state-of-the-art CPRNG the naive version fails due to lack of incremental mixing, and the use of a reversible transition function. But even these are mostly only important to protect against attackers who have access to your memory -- which is not trivial as heartbleed shows, but still, it's *waaay* ahead of something like MT on basically every important axis.) -n -- Nathaniel J. Smith -- http://vorpus.org

On 2015-09-10 04:56, Nathaniel Smith wrote:
I'll also recommend the PCG paper (and algorithm) as the author's cross-PRNGs comparisons have been bandied about in this thread already. The paper lays out a lot of the relevant issues and balances the various qualities that are important: statistical quality, speed, and security (of various flavors). http://www.pcg-random.org/paper.html I'm actually not that impressed with Random123. The core idea is nice and clean, but the implementation is hideously complex. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

[This is getting fairly off-topic for python-ideas (since AFAICT there is no particular reason right now to add a new deterministic generator to the stdlib), so CC'ing numpy-discussion and I'd suggest followups be directed to there alone.] On Thu, Sep 10, 2015 at 10:41 AM, Robert Kern <robert.kern@gmail.com> wrote:
I'm curious what you mean by this? In terms of the code, or...? I haven't looked at the code, but the simplest generator they recommend in the paper is literally just def rng_stream(seed): counter = 0 while True: # AES128 takes a 128 bit key and 128 bits of data and returns 128 bits of encrypted data val = AES128(key=seed, data=counter) yield low_64_bits(val) yield high_64_bits(val) counter += 1 which gives a 64-bit generator with a period of 2^129. They benchmark it as faster than the Mersenne Twister on modern CPUs (<2 cycles-per-byte on recent x86, x86-64, ARMv8), it requires less scratch space, is incredibly simple to work with -- you can parallelize it, get independent random streams, etc., in a totally trivial way -- and has a *way* stronger guarantee of random-looking-ness than merely passing TestU01. The downsides are that it does still require 176 bytes of read-only scratch storage (used to cache an expanded version of the "key"), the need for a modern CPU to get that speed, and that it doesn't play well with GPUs. So they also provide a set of three more ad hoc generators designed to solve these problems. I'm not as convinced about these, but hey, they pass TestU01... The PCG paper does a much better job of all the other stuff *around* making a good RNG -- it has the nice website, clear comparisons, nice code, etc. -- which is definitely important. But to me the increase in speed and reduction in memory use doesn't seem worth it given how fast these generators are to start with + the nice properties of counter mode + and cryptographic guarantees of randomness that you get from AES, for code that's generally targeting non-embedded non-GPU systems. -n -- Nathaniel J. Smith -- http://vorpus.org

[Robert Kern <robert.kern@gmail.com>]
[Nathaniel Smith <njs@pobox.com>]
I assume it's because if you expand what's required to _implement_ AES128() in C, it does indeed look pretty hideously complex. On HW implementing AES primitives, of course the code can be much simpler. But to be fair, if integer multiplication and/or addition weren't implemented in HW, and we had to write to C code emulating them via bit-level fiddling, the code for any of the PCG algorithms would look hideously complex too. But being fair isn't much fun ;-)
participants (3)
-
Nathaniel Smith
-
Robert Kern
-
Tim Peters