
On Tue, 15 Nov 2022 at 03:16, David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
In concept, what James suggests, is similar to the Random123 library, written at D.E.Shaw Research by my sadly late colleague John Salmon. See https://github.com/DEShawResearch/random123 and the linked academic publications. E.g. https://dl.acm.org/doi/10.1145/2063384.2063405
In general, all PRNGs are deterministic, and by relying on a known seed, the Nth element in a sequence of random numbers can always be reconstructructed. However, if a large number of random numbers are used, certain replication scenarios make the purely sequential nature of generators like Mersenne Twister or linear congruential generators inconvenient.
Counter based pseudo-random generators like Random123 use cryptographic transformations upon counter variables, and were rigorously shown to pass all standard tests of randomness of distribution. However, these tests were done using AES, Threefish, and Philox, and do not automatically apply to SHA256 that James uses. The advantage of these is that they allow direct construction of the Nth element in a pseudo-random sequence without large memory or CPU usage needed to construct the N-1 prior elements.
They don't *automatically* apply, but SHA256 exhibits the avalanche effect that all good cryptographic hashes do, so I would expect that it would have the appropriate distribution. But that would be assuming you're doing something like this: def seed(input): global _RANDOM_SEED, _RANDOM_COUNTER _RANDOM_SEED = hashlib.sha256(str(input).encode("utf-8")) _RANDOM_COUNTER = itertools.count() _seed(time.time()) # or some other default source def random(): hash = _RANDOM_SEED.copy() hash.update(b"%d" % next(_RANDOM_COUNTER)) return int.from_bytes(hash.digest()) / (1<<256) I can't see this happening in the OP's code. And even if it were, I'm not sure how much it would buy you over the current system of random number generation.
That said, Random123 is *done right* by a number of very brilliant people who performed rigorous testing. Something that "seems kinda similar" may or may not be of similar quality (more likely not). FWIW, AES is built-in as a primitive instructruction on most modern general-purpose CPUs, so can be very fast to perform.
Hashing in general is pretty fast on GPUs, so that may or may not actually be significant. But unless someone's actually complaining about the performance of an RNG, I would be more concerned with its features than its CPU cost, including: 1) Reproducibility, if desired (it sometimes is and sometimes isn't) 2) Appropriate distribution and sequence unpredictability 3) Avoidance of side-channel discoverability, particularly if reproducible etc. One thing that IS interesting, when exploring weird and probably suboptimal RNG algorithms, is finding things that can be done to get "kinda okay" randomness when you don't really have a good source of entropy. Let's say you're running a tiny embedded device that needs to report to an HTTPS server with information. How is it to ensure that its SSL sessions are properly secure? Assuming it doesn't have any sort of actual on-board entropy hardware, what's it to do? Sometimes, an imperfect solution is still better than nothing. ChrisA