
I came back to this thread looking for the list of randomness tests, and I keep missing them somehow. The reason for my follow up is this: The random number function in Python has important uses far beyond my personal concerns, and random.randint(0,9) is supposed to be the Mersenne Twister. I used the attached code to generate the attached file. It’s randint(0,9) for n=600, keyed back in by hand, instead of automatically from the script: My question/concern is this: does this dataset look like it came from the Mersenne Twister? In 600 iterations, it generated 8,8,8; 0,0,0; 1,1,1 and 5,5,5, with more than 1% (6) occurrences where the same number repeated. I didn't do the +1/-1 test, but the data is in a (python) list that should be easily cut and pasted into a legit datafile. This leads me to question whether the MTPRNG rubber is meeting the Link Mint Mate 23.x.x road. "Catastrophizing," if randint isn't correct in the random library, what about the more crucial PRNG's? I suppose that feeding the random function back without keyboard intervention would yield similar results - it's not my data entry either: I printed the number suggested and the number entered for every instance, before the report. This is why my magic eight ball program had problems, not the design of Mersenne Twister. It may be amatuer as h*ll, but doesn't it point something? James (I thank you for so articulately showing that my hash isn't a PRNG.) On Wed, Nov 16, 2022 at 10:27 AM James Johnson <jj126979@gmail.com> wrote:

On Tue, 6 Dec 2022 at 14:00, James Johnson <jj126979@gmail.com> wrote:
Let's see. What are the odds of getting three consecutive digits? First digit could be anything, so that's 1/1; second digit is 1/10, third digit is 1/10. So for any given set of three random decimal digits, there's a 1% chance that they're all the same. In 600 rolls, there are 598 triples (since you could start a triple at any point), but they're not entirely independent. I'd have to do some serious number crunching to determine the exact probabilities involved, but the expected number of triples has to be somewhere between 1% of 200 (counting only those that completely don't overlap) and 1% of 598 (counting everything, regardless of overlap), or somewhere between 2 and 5.98. You got four of them, so without a ton of number crunching, I'd say that that's at least reasonably plausible. Or if I've misread you and you got six of them, that's still right on the edge of the expected range, and the chances of that happening are still quite reasonable. Calculating the probability of seeing six triples is actually quite complicated and I really don't want to go into the details, but I would recommend just trying on a larger pool of numbers. ChrisA

On 6/12/22 3:58 pm, James Johnson wrote:
I came back to this thread looking for the list of randomness tests, and I keep missing them somehow.
If you're interested in testing a PRNG really thoroughly, check out TestU01: http://simul.iro.umontreal.ca/testu01/tu01.html -- Greg

@Chris Indeed the true figure, if my math is correct, is a bit under 5.98 because of the "non-independence" of triplets. I computed it and found 5.382, so finding 6 is entirely normal. For the details: calling L = 600 and n = 3 * number of possible sequence of L digits: 10^L * if a specific digit appears in a specific position n times, the rest of the sequence has 9*10^(L-n-1) possibilities [9 because after the repetition you have a different digit] * we need to multiply that number by 10 as we have 10 digits * we need to multiply it again by L-n+1 as the digit can start in any place until the L-n+1 place * so the total number of sequence of L digits with at least 1 repetition of length n is: 9*10^(L-n-1) * 10 * (L-n+1) Dividing the 2 to compute the probability: P = 9*10^(L-n-1) * 10 * (L-n+1) / 10^L = 9*10^(L-n) * (L-n+1) / 10^L $ python3 -c "L = 600; n = 3; print(9*10**(L-n) * (L-n+1) / 10**L)" 5.382 Regards, Alex Le mar. 6 déc. 2022 à 07:34, Greg Ewing <gcewing@snap.net.nz> a écrit :

I used a brute force method to check the probability. Counted the number of triples in 600 random numbers 0-9, repeated that 10000 times and took the mean: 5.99 So it looks like Chris's number is more accurate. Benedict Op 6/12/2022 om 09:25 schreef Alex Prengère:

The difference is probably because I count 4 repeated values as 2 triples, etc..., while the 5.382 value counts them only once? Benedict -------- Original Message -------- From: Benedict Verhegghe Sent: Tuesday, December 6, 2022 at 10:38 UTC To: python-ideas@python.org Subject: [Python-ideas] Re: Better (?) PRNG - follow up

Yes, I ran a simulation where 4 repeated values are not counted as 2, and found something close to 5.38. There is also a slight inaccuracy in my formula when counting repeated values that are (or not) at the beginning/end of the sequence. Le mar. 6 déc. 2022 à 11:54, Benedict Verhegghe <bverheg@gmail.com> a écrit :

On Tue, 6 Dec 2022 at 21:39, Benedict Verhegghe <bverheg@gmail.com> wrote:
Calling my number "more accurate" is missing the point that all the figures given are in basically the same ballpark. :) We're not trying to see whether accounting data is accurate - we're trying to see if random numbers are plausible. So there's definitely going to be SOME variance, and all we're trying to do is get a broad idea of what "plausible" means here. But I do very much appreciate you throwing more data at this. The thing to be careful of, though, is that you might well be using the exact same RNG that the OP is wanting to test, which would end up merely proving that the OP's data is typical for this RNG. But if you used SystemRandom, then it is definitely of value. ChrisA

Thanks for posting your code, but at 178 lines (most of which are either commented out or irrelevent to your question) its a hard slog to work out what you're doing. And as for the seemingly endless sequence of "Random number ... Value entered", what did information did you think we would get from that? Did you think we would study each and every pair of lines, looking for a pattern? When asking questions about code, it helps to [post a **minimal** example](https://stackoverflow.com/help/minimal-reproducible-example) that we [can easily run](http://sscce.org/). In this case, we can compare the Mersenne Twister PRNG with the operating system's cryptographically strong RNG and see if we get similar results. The beauty of a PRNG like the Mersenne Twister is that it is perfectly repeatable if you know the seed. So if you repeat this *exact* code, you should get the *exact* same output. (Well, almost. Technically only the output of `random.random` is guaranteed not to change from one version to another. But in practice `random.randint` is also very stable.) ```python import random from statistics import mean, stdev random.seed(299) d1 = [random.randint(1, 9) for i in range(100000)] triples = 0 # Count the number of triples. for i in range(len(d1)-3): a, b, c = d1[i:i+3] if a == b == c: triples += 1 print("mean =", mean(d1)) print("stdev =", stdev(d1)) print("runs of three:", triples) ``` If you run that code, the output you get in Python 3.10 should be: ```text mean = 4.99025 stdev = 2.586159666323446 runs of three: 1244 ``` So out of 100000-3 = 99997 groups of three random integers, 1244 or 1.2% are a triplet (a run of three). Now let's compare the same with the OS's cryptographically strong RNG. Because this uses environmental randomness, there's no seed, so we can't replicate the results, but you should usually get something similar. ```python import random from statistics import mean, stdev R2 = random.SystemRandom() d2 = [random.randint(1, 9) for i in range(100000)] triples = 0 # Count the number of triples. for i in range(len(d2)-3): a, b, c = d2[i:i+3] if a == b == c: triples += 1 print("mean =", mean(d2)) print("stdev =", stdev(d2)) print("runs of three:", triples) ``` And output should usually be something like this: ```text mean = 4.99198 stdev = 2.5882045258492763 runs of three: 1228 ``` Virtually no difference. Of course there is a *tiny* chance that you could get something outlandishly unlikely, like a run of fifty thousand 6s in a row. If such a fluke was impossible, it wouldn't be *random*. You can see that there is no significant difference between the Mersenne Twister and the much more expensive crypto-strength SystemRandom. The MT is a **very** vigorously studied PRNG. It has **excellent** statistical properties, without being too expensive to generate random numbers. And it allows replicable results: if you know the seed, you can repeat the output. The only real disadvantage of MT is that it is not secure for cryptographic purposes. (By the way, using `SystemRandom`, I also got 134 runs of four and 14 runs of five.)

On Tue, 6 Dec 2022 at 14:00, James Johnson <jj126979@gmail.com> wrote:
Let's see. What are the odds of getting three consecutive digits? First digit could be anything, so that's 1/1; second digit is 1/10, third digit is 1/10. So for any given set of three random decimal digits, there's a 1% chance that they're all the same. In 600 rolls, there are 598 triples (since you could start a triple at any point), but they're not entirely independent. I'd have to do some serious number crunching to determine the exact probabilities involved, but the expected number of triples has to be somewhere between 1% of 200 (counting only those that completely don't overlap) and 1% of 598 (counting everything, regardless of overlap), or somewhere between 2 and 5.98. You got four of them, so without a ton of number crunching, I'd say that that's at least reasonably plausible. Or if I've misread you and you got six of them, that's still right on the edge of the expected range, and the chances of that happening are still quite reasonable. Calculating the probability of seeing six triples is actually quite complicated and I really don't want to go into the details, but I would recommend just trying on a larger pool of numbers. ChrisA

On 6/12/22 3:58 pm, James Johnson wrote:
I came back to this thread looking for the list of randomness tests, and I keep missing them somehow.
If you're interested in testing a PRNG really thoroughly, check out TestU01: http://simul.iro.umontreal.ca/testu01/tu01.html -- Greg

@Chris Indeed the true figure, if my math is correct, is a bit under 5.98 because of the "non-independence" of triplets. I computed it and found 5.382, so finding 6 is entirely normal. For the details: calling L = 600 and n = 3 * number of possible sequence of L digits: 10^L * if a specific digit appears in a specific position n times, the rest of the sequence has 9*10^(L-n-1) possibilities [9 because after the repetition you have a different digit] * we need to multiply that number by 10 as we have 10 digits * we need to multiply it again by L-n+1 as the digit can start in any place until the L-n+1 place * so the total number of sequence of L digits with at least 1 repetition of length n is: 9*10^(L-n-1) * 10 * (L-n+1) Dividing the 2 to compute the probability: P = 9*10^(L-n-1) * 10 * (L-n+1) / 10^L = 9*10^(L-n) * (L-n+1) / 10^L $ python3 -c "L = 600; n = 3; print(9*10**(L-n) * (L-n+1) / 10**L)" 5.382 Regards, Alex Le mar. 6 déc. 2022 à 07:34, Greg Ewing <gcewing@snap.net.nz> a écrit :

I used a brute force method to check the probability. Counted the number of triples in 600 random numbers 0-9, repeated that 10000 times and took the mean: 5.99 So it looks like Chris's number is more accurate. Benedict Op 6/12/2022 om 09:25 schreef Alex Prengère:

The difference is probably because I count 4 repeated values as 2 triples, etc..., while the 5.382 value counts them only once? Benedict -------- Original Message -------- From: Benedict Verhegghe Sent: Tuesday, December 6, 2022 at 10:38 UTC To: python-ideas@python.org Subject: [Python-ideas] Re: Better (?) PRNG - follow up

Yes, I ran a simulation where 4 repeated values are not counted as 2, and found something close to 5.38. There is also a slight inaccuracy in my formula when counting repeated values that are (or not) at the beginning/end of the sequence. Le mar. 6 déc. 2022 à 11:54, Benedict Verhegghe <bverheg@gmail.com> a écrit :

On Tue, 6 Dec 2022 at 21:39, Benedict Verhegghe <bverheg@gmail.com> wrote:
Calling my number "more accurate" is missing the point that all the figures given are in basically the same ballpark. :) We're not trying to see whether accounting data is accurate - we're trying to see if random numbers are plausible. So there's definitely going to be SOME variance, and all we're trying to do is get a broad idea of what "plausible" means here. But I do very much appreciate you throwing more data at this. The thing to be careful of, though, is that you might well be using the exact same RNG that the OP is wanting to test, which would end up merely proving that the OP's data is typical for this RNG. But if you used SystemRandom, then it is definitely of value. ChrisA

Thanks for posting your code, but at 178 lines (most of which are either commented out or irrelevent to your question) its a hard slog to work out what you're doing. And as for the seemingly endless sequence of "Random number ... Value entered", what did information did you think we would get from that? Did you think we would study each and every pair of lines, looking for a pattern? When asking questions about code, it helps to [post a **minimal** example](https://stackoverflow.com/help/minimal-reproducible-example) that we [can easily run](http://sscce.org/). In this case, we can compare the Mersenne Twister PRNG with the operating system's cryptographically strong RNG and see if we get similar results. The beauty of a PRNG like the Mersenne Twister is that it is perfectly repeatable if you know the seed. So if you repeat this *exact* code, you should get the *exact* same output. (Well, almost. Technically only the output of `random.random` is guaranteed not to change from one version to another. But in practice `random.randint` is also very stable.) ```python import random from statistics import mean, stdev random.seed(299) d1 = [random.randint(1, 9) for i in range(100000)] triples = 0 # Count the number of triples. for i in range(len(d1)-3): a, b, c = d1[i:i+3] if a == b == c: triples += 1 print("mean =", mean(d1)) print("stdev =", stdev(d1)) print("runs of three:", triples) ``` If you run that code, the output you get in Python 3.10 should be: ```text mean = 4.99025 stdev = 2.586159666323446 runs of three: 1244 ``` So out of 100000-3 = 99997 groups of three random integers, 1244 or 1.2% are a triplet (a run of three). Now let's compare the same with the OS's cryptographically strong RNG. Because this uses environmental randomness, there's no seed, so we can't replicate the results, but you should usually get something similar. ```python import random from statistics import mean, stdev R2 = random.SystemRandom() d2 = [random.randint(1, 9) for i in range(100000)] triples = 0 # Count the number of triples. for i in range(len(d2)-3): a, b, c = d2[i:i+3] if a == b == c: triples += 1 print("mean =", mean(d2)) print("stdev =", stdev(d2)) print("runs of three:", triples) ``` And output should usually be something like this: ```text mean = 4.99198 stdev = 2.5882045258492763 runs of three: 1228 ``` Virtually no difference. Of course there is a *tiny* chance that you could get something outlandishly unlikely, like a run of fifty thousand 6s in a row. If such a fluke was impossible, it wouldn't be *random*. You can see that there is no significant difference between the Mersenne Twister and the much more expensive crypto-strength SystemRandom. The MT is a **very** vigorously studied PRNG. It has **excellent** statistical properties, without being too expensive to generate random numbers. And it allows replicable results: if you know the seed, you can repeat the output. The only real disadvantage of MT is that it is not secure for cryptographic purposes. (By the way, using `SystemRandom`, I also got 134 runs of four and 14 runs of five.)
participants (6)
-
Alex Prengère
-
Benedict Verhegghe
-
Chris Angelico
-
Greg Ewing
-
James Johnson
-
Steven D'Aprano