[Tutor] A file containing a string of 1 billion random digits.

Sun Jul 18 11:45:33 CEST 2010

Richard D. Moores wrote:
> On Sat, Jul 17, 2010 at 18:01, Steven D'Aprano <steve at pearwood.info> wrote:
>   
>> <snip>
>>
>> import random
>> def random_digits(n):
>>    "Return n random digits with one call to random."
>>    return "%0*d" % (n, random.randrange(10**n))
>>
>>     
Thanks for implementing what I was suggesting: using zero-fill to get a 
constant-width string from a number.  No need for extra digits or funny 
ranges.
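
For anyone following along, here is a small sketch of what the zero-fill 
does.  The function is repeated from the quoted message; the names padded 
and sample are only for illustration.  The * in "%0*d" takes the field 
width from the argument tuple, so a value with fewer than n digits is 
padded on the left with zeros.

```python
import random

def random_digits(n):
    "Return n random digits with one call to random."
    return "%0*d" % (n, random.randrange(10**n))

# The width comes from the first tuple item, the value from the second.
padded = "%0*d" % (5, 42)      # '00042'

# Leading zeros are kept, so the result is always n characters long.
sample = random_digits(20)
```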
>> <snip>
> My <http://tutoree7.pastebin.com/9BMYZ08z> took 218 secs.
>
>   
>> Having generated the digits, it might be useful to look for deviations
>> from randomness. There should be approximately equal numbers of each
>> digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph
>> (10,000,000 each of 00, 01, 02, ..., 98, 99), trigraphs (1,000,000 each
>> of 000, ..., 999) and so forth.
>>     
>
> Yes. I'll do some checking. Thanks for the tips.
>   
>> The interesting question is, if you measure a deviation from the
>> equality (and you will), is it statistically significant? If so, is it
>> because of a problem with the random number generator, or with my
>> algorithm for generating the sample digits?
>>     
>
> Ah. Can't wait to see what turns up.
>
> Thanks, Steve.
>
> Dick
>
>   
If you care about the randomness, it's important to measure deviations 
from equal counts, and to make sure not only that they don't vary too 
much, but also that they don't vary too little.  If you measured exactly 
100 million 5's, you're very unlikely to have a truly random string.
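
One way to check both directions is a two-sided chi-square test on the 
digit counts.  What follows is only a rough sketch, not a full test 
suite: the function names are made up for illustration, and the two 
cutoffs are the standard table values for 9 degrees of freedom at the 
2.5% and 97.5% points, so about 5% of truly random strings will be 
flagged anyway.

```python
import random
from collections import Counter

# Chi-square critical values for 9 degrees of freedom (standard table
# values): the 2.5% and 97.5% quantiles.
CHI2_LOW_9DF = 2.700
CHI2_HIGH_9DF = 19.023

def digit_chi_square(digits):
    """Return the chi-square statistic for the digit counts in a string."""
    counts = Counter(digits)
    expected = len(digits) / 10.0
    return sum((counts[d] - expected) ** 2 / expected
               for d in "0123456789")

def verdict(stat):
    """Classify the statistic with a two-sided test at roughly the 5% level."""
    if stat < CHI2_LOW_9DF:
        return "suspiciously even"      # counts vary too little
    if stat > CHI2_HIGH_9DF:
        return "suspiciously uneven"    # counts vary too much
    return "plausible"

# Small demonstration on 100,000 digits from the random module.
sample = "".join(random.choice("0123456789") for _ in range(100000))
stat = digit_chi_square(sample)
print(stat, verdict(stat))
```

A string with exactly equal counts of every digit gives a statistic of 
0, which lands in the "suspiciously even" bucket.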

There is a series of tests you could perform, but I no longer have any 
references to which ones would be useful.  Years ago I inherited a random 
number generator in which the individual values seemed to be quite 
random, but adjacent pairs had some very definite patterns.  I ended up 
writing a new generator, from scratch, which was both much faster and 
much more random.  But I didn't do the testing myself.
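
To show the kind of pair pattern I mean, here is a minimal sketch that 
computes a chi-square statistic over adjacent pairs.  It is not the test 
that was used on that old generator; the function name and the patterned 
example string are invented for illustration.  It uses non-overlapping 
pairs so that the pairs are independent of each other, which gives 99 
degrees of freedom.  For a random string the statistic should land near 
99, with a standard deviation of about 14.

```python
import random
from collections import Counter

def pair_chi_square(digits):
    """Chi-square statistic over non-overlapping adjacent digit pairs."""
    # Step by 2 so no digit is shared between pairs; a trailing odd
    # digit is dropped.
    pairs = [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]
    counts = Counter(pairs)
    expected = len(pairs) / 100.0
    return sum((counts["%02d" % k] - expected) ** 2 / expected
               for k in range(100))

# Every digit appears equally often here, yet only 5 of the 100 possible
# pairs ever occur, so the pair statistic is enormous.
patterned = "0123456789" * 1000
patterned_stat = pair_chi_square(patterned)

# For comparison, digits drawn from the random module.
sample = "".join(random.choice("0123456789") for _ in range(100000))
sample_stat = pair_chi_square(sample)
print(patterned_stat, sample_stat)
```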

DaveA