[Tutor] A file containing a string of 1 billion random digits.
Steven D'Aprano
steve at pearwood.info
Sun Jul 18 03:01:32 CEST 2010
Do you care about speed? If this is a script that just needs to run
once, it seems to me that the simplest, easiest to read solution is:

import random

def random_digit():
    return "0123456789"[random.randrange(10)]

f = open('rand_digits.txt', 'w')
for i in xrange(10**9):
    f.write(random_digit())
f.close()

This is, of course, horribly inefficient -- it generates digits one at a
time, and worse, it writes them one at a time. I got bored waiting for
it to finish after 20 minutes (at which time it was about 10% of the
way through), but you could let it run in the background for as long as
it takes.
If speed does matter, the first improvement is to generate larger
streams of random digits at once. An even bigger improvement is to cut
down on the number of disk-writes -- hard drives are a thousand times
slower than RAM, so the more often you write to the disk, the worse off
you are.

import random

def random_digits(n):
    "Return n random digits with one call to random."
    return "%0*d" % (n, random.randrange(10**n))

f = open('rand_digits.txt', 'w')
for i in xrange(1000):
    buffer = [random_digits(10) for j in xrange(100000)]
    f.write(''.join(buffer))
f.close()

On my not-even-close-to-high-end PC, this generates one billion digits
in 22 minutes:

[steve@sylar python]$ time python randdigits.py

real    22m31.205s
user    20m18.546s
sys     0m7.675s
[steve@sylar python]$ ls -l rand_digits.txt
-rw-rw-r-- 1 steve steve 1000000000 2010-07-18 11:00 rand_digits.txt

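
For anyone on Python 3, where xrange no longer exists, the same chunked approach might look roughly like the sketch below. The helper name write_random_digits and its parameters are made up for illustration, and the total is a parameter so that you can try it on a small file before committing to a billion digits. It has not been timed, so don't assume the 22 minute figure carries over.

```python
import random

def random_digits(n):
    """Return n random digits as a zero-padded string."""
    return "%0*d" % (n, random.randrange(10**n))

def write_random_digits(path, total, per_call=10, calls_per_write=100000):
    """Write `total` random digits to `path`, one large chunk at a time.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    written = 0
    with open(path, 'w') as f:
        while written < total:
            # Build one chunk in memory, then hand it over in a single write.
            chunk = ''.join(random_digits(per_call)
                            for _ in range(calls_per_write))
            chunk = chunk[:total - written]  # trim the final chunk if needed
            f.write(chunk)
            written += len(chunk)
    return written
```

Calling write_random_digits('rand_digits.txt', 10**9) would then be the equivalent of the script above; something like write_random_digits('small.txt', 10**6) is a quicker way to check that it behaves.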
Having generated the digits, it might be useful to look for deviations
from randomness. There should be approximately equal numbers of each
digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph
(10,000,000 each of 00, 01, 02, ..., 98, 99), trigraphs (1,000,000 each
of 000, ..., 999) and so forth.
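
As a rough sketch of how those counts could be collected, the code below tallies every overlapping run of n digits. The function names and the chunk_size parameter are assumptions for illustration; the file version reads a piece at a time so that the whole billion characters never has to sit in memory, carrying the last n-1 characters over so that runs straddling a chunk boundary are not missed.

```python
from collections import Counter

def count_ngraphs(digits, n):
    """Count every overlapping run of n characters in the string `digits`."""
    return Counter(digits[i:i + n] for i in range(len(digits) - n + 1))

def count_ngraphs_in_file(path, n, chunk_size=10**6):
    """Same tally as count_ngraphs, but reading the file in chunks.

    Hypothetical helper: name and chunk_size are illustrative only.
    """
    counts = Counter()
    tail = ''  # last n-1 characters of the previous chunk
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            counts.update(data[i:i + n] for i in range(len(data) - n + 1))
            # Keep the overlap so boundary-straddling runs are counted once.
            tail = data[-(n - 1):] if n > 1 else ''
    return counts
```

With N digits there are N-n+1 overlapping positions, so the expected count for each n-graph is (N-n+1)/10**n, which for a billion digits is as near as makes no difference to the round figures above.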
The interesting question is, if you measure a deviation from the
equality (and you will), is it statistically significant? If so, is it
because of a problem with the random number generator, or with my
algorithm for generating the sample digits?
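
One common way to put a number on "statistically significant" for the single-digit counts is Pearson's chi-squared test against a uniform expectation. The sketch below is only that, a sketch: the function name is made up, and it assumes the counts arrive as a dict keyed by digit. With ten categories there are 9 degrees of freedom, and the standard table value at the 5% level is 16.919.

```python
def chi_squared(counts, categories):
    """Pearson's chi-squared statistic, assuming every category is equally likely."""
    total = sum(counts.get(c, 0) for c in categories)
    expected = float(total) / len(categories)
    return sum((counts.get(c, 0) - expected) ** 2 / expected
               for c in categories)

# Critical value of chi-squared for 9 degrees of freedom at the 5% level.
CRITICAL_9DF_5PCT = 16.919

def digits_look_uniform(counts):
    """True if the digit counts pass the test at the 5% level."""
    return chi_squared(counts, "0123456789") <= CRITICAL_9DF_5PCT
```

Bear in mind that at the 5% level a perfectly good generator will still fail about one run in twenty. Also, this naive version is only appropriate for single digits: overlapping digraphs and trigraphs are not independent of one another, so their counts need a more careful test.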
--
Steven D'Aprano