[Tutor] A file containing a string of 1 billion random digits.

Richard D. Moores rdmoores at gmail.com
Sun Jul 18 03:51:17 CEST 2010


On Sat, Jul 17, 2010 at 18:01, Steven D'Aprano <steve at pearwood.info> wrote:
> Do you care about speed? If this is a script that just needs to run
> once, it seems to me that the simplest, easiest to read solution is:
>
> import random
> def random_digit():
>    return "0123456789"[random.randrange(10)]
>
> f = open('rand_digits.txt', 'w')
> for i in xrange(10**9):
>    f.write(random_digit())
>
> f.close()
>
>
> This is, of course, horribly inefficient -- it generates digits one at a
> time, and worse, it writes them one at a time. I got bored waiting for
> it to finish after 20 minutes (at which time it was about 10% of the
> way through), but you could let it run in the background for as long as
> it takes.
>
> If speed does matter, the first improvement is to generate larger
> streams of random digits at once. An even bigger improvement is to cut
> down on the number of disk-writes -- hard drives are a thousand times
> slower than RAM, so the more often you write to the disk, the worse off
> you are.
>
>
> import random
> def random_digits(n):
>    "Return n random digits with one call to random."
>    return "%0*d" % (n, random.randrange(10**n))
>
> f = open('rand_digits.txt', 'w')
> for i in xrange(1000):
>    buffer = [random_digits(10) for j in xrange(100000)]
>    f.write(''.join(buffer))
>
> f.close()
>
> On my not-even-close-to-high-end PC, this generates one billion digits
> in 22 minutes:
>
> [steve at sylar python]$ time python randdigits.py
>
> real    22m31.205s
> user    20m18.546s
> sys     0m7.675s
> [steve at sylar python]$ ls -l rand_digits.txt
> -rw-rw-r-- 1 steve steve 1000000000 2010-07-18 11:00 rand_digits.txt

My script <http://tutoree7.pastebin.com/9BMYZ08z> took 218 secs.

> Having generated the digits, it might be useful to look for deviations
> from randomness. There should be approximately equal numbers of each
> digit (100,000,000 each of 0, 1, 2, ..., 9), of each digraph
> (10,000,000 each of 00, 01, 02, ..., 98, 99), trigraphs (1,000,000 each
> of 000, ..., 999) and so forth.

Yes. I'll do some checking. Thanks for the tips.
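
A minimal sketch of that kind of check, assuming the file holds nothing
but digit characters. The name count_ngrams and the chunk size are my
own illustrative choices, not anything from Steven's post. It counts
overlapping sequences of length n (n=1 for digits, n=2 for digraphs,
and so on), reading the file in chunks so the whole billion characters
never sit in memory at once:

```python
from collections import Counter

def count_ngrams(path, n=1, chunk_size=1 << 20):
    """Count overlapping n-character sequences in a text file of digits."""
    counts = Counter()
    tail = ""  # last n-1 characters carried over from the previous chunk
    with open(path) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            # Count every window of length n that fits inside data.
            for i in range(len(data) - n + 1):
                counts[data[i:i + n]] += 1
            # Keep the last n-1 characters so windows that straddle
            # a chunk boundary are counted exactly once.
            tail = data[-(n - 1):] if n > 1 else ""
    return counts
```

Something like count_ngrams('rand_digits.txt', 2) would then give the
digraph counts. A pure-Python loop like this will be slow on a
billion-character file, so trying it on a smaller sample first seems
sensible.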
>
> The interesting question is, if you measure a deviation from the
> equality (and you will), is it statistically significant? If so, is it
> because of a problem with the random number generator, or with my
> algorithm for generating the sample digits?

Ah. Can't wait to see what turns up.
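
One common way to put a number on "statistically significant" is
Pearson's chi-square statistic. The sketch below is only an
illustration, assuming counts is a dict mapping each digit to how often
it occurred; chi_square and CRITICAL_5PCT_DF9 are hypothetical names.
With 10 equally likely digits there are 9 degrees of freedom, and the
5% critical value for that case is about 16.919:

```python
def chi_square(counts, categories):
    """Pearson chi-square statistic against a uniform expectation."""
    total = sum(counts.get(c, 0) for c in categories)
    # Under the uniform hypothesis every category is equally likely.
    expected = float(total) / len(categories)
    return sum((counts.get(c, 0) - expected) ** 2 / expected
               for c in categories)

# 5% critical value of the chi-square distribution with 9 degrees
# of freedom (10 digits minus 1).
CRITICAL_5PCT_DF9 = 16.919

def looks_nonuniform(digit_counts):
    """True if the single-digit counts deviate at the 5% level."""
    return chi_square(digit_counts, "0123456789") > CRITICAL_5PCT_DF9
```

This applies to single-digit counts only. Overlapping digraph and
trigraph counts are not independent of one another, so the same
threshold would not carry over to them directly.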

Thanks, Steve.

Dick

