[Tutor] A file containing a string of 1 billion random digits.

Sun Jul 18 11:26:07 CEST 2010

On Sun, 18 Jul 2010 06:49:39 pm Richard D. Moores wrote:

> I might try 
> trigraphs where the 2nd digit is 2 more than the first, and the third
> 2 more than the 2nd. E.g. '024', '135', '791', '802'. 

Why the restriction? There are only 1000 different trigraphs (10*10*10), 
which is nothing.
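
To put numbers on that, here is a quick sketch that enumerates every possible trigraph and picks out the ones matching the "2 more than the previous digit" rule. The names all_trigraphs, is_step_two and step_two are made up for illustration, and the wrap-around past 9 is an assumption read off the examples '791' and '802':

```python
from itertools import product

# Every possible trigraph: 10 * 10 * 10 of them.
all_trigraphs = [''.join(t) for t in product('0123456789', repeat=3)]

def is_step_two(trigraph):
    # Each digit is 2 more than the one before it, modulo 10
    # (assumed from the examples '791' and '802').
    a, b, c = (int(ch) for ch in trigraph)
    return (b - a) % 10 == 2 and (c - b) % 10 == 2

step_two = [t for t in all_trigraphs if is_step_two(t)]
print(len(all_trigraphs), len(step_two))
```

Counting all 1000 costs essentially nothing more than counting the restricted handful, since the restricted set can always be filtered out of the full table afterwards.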

> Or maybe I've 
> had enough. BTW Steve, my script avoids the problem you mentioned, of
> counting 2 '55's in a '555' string. I get only one, but 2 in '5555'.

Huh? What problem did I mention? 

Taking the string '555', you should get two digraphs: 55_ and _55. 
In '5555' you should get three: 55__, _55_, __55. For trigraphs, I'd do 
something like this (untested):

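trigraphs = {}
with open('digits') as f:  # the file is closed automatically on exit
    trigraph = f.read(3)  # read the first three digits
    trigraphs[trigraph] = 1
    while True:
        c = f.read(1)  # slide the window along by one digit
        if not c:
            break  # end of file
        trigraph = trigraph[1:] + c
        # dict.get() supplies 0 the first time a trigraph is seen
        trigraphs[trigraph] = trigraphs.get(trigraph, 0) + 1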
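
The same sliding-window idea can be checked on short in-memory strings. This is only a sketch: count_ngraphs is a hypothetical helper name, not anything from Richard's script.

```python
def count_ngraphs(digits, n):
    """Count every overlapping run of n characters in digits."""
    counts = {}
    for i in range(len(digits) - n + 1):
        gram = digits[i:i + n]
        counts[gram] = counts.get(gram, 0) + 1
    return counts

# Overlapping counts, as described above.
print(count_ngraphs('555', 2))
print(count_ngraphs('5555', 2))

# str.count() only counts non-overlapping occurrences.
print('555'.count('55'), '5555'.count('55'))
```

Note that str.count() skips overlapping matches, so it reports one '55' in '555' and two in '5555'. Whether Richard's script works that way is a guess, but it would produce the figures he quotes.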

> See line 18, in the while loop.
>
> I was surprised that I could read in the whole billion file with one
> gulp without running out of memory.

Why? One billion bytes is less than a GB. It's a lot, but not *that* 
much.
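
The arithmetic, as a small sketch. It assumes each digit is stored as one byte and takes "GB" in the binary sense (2**30 bytes); the variable names are illustrative only.

```python
n_bytes = 10**9        # one billion digits, one byte each (assumption)
gib = 2**30            # bytes in a binary gigabyte
size_in_gib = n_bytes / float(gib)
print(round(size_in_gib, 2))
```

The float() call keeps the division from truncating to zero on Python 2.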

> Memory usage went to 80% (from 
> the usual 35%), but no higher except at first, when I saw 98% for a
> few seconds, and then a drop to 78-80% where it stayed.

That suggests to me that your PC probably has 2GB of RAM. Am I close?
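
One rough way to arrive at a figure like that, offered as a back-of-the-envelope sketch rather than an exact measurement: assume the whole jump from 35% to 80% memory usage is the one-billion-byte string and nothing else.

```python
string_bytes = 10**9             # the digits held in memory, one byte each
jump = 0.80 - 0.35               # share of RAM assumed to be taken by the string
total_ram = string_bytes / jump  # implied total RAM, in bytes
print(round(total_ram / 2**30, 2))  # expressed in binary gigabytes
```

The brief spike to 98% is ignored here; it may simply reflect a temporary copy being made while the file was read.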

-- 
Steven D'Aprano