[Tutor] A file containing a string of 1 billion random digits.
Steven D'Aprano
steve at pearwood.info
Sun Jul 18 11:26:07 CEST 2010
On Sun, 18 Jul 2010 06:49:39 pm Richard D. Moores wrote:
> I might try
> trigraphs where the 2nd digit is 2 more than the first, and the third
> 2 more than the 2nd. E.g. '024', '135', '791', '802'.
Why the restriction? There are only 1000 different trigraphs (10*10*10),
which is nothing.
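As a quick sanity check of that count, here is a minimal sketch. The names
all_trigraphs and restricted are mine, purely for illustration, and I'm
assuming your "2 more" rule wraps around modulo 10, as your '791' and '802'
examples suggest:

```python
from itertools import product

# Every possible three-digit string: 10 * 10 * 10 of them.
all_trigraphs = [''.join(t) for t in product('0123456789', repeat=3)]

# Assumed rule: each digit is 2 more than the previous one, wrapping mod 10.
restricted = [t for t in all_trigraphs
              if (int(t[0]) + 2) % 10 == int(t[1])
              and (int(t[1]) + 2) % 10 == int(t[2])]
```

Under that assumption the restricted set is tiny compared with the full
thousand, so counting all of them costs next to nothing extra.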
> Or maybe I've
> had enough. BTW Steve, my script avoids the problem you mentioned, of
> counting 2 '55's in a '555' string. I get only one, but 2 in '5555'.
Huh? What problem did I mention?
Taking the string '555', you should get two digraphs: 55_ and _55.
In '5555' you should get three: 55__, _55_, __55. I'd do something like
this (untested):
trigraphs = {}
f = open('digits')
trigraph = f.read(3)  # read the first three digits
trigraphs[trigraph] = 1
while 1:
    c = f.read(1)  # read one more digit
    if not c:
        break  # end of file
    trigraph = trigraph[1:] + c  # slide the window along by one digit
    if trigraph in trigraphs:
        trigraphs[trigraph] += 1
    else:
        trigraphs[trigraph] = 1
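If you want to convince yourself of the overlapping counts on small strings
before running anything against the big file, something like this should do.
Again, it is only a sketch: count_ngrams is a name I made up, and it holds the
whole string in memory, unlike the loop above, which reads one character at a
time.

```python
from collections import Counter

def count_ngrams(digits, n):
    """Count every overlapping n-digit substring of `digits`."""
    # Slide a window of width n along the string, one position at a time.
    return Counter(digits[i:i + n] for i in range(len(digits) - n + 1))

digraphs_555 = count_ngrams('555', 2)    # windows: 55_ and _55
digraphs_5555 = count_ngrams('5555', 2)  # windows: 55__, _55_, __55
```

Calling count_ngrams(text, 3) gives the trigraph version of the same count.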
> See line 18, in the while loop.
>
> I was surprised that I could read in the whole billion file with one
> gulp without running out of memory.
Why? One billion bytes is less than a GB. It's a lot, but not *that*
much.
> Memory usage went to 80% (from
> the usual 35%), but no higher except at first, when I saw 98% for a
> few seconds, and then a drop to 78-80% where it stayed.
That suggests to me that your PC probably has 2GB of RAM. Am I close?
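Here is the back-of-envelope arithmetic behind that guess, as a rough sketch.
It assumes the jump from 35% to 80% is due entirely to the file's contents,
and that each digit takes one byte in memory; neither is guaranteed.

```python
file_bytes = 10**9                      # one billion digits, one byte each
file_gib = file_bytes / float(2**30)    # size in units of 2**30 bytes

# Assumption: the rise in memory usage is entirely the file contents.
fraction_used = 0.80 - 0.35

# If the file is that fraction of RAM, total RAM is roughly:
estimated_ram_gib = file_gib / fraction_used
```

With those assumptions the file comes to a bit over 0.93 of a binary gigabyte,
and the estimate lands just above 2.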
--
Steven D'Aprano