[Tutor] A file containing a string of 1 billion random digits.
Richard D. Moores
rdmoores at gmail.com
Sun Jul 18 12:30:05 CEST 2010
On Sun, Jul 18, 2010 at 02:26, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sun, 18 Jul 2010 06:49:39 pm Richard D. Moores wrote:
>
>> I might try
>> trigraphs where the 2nd digit is 2 more than the first, and the third
>> 2 more than the 2nd. E.g. '024', '135', '791', '802'.
>
> Why the restriction? There's only 1000 different trigraphs (10*10*10),
> which is nothing.
Just to see if I could do it. It seemed interesting.
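For what it's worth, here is a minimal sketch of how those restricted trigraphs could be listed. It assumes the digits wrap around modulo 10, as the '791' and '802' examples suggest. The name step_trigraphs is made up for illustration.

```python
def step_trigraphs(step=2):
    """Return the 10 trigraphs whose digits each rise by `step`, wrapping past 9."""
    return [
        "%d%d%d" % (d, (d + step) % 10, (d + 2 * step) % 10)
        for d in range(10)
    ]


print(step_trigraphs())
```

That gives only 10 patterns to search for, one per starting digit.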
>> Or maybe I've
>> had enough. BTW Steve, my script avoids the problem you mentioned, of
>> counting 2 '55's in a '555' string. I get only one, but 2 in '5555'.
>
> Huh? What problem did I mention?
Sorry, that was Luke.
> Taking the string '555', you should get two digraphs: 55_ and _55.
That seems wrong to me. When I search on '999999' and there's a
'9999999' I don't want to think I've found 2 instances of '999999'.
But that's just my preference. Instances should be distinct, IMO, and
not overlap.
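To make the two ways of counting concrete, here is a small sketch. str.count() counts non-overlapping occurrences, which matches my preference. The second function counts every starting position, which is the overlapping count Steven describes. The function names are hypothetical, not taken from my script.

```python
def count_distinct(text, pattern):
    """Count non-overlapping occurrences, scanning left to right."""
    return text.count(pattern)


def count_overlapping(text, pattern):
    """Count every position where pattern starts, overlaps included."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Step forward by one character only, so overlaps are found too.
        start = text.find(pattern, start + 1)
    return count


print(count_distinct('555', '55'), count_overlapping('555', '55'))
print(count_distinct('5555', '55'), count_overlapping('5555', '55'))
```

With a run of seven 9s, count_distinct finds '999999' once and count_overlapping finds it twice.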
> In '5555' you should get three: 55__, _55_, __55. I'd do something like
> this (untested):
>
> trigraphs = {}
> f = open('digits')
> trigraph = f.read(3)  # read the first three digits
> trigraphs[trigraph] = 1
> while 1:
>     c = f.read(1)
>     if not c:
>         break
>     trigraph = trigraph[1:] + c
>     if trigraph in trigraphs:
>         trigraphs[trigraph] += 1
>     else:
>         trigraphs[trigraph] = 1
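A sketch along the same lines as the loop above, but reading in larger chunks rather than one character at a time, and carrying the last two characters over so that trigraphs spanning a chunk boundary are still counted. It is a variation on the quoted idea, not the code from either script. It assumes the file holds nothing but digits (no newlines), and count_trigraphs and chunk_size are names invented here.

```python
import io
from collections import Counter


def count_trigraphs(f, chunk_size=1 << 20):
    """Count overlapping trigraphs in an open text file, chunk by chunk."""
    counts = Counter()
    tail = ""  # last two characters seen so far
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        data = tail + chunk
        # Every index that has two more characters after it starts a trigraph.
        for i in range(len(data) - 2):
            counts[data[i:i + 3]] += 1
        tail = data[-2:]
    return counts


# Small in-memory stand-in for the real digits file.
print(count_trigraphs(io.StringIO("5555"), chunk_size=1)["555"])
```

The per-character Python loop would still be slow on a billion digits; the point is only that memory use stays near chunk_size instead of the whole file.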
>> See line 18, in the while loop.
>>
>> I was surprised that I could read in the whole billion file with one
>> gulp without running out of memory.
>
> Why? One billion bytes is less than a GB. It's a lot, but not *that*
> much.
I earlier reported that my laptop couldn't handle even 800 million.
>> Memory usage went to 80% (from
>> the usual 35%), but no higher except at first, when I saw 98% for a
>> few seconds, and then a drop to 78-80% where it stayed.
>
> That suggests to me that your PC probably has 2GB of RAM. Am I close?
No. 4GB.