[Tutor] A file containing a string of 1 billion random digits.

Luke Paireepinart rabidpoobear at gmail.com
Sun Jul 18 07:55:20 CEST 2010


On Jul 17, 2010, at 7:19 PM, "Richard D. Moores" <rdmoores at gmail.com> wrote:

> I don't fully understand your "you always have two [chunks] in memory
> at once". Does that take care of the overlap I think I need? I don't
> want you or anyone to give me the the whole answer, but could you
> clarify a bit about keeping 2 chunks in memory? Let's say the chunks,
> in order, and without overlap, are A, B, C, D, E, ...  So I'd have AB
> to search, then BC, then CD, then DE, ... ? But then what if I find an
> instance of, say, my father's birthdate in B when searching AB. Then
> I'd find the same instance again in BC. How do I prevent that from
> counting as 2 instances?

Imagine you have a series of Buildings in a block, and you are checking to see if a series of 5 windows in a row is on ( even between buildings). You start at one end of the block and you check the first five windows. If they are all on, you have found a match. If not, you move forward one window. Now repeat. After a few repetitions you will end up at the border between two buildings, and after a few more you will be looking exclusively at the 2nd building. Then after a few more you will be looking at the 3rd building and so on.

But are you ever looking at more than one building at a time? Yea but only adjacent buildings and only for 4 steps. Once, when you have 4 windows from the first, 1 from the second, again for 3 windows from the first, then 2 and then 1.

The way this translates into an algorithm is that instead of discarding your old set of data as soon as you need to load another set, you hold onto it long enough to get all the way engaged with that second set, and from there you can free up the previous block's memory and move forward. And since you are using a "sliding window" approach you shouldn't ever find one result twice. Although note that if you overlap your searches as I said earlier, you can run into some matching situations that are interesting. Say you search for 55 and your string is 1827355513. You'll find 55 twice even though there are only 3 fives in the search string. Just something to keep in mind.

Hope that helps!
-luke
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/tutor/attachments/20100718/71bae76a/attachment-0001.html>


More information about the Tutor mailing list