[Tutor] Scanning a file for specific text and copying it to a newfile

Alan Gauld alan.gauld at btinternet.com
Fri Dec 3 01:14:46 CET 2010


"Ben Ganzfried" <ben.ganzfried at gmail.com> wrote

> I'm trying to build a program that reads in a file and copies
> specific sections to a new file.  More specifically, every time
> the words "summary on" are in the original file, I want to copy
> the following text to the new file until I get to the words
> "summary off".

This is a very common data processing technique and is often
termed setting a "sentinel" value or flag. The generic logic looks 
like:

sentinelFlag = False
for line in file:
    if sentinelFlag:
        if "sentinel off" in line:
            sentinelFlag = False
        else:
            process(line)    # i.e. "process line"
    elif "sentinel on" in line:
        sentinelFlag = True

> 1) Once I have read in the old file, how do I copy just the parts
> that I want to the new file?

See above, where "process line" means writing the line to the
output file.
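
As a rough sketch only, here is how that logic might look for the
"summary on" / "summary off" case. The function name copy_summaries,
its parameters and the file names in the usage comment are made up
for illustration, and it assumes each marker sits on a line of its
own (anything else on a marker line is dropped):

```python
def copy_summaries(lines, out, start="summary on", stop="summary off"):
    """Write every line found between a start marker line and a
    stop marker line to out. The marker lines are not copied."""
    copying = False                 # the sentinel flag
    for line in lines:
        if copying:
            if stop in line:
                copying = False     # end of a wanted section
            else:
                out.write(line)     # this is the "process line" step
        elif start in line:
            copying = True          # start copying from the next line

# Possible usage (hypothetical file names):
#
# with open("original.txt") as infile, open("new.txt", "w") as outfile:
#     copy_summaries(infile, outfile)
```

Because the function only iterates over lines and calls out.write(),
it works on open file objects without ever holding the whole file in
memory.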

> all of this in a while loop based on the condition that we should
> read until we are done reading the whole document?

Since you don't know where the last "sentinel off" line is, you
must process the whole file, so a for loop is probably more
appropriate than a while loop.

If the data is short, you can instead read the whole file into
memory as a single string and search it (i.e. find the first
occurrence of "on", then find the next occurrence of "off", and
copy the slice between the start and end positions). But if you
ever need to process large files, the line-by-line sentinel
approach is more effective and uses a lot less memory.
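
For comparison, a minimal sketch of the in-memory approach using
str.find() and slicing. The name extract_summaries is hypothetical,
and treating a section with no closing marker as running to the end
of the text is my assumption, not a rule:

```python
def extract_summaries(text, start="summary on", stop="summary off"):
    """Return a list of the text slices found between each
    start marker and the next stop marker."""
    pieces = []
    pos = 0
    while True:
        begin = text.find(start, pos)
        if begin == -1:
            break                   # no more start markers
        begin += len(start)         # skip past the marker itself
        end = text.find(stop, begin)
        if end == -1:
            end = len(text)         # assumption: unterminated section
                                    # runs to the end of the text
        pieces.append(text[begin:end])
        pos = end + len(stop)       # resume searching after the stop marker
    return pieces
```

Here the whole file would be read first, for example with
open(name).read(), which is why this only suits data that fits
comfortably in memory.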

HTH,


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/




