[Tutor] Scanning a file for specific text and copying it to a new file

Steven D'Aprano steve at pearwood.info
Fri Dec 3 01:03:12 CET 2010


Ben Ganzfried wrote:
> I'm trying to build a program that reads in a file and copies specific
> sections to a new file.  More specifically, every time the words
> "summary on" are in the original file, I want to copy the following
> text to the new file until I get to the words "summary off".
> 
> My questions are the following:
> 1) Once I have read in the old file, how do I copy just the parts that
> I want to the new file?  What command do I give the computer to know
> that "summary on" (whether capitalized or not) means start writing to
> the new file-- and "summary off" means stop?  

There is no such command -- you have to program it yourself. I'll give 
you an outline:

def copy(inname, outname):
     infile = open(inname, 'r')
     outfile = open(outname, 'w')
     copying = False  # Don't copy anything yet.
     for line in infile:
         if copying:
             outfile.write(line)
     infile.close()
     outfile.close()

Note:

(1) This is not how you would implement a "real" file utility. There are
much faster methods if all you want is to copy the contents in full,
such as shutil.copyfile in the standard library. But since the next step
will be to *process* the data in the file, we need to do it the
slow(ish) way.

(2) This has no error checking. What happens if you pass the same
file name for both arguments? What if the input file doesn't exist, or
the output file does?

(3) This assumes that the file is a text file, and won't work on 
arbitrary binary files.
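
One way to plug the gaps in note (2), as a rough sketch rather than the
one right answer: refuse to run if both names point to the same file,
and open the output in exclusive mode so an existing file is never
overwritten. The name careful_copy is made up for this example, the 'x'
mode needs Python 3.3 or later, and a missing input file is simply left
to raise the usual error from open().

```python
import os

def careful_copy(inname, outname):
    """Copy a text file line by line, with some basic sanity checks.

    Hypothetical helper for illustration; not part of the outline above.
    """
    # Comparing absolute paths catches the obvious case of the same name
    # given twice. It does not catch symlinks or hard links.
    if os.path.abspath(inname) == os.path.abspath(outname):
        raise ValueError("input and output must be different files")
    # open() raises FileNotFoundError if the input file doesn't exist.
    with open(inname, 'r') as infile:
        # Mode 'x' raises FileExistsError rather than overwriting.
        with open(outname, 'x') as outfile:
            for line in infile:
                outfile.write(line)
```

Whether an existing output file should be an error or should be quietly
replaced depends on what you want; use mode 'w' for the second behaviour.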


The next step is to look for the tag. You don't say whether "summary on"
has to be in a line on its own, or if it can be in the middle of a line,
or even whether it could be split over two lines. The third case would
be tricky and the second is probably a bad idea, so I'll just assume
"summary on" must be in a line on its own:


     for line in infile:
         # Ignore leading & trailing whitespace, don't care about case.
         s = line.strip().lower()
         if s == "summary on":
             copying = True  # Start copying.
             continue  # But not *this* line, start with the next.
         elif s == "summary off":
             copying = False  # Stop copying.
         if copying:
             outfile.write(line)


Note that this allows a single file to have more than one summary
section. If there is only ever one section, you can replace the "Stop
copying" line with the command `break` to exit the for-loop early and
avoid processing the rest of the file.
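
Putting the pieces together, the whole thing might look something like
the sketch below. The with statement closes both files even if something
goes wrong part-way through. The function name copy_summaries and the
first_only flag are my own inventions for this example; first_only=True
is the `break` variant, guarded so that a stray "summary off" before any
"summary on" doesn't end the loop early.

```python
def copy_summaries(inname, outname, first_only=False):
    """Copy the lines between "summary on" and "summary off" tags.

    Tags must be on a line of their own; case and surrounding
    whitespace are ignored. The tag lines themselves are not copied.
    """
    with open(inname, 'r') as infile, open(outname, 'w') as outfile:
        copying = False  # Don't copy anything yet.
        for line in infile:
            # Ignore leading & trailing whitespace, don't care about case.
            s = line.strip().lower()
            if s == "summary on":
                copying = True  # Start copying with the next line.
                continue
            elif s == "summary off":
                if copying and first_only:
                    break  # Only one section wanted, so stop reading.
                copying = False  # Stop copying until the next tag.
            if copying:
                outfile.write(line)
```

Given an input file with the five lines "intro", "Summary ON", "keep me",
"summary off" and "outro", the output file ends up holding just the one
line "keep me". A line such as "see summary on page 2" is not treated as
a tag, because the whole line has to match.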


-- 
Steven

