[Tutor] Writing to output file

Corran Webster cwebster@math.tamu.edu
Thu, 25 Mar 1999 16:55:04 -0600 (CST)

On 25 Mar, Jon Cosby wrote:
> This should be an easy one, but I'm not finding it: How do you print to an
> output file? I'm trying to get the results of search.py below on a text
> file. Nothing I've tried seems to work.

To write to a file, you want to use the write method of the file
object.  Looking at your code, you want something like:

    outfile.write("%s:%d" % (text, len(a)))

or

    outfile.write(text + ":" + `len(a)`)

or even

    outfile.write(text + ":" + str(len(a)))

or, as has been mentioned by others, you can redirect standard output
to your file.
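
One thing to watch for: write, unlike print, does not add a newline for
you, so you will probably want to put one in yourself. A minimal sketch
in current Python syntax; the helper name write_result is made up for
illustration, and an in-memory buffer stands in for your outfile:

```python
import io

def write_result(outfile, text, count):
    # write() emits exactly the string it is given, so add the "\n" yourself
    outfile.write("%s:%d\n" % (text, count))

# io.StringIO stands in for a real file opened with open(name, "w")
buf = io.StringIO()
write_result(buf, "spam", 3)
write_result(buf, "eggs", 0)
```

Each call then leaves one complete line in the output.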

> Maybe somebody can tell me why it's so slow, too; it took it 15 minutes to
> search a 115 MB archive.

I suspect a lot of the slowness comes from the fact that you are
compiling your regular expression every time you go through the loop. 
Also I am a little unclear about why you need to keep a copy of the
matches when all you seem to be interested in is the number of matches.
Both these could be factors, as could be the simple fact that 115 MB is
a lot of data, and Python is slower than C.
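
To show what I mean by the first two points, here is a rough sketch in
current Python syntax. The function name count_matches and its
arguments are my own invention, not something from your script:

```python
import re

def count_matches(text, lines):
    # compile once, before the loop, rather than once per line
    pat = re.compile(text, re.I)
    n = 0
    for line in lines:
        # keep only a running total; each list of matches is thrown away
        n = n + len(pat.findall(line))
    return n

total = count_matches("spam", ["Spam spam", "eggs", "SPAM"])
```

Whether this makes a noticeable difference on 115 MB is something you
would have to time for yourself.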

As for other issues, I think you want to open your results file in the
getfile function if you are going to be processing many different
files. If you open and close the file within the searchtext function,
you will overwrite the results file each time you search.  If you don't
need the full power of regular expressions you may be able to
get away with using string.count(). Finally, the os.path.walk function
will improve your getFile function - it does much the same thing, but is
more robust and will work on any platform without modification.
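
On the string.count() point, a small sketch of the idea, assuming the
search text is a plain word with no special regular expression
characters in it. In 1.5 the call is string.count(line, text); newer
Pythons spell it as a method on the string, which is what I use here.
The sample line is made up:

```python
import re

line = "Spam, spam and more SPAM"

# regular expression, case-insensitive
n_regex = len(re.findall("spam", line, re.I))

# plain substring count; lower-casing both sides stands in for re.I
n_plain = line.lower().count("spam".lower())
```

Both count non-overlapping occurrences, so for a simple word like this
they should agree.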

(Warning! The following code is untested - I only have 1.5.1 here,
findall requires 1.5.2.)

import os, re, sys

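def searchfile(text, filename):
  n = 0
  # compile the pattern once per file, not once per line
  pat = re.compile(text, re.I)
  try:
    infile = open(filename, "r")
  except IOError:
    # can't open the file or something similar
    print filename, ": Cannot open"
  else:
    line = infile.readline()
    while line:
      # count the matches on this line; the matches themselves are not kept
      n = n + len(pat.findall(line))
      line = infile.readline()
    infile.close()
    print filename, ":", n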

def searchdir(text, dir, names):
  for name in names:
    fullname = os.path.join(dir, name)
    if os.path.isfile(fullname):
      searchfile(text, fullname)

def searchtree(text, dir):
  os.path.walk(dir, searchdir, text)

if __name__ == "__main__":
  outfile = open('c:\\data\\results.txt', 'w')
  oldstdout = sys.stdout
  sys.stdout = outfile
  searchtree(sys.argv[1], sys.argv[2])
  sys.stdout = oldstdout
  outfile.close()
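
For anyone reading this with Python 3: os.path.walk no longer exists
there, and print is a function. A rough equivalent using os.walk might
look like the sketch below. The names count_in_file and the outfile
argument are my own choices, and passing errors="replace" to open is
just one way of getting past files that are not valid text:

```python
import os, re

def count_in_file(pat, filename):
    # returns the number of matches, or None if the file can't be opened
    try:
        infile = open(filename, "r", errors="replace")
    except IOError:
        return None
    n = 0
    for line in infile:
        n = n + len(pat.findall(line))
    infile.close()
    return n

def searchtree(text, top, outfile):
    # the pattern is compiled once for the whole tree
    pat = re.compile(text, re.I)
    for dirpath, dirnames, filenames in os.walk(top):
        for name in sorted(filenames):
            fullname = os.path.join(dirpath, name)
            n = count_in_file(pat, fullname)
            if n is None:
                outfile.write("%s: Cannot open\n" % fullname)
            else:
                outfile.write("%s: %d\n" % (fullname, n))
```

You would open the results file once, call searchtree(text, top,
outfile) with it, and close it afterwards, so nothing gets overwritten
along the way.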