[Tutor] Opening Multiple Files

Paulo Quaglio paulo_quaglio at yahoo.com
Fri Aug 17 07:10:18 CEST 2007

Hi everyone,
  Thanks for all the suggestions. Let me just preface this by saying that I’m new to both Python and programming. I started learning 3 months ago with online tutorials and by reading the questions you guys post. So, thank you all very, very much, and I apologize if I’m doing something really stupid. :-)

  OK. I’ve solved the problem of opening several files to process “as a batch” with glob.glob(). Only now did I realize that the program and the files need to be in the same folder. Now I have another problem.
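On the same-folder point: a relative pattern such as '*.txt' is matched against the current working directory (the folder Python is started from), which is not necessarily the folder the script is saved in. Below is a minimal sketch of pointing glob at a specific folder instead. The helper name `text_files` and the folder name `corpus` are hypothetical, and the results are sorted because glob returns matches in no guaranteed order.

```python
import glob
import os


def text_files(folder):
    """Return the .txt files in `folder`, in a stable (sorted) order."""
    # os.path.join builds "folder/*.txt" with the right separator for the OS;
    # glob.glob returns the matching paths, folder part included.
    return sorted(glob.glob(os.path.join(folder, '*.txt')))


if __name__ == '__main__':
    # Hypothetical folder name; replace it with the real path to the corpus.
    for path in text_files('corpus'):
        print(path)
```

With a full path (for example, one starting from the drive or home directory), the script can live anywhere and still find the corpus.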
  1- I want to open several files and count the total number of words. If I do this with only one file, it works great. With several files (now with glob), it outputs the total count for each file individually, not for the whole corpus (see the comments in the program below).
  2- I also want the program to output a word frequency list (we do this a lot in corpus linguistics). When I do this with only one file, the program works great (with a dictionary). With several files, I end up with several frequency lists, one for each file.

  This sounds like a loop problem, doesn’t it? I looked at the indentation too, and I can’t find what the problem is. Your comments, suggestions, etc. are greatly appreciated. Thanks again for all your help.
  Paulo
  Here is what I have:
  # The program is intended to output a word frequency list (including all words in all files) and the total word count
  import glob

  def sortfile():  # I created a function
      filename = glob.glob('*.txt')  # this works great! Thanks!
      for allfiles in filename:
          infile = open(allfiles, 'r')
          lines = list(infile)
          infile.close()
          words = []  # initializes list of words
          wordcounter = 0
          for line in lines:
              line = line.lower()  # after this, I have some clunky code to get rid of punctuation
              words = words + line.split()
          wordfreq = [words.count(wrd) for wrd in words]  # counts the freq of each word in a list
          dictionary = dict(zip(words, wordfreq))
          frequency_list = [(dictionary[key], key) for key in dictionary]
          for item in frequency_list:
              wordcounter = wordcounter + 1
              print(item)
          # print("Total # of words: %d" % wordcounter)  # if I put it here, I get the total count after each file
      print("Total # of words: %d" % wordcounter)  # this will give the word count of the last file the program reads

  sortfile()
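A minimal sketch of one way to get a single count and a single frequency list for the whole corpus, as described in points 1 and 2. In the program above, `words` and `wordcounter` are reset at the top of every pass through `for allfiles in filename:`, and the frequency list is built and printed inside that same loop, so each file gets its own results; the sketch creates its accumulators once, before the file loop, and reports only after it. Note also that `wordcounter` goes up once per entry of `frequency_list`, so it counts distinct words rather than running words; the sketch reports both. It swaps in `collections.Counter` for the `words.count(wrd)` list comprehension (which rescans the whole word list for every word) and uses a regular expression as a stand-in for the punctuation-stripping code, which isn't shown. The names `corpus_frequencies`, `report` and `WORD_RE` are hypothetical; the code should run under Python 3 as well as 2.7.

```python
import glob
import re
from collections import Counter

# Hypothetical tokenizer standing in for the "clunky" punctuation code:
# runs of lowercase letters, digits and apostrophes count as words.
WORD_RE = re.compile(r"[a-z0-9']+")


def corpus_frequencies(pattern='*.txt'):
    """Count words over ALL files matching `pattern`, not file by file."""
    freq = Counter()   # created once, before the file loop
    total_tokens = 0   # likewise: one running total for the whole corpus
    for path in sorted(glob.glob(pattern)):
        with open(path) as infile:
            for line in infile:
                words = WORD_RE.findall(line.lower())
                freq.update(words)           # add this line's words to the corpus counts
                total_tokens += len(words)   # running words (tokens)
    return freq, total_tokens


def report(pattern='*.txt'):
    """Print one frequency list for the whole corpus, then the totals."""
    freq, total_tokens = corpus_frequencies(pattern)
    for word, count in freq.most_common():   # most frequent word first
        print("%d %s" % (count, word))
    print("Total # of words (tokens): %d" % total_tokens)
    print("Distinct words (types): %d" % len(freq))


if __name__ == '__main__':
    report()
```

Run from the folder that holds the .txt files, `report()` prints one list for all files combined, followed by the two totals. The tokenizer is deliberately simple (it keeps a leading or trailing apostrophe attached to a word, for instance), so it may need adjusting for real corpus work.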