How do I skip over multiple words in a file?

Tim Chase python.list at tim.thechases.com
Thu Nov 11 10:48:54 EST 2010


On 11/11/10 09:07, chad wrote:
> Let's say that I have an article. What I want to do is read in
> this file and have the program skip over every instance of the
> words "the", "and", "or", and "but". What would be the
> general strategy for attacking a problem like this?

I'd keep a file of "stop words" and read them into a set
(normalizing case in the process).  Then, as I skim over each
word in the target file, I'd check whether the case-normalized
version of the word is in the stop-word set and skip it if so.
It might look something like this:

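   def normalize_word(s):
     # strip surrounding whitespace and fold case so "The" matches "the"
     return s.strip().upper()

   # stop_words.txt holds one stop word per line
   with open('stop_words.txt') as f:
     stop_words = set(normalize_word(word) for word in f)

   with open('data.txt') as f:
     for line in f:
       for word in line.split():
         if normalize_word(word) in stop_words:
           continue
         process(word)  # placeholder for whatever you do with each word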
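
One thing the loop above doesn't handle is punctuation stuck to a
word: "the," or "but." won't match the bare stop word.  A minimal,
self-contained sketch of one way around that follows.  The names
load_stop_words() and filter_words() are made up for illustration,
and the stop words and article are inline lists rather than files
so it runs as-is; an open file object could be passed in their
place.

```python
import string

def normalize_word(s):
    # Strip surrounding whitespace and punctuation, then fold case,
    # so "The," and "the" both normalize to "THE".
    return s.strip().strip(string.punctuation).upper()

def load_stop_words(lines):
    # Hypothetical helper: build the stop-word set from any iterable
    # of lines, ignoring blank lines.
    return {normalize_word(line) for line in lines if line.strip()}

def filter_words(lines, stop_words):
    # Hypothetical helper: yield each word, exactly as it appeared,
    # unless its normalized form is a stop word.
    for line in lines:
        for word in line.split():
            if normalize_word(word) in stop_words:
                continue
            yield word

stop_words = load_stop_words(["the", "and", "or", "but"])
article = ["The cat and the dog, but not the bird."]
kept = list(filter_words(article, stop_words))
print(kept)
```

Only the comparison uses the normalized form; the words that
survive keep their original case and punctuation, so strip them
again in process() if that matters.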

-tkc





