How do I skip over multiple words in a file?
Tim Chase
python.list at tim.thechases.com
Thu Nov 11 10:48:54 EST 2010
On 11/11/10 09:07, chad wrote:
> Let's say that I have an article. What I want to do is read in
> this file and have the program skip over every instance of the
> words "the", "and", "or", and "but". What would be the
> general strategy for attacking a problem like this?
I'd keep a file of "stop words" and read them into a set
(normalizing case in the process). Then, as I skim over each
word in the target file, I'd check whether the case-normalized
version of the word is in the stop-word set and skip it if it is.
It might look something like this:
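    def normalize_word(s):
        # Strip surrounding whitespace and fold case so "The" matches "the".
        return s.strip().upper()

    # stop_words.txt holds one stop word per line.
    with open('stop_words.txt') as f:
        stop_words = set(normalize_word(word) for word in f)

    with open('data.txt') as f:
        for line in f:
            for word in line.split():
                if normalize_word(word) in stop_words:
                    continue
                process(word)  # placeholder: whatever you do with kept words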
-tkc
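
A minimal Python 3 sketch of the same strategy, working on an in-memory
string so it runs without any files. The helper name skip_stop_words and
the punctuation stripping are assumptions added for illustration; the
approach described above only normalizes whitespace and case, so a word
like "but," would not be recognized as a stop word without that extra step.

```python
import string


def normalize_word(s):
    # Drop surrounding whitespace and punctuation, then fold case,
    # so that "But," and "but" compare equal.
    return s.strip().strip(string.punctuation).upper()


def skip_stop_words(text, stop_words):
    """Yield the words of `text` that are not stop words (hypothetical helper)."""
    stops = {normalize_word(w) for w in stop_words}
    for line in text.splitlines():
        for word in line.split():
            if normalize_word(word) in stops:
                continue
            # The original (un-normalized) word is passed through.
            yield word


sample = "Stop, and think: the cat or the dog, but, not both."
kept = list(skip_stop_words(sample, ["the", "and", "or", "but"]))
print(kept)
```

Using a set keeps each membership test at roughly constant time, which
matters once the stop-word list or the article grows.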