How do I skip over multiple words in a file?

Stefan Sonnenberg-Carstens stefan.sonnenberg at pythonmeister.com
Thu Nov 11 16:18:14 EST 2010


On 11.11.2010 21:33, Paul Watson wrote:
> On 2010-11-11 08:07, chad wrote:
>> Let's say that I have an article. What I want to do is read in this
>> file and have the program skip over ever instance of the words "the",
>> "and",  "or", and "but". What would be the general strategy for
>> attacking a problem like this?
>
> I realize that you may need or want to do this in Python.  This would 
> be trivial in an awk script.
There are several ways to do this.

skip = ('and', 'or', 'but')
words = []
# Collect every whitespace-separated token that is not in skip.
with open('some.txt') as f:
    for line in f:
        words.extend(w for w in line.split() if w not in skip)
print(words)

If some.txt contains your original question, it prints this:
["Let's", 'say', 'that', 'I', 'have', 'an', 'article.', 'What', 'I',
 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'the',
 'program', 'skip', 'over', 'ever', 'instance', 'of', 'the', 'words',
 '"the",', '"and",', '"or",', '"but".', 'What', 'would', 'be', 'the',
 'general', 'strategy', 'for', 'attacking', 'a', 'problem', 'like',
 'this?']
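
Note that split() leaves punctuation attached to the words, so '"and",' and '"or",' survive in the output above, and the skip tuple lists only three of the four words from the question. A rough sketch of one way around both, assuming it is acceptable to compare a lowercased, punctuation-stripped copy of each token (the name filter_words and the sample sentence are made up for illustration):

```python
import string

# The four words from the original question; a set gives fast membership tests.
SKIP = {'the', 'and', 'or', 'but'}

def filter_words(text, skip=SKIP):
    """Return the tokens of text whose normalized form is not in skip."""
    kept = []
    for token in text.split():
        # Normalize only for the comparison; the original token is kept as is.
        core = token.strip(string.punctuation).lower()
        if core not in skip:
            kept.append(token)
    return kept

sample = 'The cat and the dog, but not "the" bird.'
print(filter_words(sample))
```

The tokens that are kept still carry their own punctuation; only the comparison ignores it.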

But this is just _one_ way to get there.
A possibly faster solution could be based on a regex:
import re
skip = ('and', 'or', 'but')
word_re = re.compile(r'\w+')  # one or more word characters
with open('some.txt') as f:
    print([w for w in word_re.findall(f.read()) if w not in skip])

This gives the following result (you lose the punctuation):
['Let', 's', 'say', 'that', 'I', 'have', 'an', 'article', 'What', 'I',
 'want', 'to', 'do', 'is', 'read', 'in', 'this', 'file', 'have', 'the',
 'program', 'skip', 'over', 'ever', 'instance', 'of', 'the', 'words',
 'the', 'What', 'would', 'be', 'the', 'general', 'strategy', 'for',
 'attacking', 'a', 'problem', 'like', 'this']
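
If losing the punctuation is a problem, one option is to delete the unwanted words from the text itself instead of splitting it into tokens. This is only a sketch under my own assumptions: matching is case-insensitive, the whitespace after a removed word goes with it, and the names SKIP, pattern and remove_words are made up for the example.

```python
import re

SKIP = ('the', 'and', 'or', 'but')

# \b anchors each alternative at word boundaries, so "other" or "band"
# is left alone. Trailing whitespace is removed together with the word.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, SKIP)) + r')\b\s*',
    re.IGNORECASE,
)

def remove_words(text):
    """Return text with every whole-word occurrence of SKIP deleted."""
    return pattern.sub('', text)

print(remove_words('The cat and the dog, but not the other band.'))
```

Everything that is not one of the skipped words, including commas and full stops, stays where it was.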

But there are so many ways to do it ...
