simple text filter
John Machin
sjmachin at lexicon.net
Thu Jun 12 09:28:33 EDT 2003
boutrosp at hotmail.com wrote in message news:<f903b9dd.0306111236.1ca87c93 at posting.google.com>...
> I need some help on a simple text filter. The problem I am having is
> when the file comes to the end it stays in the while loop and does not
> exit. I cannot figure this out. I would use a for loop with the
> readlines() but my datasets can range from 5 to 80 MB of text data.
> Here is the code I am using. Please help.
>
> import sys, re
> p1 = re.compile('ADT100')
[snip]
> p8 = re.compile('ATAP')
> f=open('adt100_0489.rpt.txt', 'r')
> junky = 1
> done = False
> while not done :
>     junky = f.readline()
>     if p1.search(junky) :
>         continue
[snip]
>     elif p8.search(junky) :
>         continue
>     elif junky == None :
>         done = True
>     else :
>         print junky
> f.close()
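The loop never exits because readline() returns an empty string '' at end of file, not None. The test junky == None is therefore never true, none of the patterns match '', and the else branch prints empty lines forever. A minimal sketch of the end-of-file check, written for Python 3 and using an in-memory io.StringIO as a stand-in for the real report file (the sample text is made up for illustration):

```python
import io

# Stand-in for the real report file; the contents are illustrative only.
f = io.StringIO("first line\nsecond line\n")

lines = []
while True:
    line = f.readline()
    if line == "":          # end of file: readline() returns '', never None
        break
    lines.append(line)      # a blank line in the file arrives as "\n", not ""

f.close()
```

Testing `if not line:` works equally well, since only the end-of-file value is an empty string.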
Try this:
import sys, re
good_stuff = re.compile(
    'ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP'
    # list these in descending frequency order
    ).search
for aline in file(sys.argv[1]):
    # hardcoded file names not a good idea
    if not good_stuff(aline):
        print aline
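For readers on Python 3, where file() and the print statement no longer exist, the same idea might look roughly like the sketch below. The names unwanted, filter_lines and main are hypothetical, chosen only for this example. Iterating over the file object reads one line at a time, so an 80 MB file should not need to be held in memory.

```python
import re
import sys

# Same alternation as in the reply above; `unwanted` is a hypothetical name.
unwanted = re.compile(
    'ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP'
).search


def filter_lines(lines):
    """Yield only the lines that contain none of the unwanted keywords."""
    for line in lines:
        if not unwanted(line):
            yield line


def main(path, out=sys.stdout):
    """Copy the file at `path` to `out`, dropping lines that match."""
    with open(path) as f:
        # Lines keep their own newline, so writelines adds no blank lines.
        out.writelines(filter_lines(f))
```

It could be run as main(sys.argv[1]) so that the file name still comes from the command line.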
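You may want to ensure that you don't match e.g. DROPKICK when you only
want DROP. Wrapping the alternatives in word boundaries does that:
r'\b(ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP)\b'
Note carefully the r prefix (raw string): without it, Python reads '\b' as a
backspace character rather than the regex word-boundary escape.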
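A small sketch of the difference, written for Python 3. The names loose and strict and the sample strings are made up for illustration:

```python
import re

words = 'ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP'

# Plain alternation: matches the keyword anywhere, even inside a longer word.
loose = re.compile(words).search

# Word-boundary version: the keyword must stand alone as a whole word.
strict = re.compile(r'\b(' + words + r')\b').search

hit_loose = loose('DROPKICK the ball') is not None     # matches the DROP in DROPKICK
hit_strict = strict('DROPKICK the ball') is not None   # no whole-word match
hit_word = strict('DROP table') is not None            # DROP as its own word

# Without the r prefix, '\b' is a single backspace character (0x08);
# with it, r'\b' is the two characters backslash and b that the re module expects.
plain_is_backspace = '\b' == '\x08'
raw_length = len(r'\b')
```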