simple text filter

John Machin sjmachin at lexicon.net
Thu Jun 12 09:28:33 EDT 2003


boutrosp at hotmail.com wrote in message news:<f903b9dd.0306111236.1ca87c93 at posting.google.com>...
> I need some help on a simple text filter. The problem I am having is
> when the file comes to the end it stays in the while loop and does not
> exit. I cannot figure this out. I would use a for loop with the
> readlines() but my datasets can range from 5 to 80 MB of text data.
> Here is the code I am using. Please help.
> 
> import sys, re
> p1 = re.compile('ADT100')
[snip]
> p8 = re.compile('ATAP')
> f=open('adt100_0489.rpt.txt', 'r')
> junky = 1
> done = False
> while not done :
>         junky = f.readline()
>         if p1.search(junky) :
>                 continue
[snip]
>         elif p8.search(junky) :
>                 continue
>         elif junky == None :
>                 done = True
>         else :
>                 print junky
> f.close()

Try this:

import sys, re
good_stuff = re.compile(
   'ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP' 
   # list these in descending frequency order
   ).search
for aline in file(sys.argv[1]):
   # hardcoded file names not a good idea
   if not good_stuff(aline):
      print aline
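
As for why the original loop never exits: readline() signals end of file
by returning an empty string, not None, so the junky == None test can
never be true. Here is a minimal sketch of that behaviour. The
io.StringIO object and its contents are a hypothetical stand-in for the
real report file, and the syntax is Python 3 rather than the Python 2
used above.

```python
import io

# Hypothetical stand-in for the real report file.
f = io.StringIO("ADT100 header\nkeep me\n\nlast line\n")

lines = []
while True:
    junky = f.readline()
    if junky == '':        # '' means end of file; readline() never returns None
        break
    lines.append(junky)

at_eof = f.readline()      # reading again at EOF still gives ''
print(repr(at_eof))
```

A blank line inside the file comes back as '\n', not '', so testing for
the empty string does not stop the loop early.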

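You may want to ensure that you don't match e.g. DROPKICK when you only
want DROP. To do that, anchor the alternatives at word boundaries:
r'\b(ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP)\b'
Note carefully the r prefix (raw string); without it, Python reads '\b'
as a backspace character rather than as a word boundary.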
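
A small sketch of the difference, in Python 3 syntax. The names loose,
strict and broken and the sample strings are made up for illustration;
only the keyword list comes from the code above.

```python
import re

words = 'ADT100|AUDIT|HARDWARE|PACKAGES|NODE|DROP|GRID|ATAP'

loose = re.compile(words).search                     # matches anywhere in the line
strict = re.compile(r'\b(' + words + r')\b').search  # whole words only
broken = re.compile('\b(' + words + ')\b').search    # no r prefix: '\b' is a backspace

for text in ('DROP table', 'DROPKICK the ball'):
    print(text, bool(loose(text)), bool(strict(text)), bool(broken(text)))
```

The pattern without the r prefix only matches a keyword that has a
literal backspace character on each side, so in practice it filters
nothing.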



