Awkward code using multiple regexp searches

Topher Cawlfield cawlfiel at uiuc.edu
Fri Sep 10 06:35:14 CEST 2004


I'm relatively new to Python, and I already love it even after several 
years of writing Perl.  But a few times already I've found myself 
writing the following bit of awkward code when parsing text files.  Can 
anyone suggest a more elegant solution?

import re

rexp1 = re.compile(r'blah(dee)blah')
rexp2 = re.compile(r'hum(dum)')
for line in inFile:                      # inFile: an open text file
    reslt = rexp1.search(line)
    if reslt:
        something = reslt.group(1)
    else:
        # only try the next pattern if the previous one failed
        reslt = rexp2.search(line)
        if reslt:
            somethingElse = reslt.group(1)

I'm getting more and more nested if statements, which gets ugly and very 
hard to follow after the fourth or fifth regexp search.
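One way to flatten the nesting is to keep the compiled patterns in a list and loop over them, stopping at the first one that matches. This is a minimal sketch, assuming each pattern has a single capturing group and that the first matching pattern should win (as with Perl's elsif); the `PATTERNS` table and the `parse` function are hypothetical names for illustration.

```python
import re

# Hypothetical (pattern, name) pairs; order matters, like an if/elif chain.
PATTERNS = [
    (re.compile(r'blah(dee)blah'), 'something'),
    (re.compile(r'hum(dum)'), 'somethingElse'),
]

def parse(lines):
    """Return a dict mapping each name to the last value captured for it."""
    found = {}
    for line in lines:
        for rexp, name in PATTERNS:
            reslt = rexp.search(line)
            if reslt:
                found[name] = reslt.group(1)
                break  # first matching pattern wins, mirroring elif
    return found
```

Adding a fifth or sixth pattern then means adding a row to the table rather than another level of indentation.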

Equivalent Perl code is more compact but, more importantly, seems to
communicate the process of searching for multiple regular expressions
more clearly:

while (<IN>) {
    if (/blah(dee)blah/) {
        $something = $1;
    } elsif (/hum(dum)/) {
        $somethingElse = $1;
    }
}

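A common way to get the Perl shape in Python is a small object that runs the search and remembers the resulting match, so the test and the capture happen in one step, much like Perl's `$1`. This is a sketch only; the `Matcher` class and the `scan` wrapper are hypothetical helpers, not part of the `re` module.

```python
import re

class Matcher:
    """Hypothetical helper: remembers the last match, like Perl's $1."""

    def __init__(self):
        self.m = None

    def search(self, rexp, line):
        # Run the search once and keep the match object for group().
        self.m = rexp.search(line)
        return self.m is not None

    def group(self, n):
        return self.m.group(n)

rexp1 = re.compile(r'blah(dee)blah')
rexp2 = re.compile(r'hum(dum)')

def scan(lines):
    something = somethingElse = None
    m = Matcher()
    for line in lines:
        if m.search(rexp1, line):
            something = m.group(1)
        elif m.search(rexp2, line):
            somethingElse = m.group(1)
    return something, somethingElse
```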
I'm a little bit worried about doing the following in Python, since I'm 
not sure if the compiler is smart enough to avoid doing each regexp 
search twice:

for line in inFile:
    if rexp1.search(line):
        something = rexp1.search(line).group(1)
    elif rexp2.search(line):
        somethingElse = rexp2.search(line).group(1)
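Python does not remember the result of `rexp1.search(line)`: the compiled pattern is reused, but the second call in the `if` body performs the search again. In Python 3.8 and later (well after this post was written), an assignment expression (`:=`, PEP 572) binds and tests the match in one step, so each search runs once while keeping the if/elif shape. A minimal sketch; `scan` is a hypothetical wrapper that makes the example self-contained.

```python
import re

rexp1 = re.compile(r'blah(dee)blah')
rexp2 = re.compile(r'hum(dum)')

def scan(lines):
    something = somethingElse = None
    for line in lines:
        # Each search runs once; its result is bound and tested together.
        if (reslt := rexp1.search(line)):
            something = reslt.group(1)
        elif (reslt := rexp2.search(line)):
            somethingElse = reslt.group(1)
    return something, somethingElse
```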

In many cases I am worried about efficiency, as these scripts parse a
couple of GB of text!
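For large inputs, another option is to fold the patterns into a single alternation with named groups, so each line is handled by one regex call, and then use the match's `lastgroup` attribute to see which alternative matched. This is a sketch assuming each alternative contains exactly one capturing group; the group names and the `scan` function are hypothetical.

```python
import re

# One alternation; each alternative has exactly one named capturing group.
COMBINED = re.compile(r'blah(?P<something>dee)blah|hum(?P<somethingElse>dum)')

def scan(lines):
    found = {}
    for line in lines:
        m = COMBINED.search(line)
        if m:
            # lastgroup is the name of the group that took part in the match
            name = m.lastgroup
            found[name] = m.group(name)
    return found
```

This is not exactly equivalent to the if/elif chain: the combined pattern reports the leftmost match in the line, so for `"humdum blahdeeblah"` it picks `hum(dum)` even though `rexp1` would also have matched. Whether it is actually faster depends on the patterns, so it is worth timing on real data.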

Does anyone have a suggestion for cleaning up this commonplace Python 
code construction?

     Topher Cawlfield

More information about the Python-list mailing list