Awkward code using multiple regexp searches
Topher Cawlfield
cawlfiel at uiuc.edu
Fri Sep 10 00:35:14 EDT 2004
Hi,
I'm relatively new to Python, and I already love it even after several
years of writing Perl. But a few times already I've found myself
writing the following bit of awkward code when parsing text files. Can
anyone suggest a more elegant solution?
    import re

    rexp1 = re.compile(r'blah(dee)blah')
    rexp2 = re.compile(r'hum(dum)')
    for line in inFile:
        reslt = rexp1.search(line)
        if reslt:
            something = reslt.group(1)
        else:
            reslt = rexp2.search(line)
            if reslt:
                somethingElse = reslt.group(1)
I'm getting more and more nested if statements, which gets ugly and very
hard to follow after the fourth or fifth regexp search.
Equivalent Perl code is more compact but more importantly seems to
communicate the process of searching for multiple regular expressions
more clearly:
    while (<IN>) {
        if (/blah(dee)blah/) {
            $something = $1;
        } elsif (/hum(dum)/) {
            $somethingElse = $1;
        }
    }
I'm a little bit worried about doing the following in Python, since I'm
not sure if the compiler is smart enough to avoid doing each regexp
search twice:
    for line in inFile:
        if rexp1.search(line):
            # the same search is run a second time just to get the group
            something = rexp1.search(line).group(1)
        elif rexp2.search(line):
            somethingElse = rexp2.search(line).group(1)
In many cases I am worried about efficiency as these scripts parse a
couple GB of text!
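One flatter shape I have been toying with is to put the compiled patterns in a list and stop at the first one that matches, so each regexp is searched at most once per line and adding a fifth or sixth pattern does not add another level of nesting. This is only a rough sketch: the names PATTERNS, first_match and sample_lines are made up for illustration, and sample_lines stands in for inFile.

```python
import re

# Hypothetical table of (label, compiled pattern) pairs. The labels are
# illustrative only; they just say which variable the capture would go to.
PATTERNS = [
    ("something", re.compile(r"blah(dee)blah")),
    ("somethingElse", re.compile(r"hum(dum)")),
]


def first_match(line):
    """Return (label, group 1) for the first pattern that matches, else None."""
    for label, rexp in PATTERNS:
        reslt = rexp.search(line)  # each pattern is searched at most once
        if reslt:
            return label, reslt.group(1)
    return None


# Stand-in for inFile: any iterable of lines works here.
sample_lines = ["xx blahdeeblah xx", "a humdum b", "no match here"]
results = [first_match(line) for line in sample_lines]
```

The order of PATTERNS plays the role of the if/elsif order in the Perl version, but I don't know whether this is considered idiomatic.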
Does anyone have a suggestion for cleaning up this commonplace Python
code construction?
Thanks,
Topher Cawlfield