Awkward code using multiple regexp searches
Jason Lai
jmlai at uci.edu
Fri Sep 10 03:03:44 EDT 2004
Topher Cawlfield wrote:
> Hi,
>
> I'm relatively new to Python, and I already love it even after several
> years of writing Perl. But a few times already I've found myself
> writing the following bit of awkward code when parsing text files. Can
> anyone suggest a more elegant solution?
>
> import re
>
> rexp1 = re.compile(r'blah(dee)blah')
> rexp2 = re.compile(r'hum(dum)')
> for line in inFile:
>     reslt = rexp1.search(line)
>     if reslt:
>         something = reslt.group(1)
>     else:
>         reslt = rexp2.search(line)
>         if reslt:
>             somethingElse = reslt.group(1)
>
> I'm getting more and more nested if statements, which gets ugly and very
> hard to follow after the fourth or fifth regexp search.
>
> Equivalent Perl code is more compact but more importantly seems to
> communicate the process of searching for multiple regular expressions
> more clearly:
>
> while (<IN>) {
>     if (/blah(dee)blah/) {
>         $something = $1;
>     } elsif (/hum(dum)/) {
>         $somethingElse = $1;
>     }
> }
>
> I'm a little bit worried about doing the following in Python, since I'm
> not sure if the compiler is smart enough to avoid doing each regexp
> search twice:
>
> for line in inFile:
>     if rexp1.search(line):
>         something = rexp1.search(line).group(1)
>     elif rexp2.search(line):
>         somethingElse = rexp2.search(line).group(1)
>
> In many cases I am worried about efficiency as these scripts parse a
> couple GB of text!
>
> Does anyone have a suggestion for cleaning up this commonplace Python
> code construction?
>
> Thanks,
> Topher Cawlfield
Does it have to be stored in a different variable? If you have a list of
regexes and you just want to see whether any of them match, you could
combine them into one compound regex such as "blah(dee)blah|hum(dum)" and
search each line for that once (although you have to be careful about
overlaps).
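A minimal sketch of the compound-regex idea, assuming the results still need to end up in separate variables. The group names "dee" and "dum" and the function name scan_combined are hypothetical; the point is that each alternative keeps its own group, and Match.lastgroup reports which one matched, so one search per line is enough.

```python
import re

# The two patterns from the question joined by "|". The named groups
# (hypothetical names) identify which alternative produced the match.
COMBINED = re.compile(r'blah(?P<dee>dee)blah|hum(?P<dum>dum)')

def scan_combined(lines):
    """Return (something, something_else) as found in lines."""
    something = something_else = None
    for line in lines:
        m = COMBINED.search(line)  # a single search per line
        if m is None:
            continue
        # lastgroup is the name of the group that matched in this alternation.
        if m.lastgroup == 'dee':
            something = m.group('dee')
        elif m.lastgroup == 'dum':
            something_else = m.group('dum')
    return something, something_else
```

One reading of the caution about overlaps: the compound search reports only the leftmost match on a line, so for "humdum blahdeeblah" it returns (None, 'dum'), whereas the Perl if/elsif chain would have matched the first regex anywhere on the line.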
- Jason Lai
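For comparison, a sketch of the original if/elif chain in later Python. This is not from the thread: assignment expressions (":=") only arrived in Python 3.8, long after this 2004 post. They bind the match object inside the condition, so the chain stays flat like the Perl version and each regex is searched at most once per line, which also removes the double search the question worried about. The function name scan_chain is hypothetical.

```python
import re

rexp1 = re.compile(r'blah(dee)blah')
rexp2 = re.compile(r'hum(dum)')

def scan_chain(lines):
    """Return (something, something_else), trying rexp1 before rexp2."""
    something = something_else = None
    for line in lines:
        # Requires Python 3.8+: ":=" stores the match object in m,
        # so there is no second call to search() and no nesting.
        if m := rexp1.search(line):
            something = m.group(1)
        elif m := rexp2.search(line):
            something_else = m.group(1)
    return something, something_else
```

Because rexp1 is always tried first, "humdum blahdeeblah" gives ('dee', None) here, matching the Perl semantics rather than the leftmost-match behaviour of a compound regex.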