Best way to extract from regex in if statement

Paul McGuire ptmcg at austin.rr.com
Sat Apr 4 10:26:05 EDT 2009


On Apr 3, 9:26 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> bwgoudey <bwgou... at gmail.com> writes:
> > elif re.match("^DATASET:\s*(.+) ", line):
> >         m=re.match("^DATASET:\s*(.+) ", line)
> >         print m.group(1)
>
> Sometimes I like to make a special class that saves the result:
>
>   class Reg(object):   # illustrative code, not tested
>      def match(self, pattern, line):
>         self.result = re.match(pattern, line)
>         return self.result
>
I took this a little further, *and* lightly tested it too.

Since this idiom makes repeated references to the input line, I added
that to the constructor of the matching class.

By using __call__, I made the created object callable, taking the RE
as its lone argument and returning a boolean indicating match success
or failure.  The result of the re.match call is saved in
self.matchresult.

By using __getattr__, the created object proxies for the results of
the re.match call.
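
To show the two hooks in isolation, here is a minimal sketch on a toy
class.  The names Wrapper and target are made up for illustration and
are not part of the matcher itself.  __call__ makes an instance usable
like a function, and __getattr__ is consulted only when normal
attribute lookup fails, which makes it a cheap way to forward
everything else to another object.

```python
class Wrapper(object):
    # Hypothetical toy class, for illustration only.
    def __init__(self, target):
        self.target = target

    def __call__(self, suffix):
        # Lets an instance be used like a function: w("zzy")
        return self.target.endswith(suffix)

    def __getattr__(self, attr):
        # Reached only when normal lookup fails, so self.target
        # (a real instance attribute) never comes through here.
        return getattr(self.target, attr)


w = Wrapper("xyzzy")
print(w("zzy"))     # goes through Wrapper.__call__
print(w.upper())    # str.upper, forwarded by __getattr__
print(w.target)     # ordinary attribute, not forwarded
```

The one thing to watch is that __getattr__ must not itself look up an
attribute that might be missing on self, or it will call itself again.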

I think the resulting code looks pretty close to the original C or
Perl idiom of cascading "elif (c=re_expr_match("..."))" blocks.

(I thought about caching previously seen REs, or adding support for
compiled REs instead of just strings - after all, this idiom usually
occurs in a loop while iterating over some large body of text.  It turns
out that the re module already caches previously compiled REs, so I
left my caching out in favor of that already being done in the std
lib.)
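
As a rough check of that point, the sketch below shows that the
module-level re.match accepts an already compiled pattern in place of
a string, so passing a compiled RE to REmatcher should work without
extra support.  The identity check at the end relies on the module's
internal cache, which as far as I know is a CPython implementation
detail and not a documented guarantee, so treat it as an observation
rather than something to depend on.

```python
import re

digits = re.compile(r"\d+$")

# re.match takes either a pattern string or a compiled pattern.
from_string = re.match(r"\d+$", "123")
from_compiled = re.match(digits, "123")
print(from_string.group())
print(from_compiled.group())

# Compiling the same string again normally hands back the cached
# object on CPython; this is an implementation detail, not a promise.
print(re.compile(r"\d+$") is digits)
```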

-- Paul

import re

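class REmatcher(object):
    def __init__(self, sourceline):
        self.line = sourceline
        # set here so __getattr__ cannot recurse if an attribute is
        # requested before the first match attempt
        self.matchresult = None
    def __call__(self, regexp):
        # try the RE against the saved line and remember the result
        self.matchresult = re.match(regexp, self.line)
        self.success = self.matchresult is not None
        return self.success
    def __getattr__(self, attr):
        # only reached for names not found on self; forward to the match
        return getattr(self.matchresult, attr)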


This test:

test = """\
ABC
123
xyzzy
Holy Hand Grenade
Take the pebble from my hand, Grasshopper
"""

outfmt = "'%s' is %s [%s]"
for line in test.splitlines():
    matchexpr = REmatcher(line)
    if matchexpr(r"\d+$"):
        print(outfmt % (line, "numeric", matchexpr.group()))
    elif matchexpr(r"[a-z]+$"):
        print(outfmt % (line, "lowercase", matchexpr.group()))
    elif matchexpr(r"[A-Z]+$"):
        print(outfmt % (line, "uppercase", matchexpr.group()))
    elif matchexpr(r"([A-Z][a-z]*)(\s[A-Z][a-z]*)*$"):
        print(outfmt % (line, "a proper word or phrase",
                        matchexpr.group()))
    else:
        print(outfmt % (line, "something completely different", "..."))

Produces:
'ABC' is uppercase [ABC]
'123' is numeric [123]
'xyzzy' is lowercase [xyzzy]
'Holy Hand Grenade' is a proper word or phrase [Holy Hand Grenade]
'Take the pebble from my hand, Grasshopper' is something completely different [...]
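
Applied to the DATASET line from the original question, a sketch might
look like the following.  The class is repeated in compact form only so
the snippet runs on its own.  The classify helper, the HEADER pattern
and the sample lines are made up for illustration, and the DATASET
pattern is simplified from the original (no leading ^, since re.match
already anchors at the start, and no trailing space).

```python
import re

class REmatcher(object):
    # Compact variant of the matcher, repeated so this runs standalone.
    def __init__(self, sourceline):
        self.line = sourceline
        self.matchresult = None   # avoids recursion in __getattr__
    def __call__(self, regexp):
        self.matchresult = re.match(regexp, self.line)
        return self.matchresult is not None
    def __getattr__(self, attr):
        return getattr(self.matchresult, attr)

def classify(line):
    # Hypothetical helper; the patterns are illustrative only.
    m = REmatcher(line)
    if m(r"HEADER:\s*(.+)"):
        return ("header", m.group(1))
    elif m(r"DATASET:\s*(.+)"):
        # group(1) comes from the match object via __getattr__
        return ("dataset", m.group(1))
    else:
        return ("other", None)

print(classify("DATASET: rainfall"))
print(classify("nothing to see"))
```

Each branch tests and extracts with a single re.match call, which is
what the original elif was trying to avoid doing twice.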


