Best way to extract from regex in if statement
Paul McGuire
ptmcg at austin.rr.com
Sat Apr 4 10:26:05 EDT 2009
On Apr 3, 9:26 pm, Paul Rubin <http://phr...@NOSPAM.invalid> wrote:
> bwgoudey <bwgou... at gmail.com> writes:
> >     elif re.match("^DATASET:\s*(.+) ", line):
> >         m=re.match("^DATASET:\s*(.+) ", line)
> >         print m.group(1)
>
> Sometimes I like to make a special class that saves the result:
>
> class Reg(object):   # illustrative code, not tested
>     def match(self, pattern, line):
>         self.result = re.match(pattern, line)
>         return self.result
>
I took this a little further, *and* lightly tested it too.
Since this idiom makes repeated references to the input line, I added
that to the constructor of the matching class.
By using __call__, I made the created object callable, taking the RE
as its lone argument and returning a boolean indicating match success
or failure. The result of the re.match call is saved in
self.matchresult.
By using __getattr__, the created object proxies for the results of
the re.match call.
I think the resulting code looks pretty close to the original C or
Perl idiom of cascading "elif (c=re_expr_match("..."))" blocks.
(I thought about caching previously seen REs, or adding support for
compiled REs instead of just strings - after all, this idiom usually
occurs in a loop while iterating over some large body of text. It turns
out that the re module already caches previously compiled REs, so I
left my caching out in favor of what the std lib already does.)
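Both points can be checked with a short sketch; treat it as an illustration rather than a statement of guaranteed behavior. In CPython, re.compile() goes through the same internal cache that re.match() uses, so compiling the same pattern string twice normally hands back the same object. The module-level re.match() also accepts an already compiled pattern in place of a string, so a wrapper built on it should handle compiled REs without extra code. The pattern and the sample line below are made up for illustration, and the identity check relies on a CPython implementation detail.

```python
import re

pattern = r"^DATASET:\s*(.+)"   # illustrative pattern, modeled on the quoted one
line = "DATASET: sample_01"     # made-up input line

# Compiling the same string twice normally returns the cached object
# (a CPython implementation detail, not a documented guarantee).
first = re.compile(pattern)
second = re.compile(pattern)
same_object = first is second

# re.match() accepts a compiled pattern as well as a pattern string.
from_string = re.match(pattern, line)
from_compiled = re.match(first, line)

print(same_object)
print(from_string.group(1), from_compiled.group(1))
```

Since the string form is served from the cache after its first use, passing plain strings inside a loop should cost little more than a dictionary lookup per call.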
-- Paul
import re

class REmatcher(object):
    def __init__(self, sourceline):
        self.line = sourceline

    def __call__(self, regexp):
        # try the RE against the stored line, and remember the result
        self.matchresult = re.match(regexp, self.line)
        self.success = self.matchresult is not None
        return self.success

    def __getattr__(self, attr):
        # proxy group(), groups(), etc. to the saved match object
        return getattr(self.matchresult, attr)
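One edge case in this kind of proxy is worth a sketch. __getattr__ only runs when normal attribute lookup fails, so if matchresult has not been set yet (no pattern has been tried), looking it up inside __getattr__ re-enters __getattr__ and recurses until Python gives up. After a failed match, matchresult is None and the proxy raises an AttributeError that mentions NoneType. The variant below is one possible way to handle both cases and is not part of the tested code above: the class name GuardedREmatcher and the error message are illustrative assumptions. It initializes matchresult in the constructor and raises a clearer error when there is no match to proxy for.

```python
import re


class GuardedREmatcher(object):   # hypothetical name, not from the original post
    def __init__(self, sourceline):
        self.line = sourceline
        # Set up front so __getattr__ never has to look this attribute up.
        self.matchresult = None
        self.success = False

    def __call__(self, regexp):
        self.matchresult = re.match(regexp, self.line)
        self.success = self.matchresult is not None
        return self.success

    def __getattr__(self, attr):
        # Only reached for names not found on the instance or the class.
        if self.matchresult is None:
            raise AttributeError("no successful match to get %r from" % attr)
        return getattr(self.matchresult, attr)


matcher = GuardedREmatcher("xyzzy")
try:
    matcher.group               # no pattern tried yet
except AttributeError as exc:
    early_error = str(exc)

if matcher(r"\d+$"):
    kind = "numeric"
elif matcher(r"[a-z]+$"):
    kind = "lowercase"
else:
    kind = "other"
matched_text = matcher.group()
```

The cascading if/elif usage is unchanged; the only difference is what happens when an attribute is requested without a successful match behind it.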
This test:
test = """\
ABC
123
xyzzy
Holy Hand Grenade
Take the pebble from my hand, Grasshopper
"""
outfmt = "'%s' is %s [%s]"
for line in test.splitlines():
    matchexpr = REmatcher(line)
    if matchexpr(r"\d+$"):
        print outfmt % (line, "numeric", matchexpr.group())
    elif matchexpr(r"[a-z]+$"):
        print outfmt % (line, "lowercase", matchexpr.group())
    elif matchexpr(r"[A-Z]+$"):
        print outfmt % (line, "uppercase", matchexpr.group())
    elif matchexpr(r"([A-Z][a-z]*)(\s[A-Z][a-z]*)*$"):
        print outfmt % (line, "a proper word or phrase",
                        matchexpr.group())
    else:
        print outfmt % (line, "something completely different", "...")
Produces:
'ABC' is uppercase [ABC]
'123' is numeric [123]
'xyzzy' is lowercase [xyzzy]
'Holy Hand Grenade' is a proper word or phrase [Holy Hand Grenade]
'Take the pebble from my hand, Grasshopper' is something completely different [...]