regexps: testing and creating MatchObjects in one fell swoop

effbot at effbot at
Sat Sep 9 13:09:41 CEST 2000

dan wrote:
> This isn't a huge pain in the simple case, but it quickly becomes
> annoying when I want to do the equivalent of
>   if (/(\d+)\s+(\d+)/) {
>     ($num1, $num2) = ($1, $2);
>   } elsif (/(\w+)\s+(\w+)/) {
>     ($word1, $word2) = ($1, $2);
>   } # etc.
> as the two-part test in Python doesn't lend itself easily to a long
> if/elif/elif chain.

I'm tempted to mention the "replace nested conditionals
with guard clauses" refactoring rule, but I'll leave that
for another day...

> I've "solved" the problem locally by using the following helper
> function:
>   # research (regexp, string) is the same as re.search (regexp, string),
>   # but saves off the match results into 'rematch', so we can test for
>   # a regexp in an if statement and use the results immediately.
>   rematch = None
>   def research (regexp, string):
>       global rematch
>       rematch = re.search (regexp, string)
>       return (rematch is not None)
> So that I can write:
>   if research (r"(\d+)\s+(\d+)", line):
>       (num1, num2) = rematch.groups()
>   elif research (r"(\w+)\s+(\w+)", line):
>       (word1, word2) = rematch.groups()
>   # etc.
> I suppose I can even inject research into the re module, and inject a
> similar method into regular expression objects, etc., to make it nicer.
> Is there a cleaner, or more approved, way to accomplish this task?
> If not, does it make any sense to have a re.last_match object that
> automatically contains the last match, allowing, for example:
>   if re.search (r"(\d+)\s+(\d+)", line):
>       (num1, num2) = re.last_match.groups()
> Or is that too side-effecty and non-Pythonic?

Won't fly -- what if two threads are using the same regular
expression?  (or in your rematch example, what if two threads
are using regular expressions...)
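
If you want to keep the if/elif shape without the shared state, one
option is to hang the match on a small object that each caller creates
for itself. This is only a sketch: Matcher and parse are made-up names
for illustration, not anything the re module provides.

```python
import re


class Matcher:
    """Hypothetical helper: remembers the last match on the instance."""

    def __init__(self):
        self.match = None

    def search(self, pattern, string):
        # Store the match object on this instance and report success.
        self.match = re.search(pattern, string)
        return self.match is not None


def parse(line):
    # The Matcher is local to this call, so two threads never share it.
    m = Matcher()
    if m.search(r"(\d+)\s+(\d+)", line):
        return ("numbers",) + m.match.groups()
    elif m.search(r"(\w+)\s+(\w+)", line):
        return ("words",) + m.match.groups()
    return None
```

Since nothing lives at module level, there is no "last match" for
another thread to overwrite.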


There's actually a slightly experimental feature in SRE that
can be useful here: combine your expressions into one big
expression, and use the new "lastgroup" attribute to figure
out which one matched:

    >>> import re
    >>> p = re.compile(r"(?P<digits>\d+)|(?P<text>\w+)")
    >>> m = p.search("123 456")
    >>> print(m.lastgroup, m.groups())
    digits ('123', None)

(however, keeping track of subgroups can be a major PITA
with this approach...)
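
For the two-number/two-word case from the question, the bookkeeping
might look roughly like this. It's a sketch under assumptions: the
group names and the classify helper are invented for illustration, and
it relies on lastgroup naming the outer group of each alternative,
since that is the last group to close.

```python
import re

# One alternative per case; the outer named group tags the case and the
# inner named groups hold the pieces we want to pull out afterwards.
PAIR = re.compile(
    r"(?P<numbers>(?P<num1>\d+)\s+(?P<num2>\d+))"
    r"|(?P<words>(?P<word1>\w+)\s+(?P<word2>\w+))"
)


def classify(line):
    m = PAIR.search(line)
    if m is None:
        return None
    # Dispatch on the name of the alternative that matched.
    if m.lastgroup == "numbers":
        return ("numbers", m.group("num1"), m.group("num2"))
    if m.lastgroup == "words":
        return ("words", m.group("word1"), m.group("word2"))
    return None
```

Naming the subgroups avoids counting parentheses across alternatives,
at the cost of a noisier pattern.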

