Re: [Python-ideas] Proposed convenience functions for re module

July 22, 2009

      22-07-2009, 02:00 Steven D'Aprano <steve@pearwood.info>:
...
Following the thread "Experiment: Adding "re" to string objects.", I
would like to propose the addition of two convenience functions to the
re module:

def multimatch(s, *patterns):
    """Do a re.match on s using each pattern in patterns,
    returning the first one to succeed, or None if they all fail."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m: return m

def multisearch(s, *patterns):
    """Do a re.search on s using each pattern in patterns,
    returning the first one to succeed, or None if they all fail."""
    for pattern in patterns:
        m = re.search(pattern, s)
        if m: return m

The rationale is to make the following idiom easier:

m = re.match(pattern1, s)
if not m:
    m = re.match(pattern2, s)
    if not m:
        m = re.match(pattern3, s)
        if not m:
            m = re.match(pattern4, s)
if m:
    m.group()

which will become:

m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
if m:
    m.group()

Is there any support or objections to this proposal? Any comments?

It sounds nice. But why not simply use:

m = re.match('|'.join((pattern1, pattern2, pattern3, pattern4)), s)
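
One difference between the joined form and trying the patterns one by one is group numbering. The sketch below is only an illustration: join_patterns, date_re and mail_re are made-up names, and wrapping each pattern in a non-capturing group is a defensive assumption rather than anything proposed in the thread.

```python
import re

def join_patterns(*patterns):
    """Hypothetical helper: OR the patterns together, each wrapped in
    its own non-capturing group."""
    return '|'.join('(?:%s)' % p for p in patterns)

# Two illustrative patterns, each with two capturing groups.
date_re = r'(\d+)-(\d+)'
mail_re = r'(\w+)@(\w+)'

combined = join_patterns(date_re, mail_re)
m = re.match(combined, 'user@host')

# Groups are numbered across the whole combined pattern, so the groups
# of mail_re are now 3 and 4 rather than 1 and 2.
groups_combined = m.groups()

# Trying the patterns one by one keeps each pattern's own numbering.
m2 = re.match(date_re, 'user@host') or re.match(mail_re, 'user@host')
groups_separate = m2.groups()
```

With the joined pattern, the caller has to know which alternative matched before reading any group by number; with sequential matching, m.group(1) always refers to the first group of whichever pattern succeeded.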

And if we want the feature anyway, I'd prefer MRAB's:
...
m = re.match((pattern1, pattern2, pattern3, pattern4), s)
if m:
    print m.group()
This format is already used by some string methods, e.g. str.startswith().
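
Since re.match() does not accept a tuple of patterns today, the tuple form can only be tried out through a wrapper. The following is a rough sketch under that assumption; match_any is a hypothetical name, and it handles string patterns only, not compiled ones.

```python
import re

def match_any(patterns, string, flags=0):
    """Hypothetical helper: accept one pattern or a tuple of patterns,
    the way str.startswith() accepts one prefix or a tuple of prefixes."""
    if isinstance(patterns, str):
        patterns = (patterns,)
    # Return the first successful match, trying patterns in order.
    for pattern in patterns:
        m = re.match(pattern, string, flags)
        if m:
            return m
    return None

first = match_any((r'\d+', r'[a-z]+'), 'abc123')   # second pattern matches
single = match_any(r'\d+', 'abc123')               # no match at position 0
```

Here first.group() is 'abc' and single is None, because re.match only matches at the start of the string.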
***

But if we are talking about convenience functions in the re module, IMHO it
would be very nice to have functions like this:

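import collections
import re

def matchgrouping(pattern, string, flags=0, default=None):
    """Do a re.match on string using pattern,
    returning a dict containing the groups, which can be
    looked up by index or by name."""
    match = re.match(pattern, string, flags)
    # Missing keys (including every key when there is no match) give default.
    groups = collections.defaultdict(lambda: default)
    if match:
        # Group numbers start at 1, like match.group(1).
        groups.update(enumerate(match.groups(default), 1))
        groups.update(match.groupdict(default))
    return groups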

(Plus the analogous function for searching.)
(Plus two analogous methods of RegexObject instances.)

* Then e.g. -- instead of:

m = re.match(pattern, s)
if m:
     first_group = m.group(1)
     surname = m.group('surname')
else:
     first_group = None
     surname = None

-- we could write simply:

m = re.matchgrouping(pattern, s)
first_group = m[1]
surname = m['surname']
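
To make the comparison concrete, here is a small runnable sketch of the search-based counterpart mentioned above. The name searchgrouping, the sample pattern and the sample strings are all assumptions for illustration; storing the whole match under key 0 is an extra that was not part of the proposal.

```python
import collections
import re

def searchgrouping(pattern, string, flags=0, default=None):
    """Hypothetical search-based counterpart of matchgrouping."""
    groups = collections.defaultdict(lambda: default)
    match = re.search(pattern, string, flags)
    if match:
        groups[0] = match.group()  # whole match, like match.group(0)
        groups.update(enumerate(match.groups(default), 1))
        groups.update(match.groupdict(default))
    return groups

pattern = r'(\w+) (?P<surname>\w+)$'

hit = searchgrouping(pattern, 'Dr Jan Kowalski')
first_group = hit[1]          # 'Jan'
surname = hit['surname']      # 'Kowalski'

miss = searchgrouping(pattern, '???')
no_group = miss[1]            # None, no if/else needed
no_surname = miss['surname']  # None
```

One thing to keep in mind with this approach: looking up a missing key in a defaultdict also inserts that key, so the returned dict grows as it is probed.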

* And e.g. -- instead of:

withip = log_re.match(logline)
if withip and withip.group('ip_addr'):
     iplog.append(logline)

-- we could write simply:

if log_re.matchgrouping(logline)['ip_addr']:
     iplog.append(logline)
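
Compiled pattern objects cannot be given new methods from outside, so the log example can only be approximated with a free function that takes the compiled pattern as its first argument. This is a sketch under that assumption; the log format, log_re and the sample lines are invented for the example.

```python
import collections
import re

def matchgrouping(regex, string, default=None):
    """Hypothetical stand-in for the proposed RegexObject.matchgrouping()."""
    groups = collections.defaultdict(lambda: default)
    match = regex.match(string)
    if match:
        groups.update(enumerate(match.groups(default), 1))
        groups.update(match.groupdict(default))
    return groups

# Invented log format: an optional IP address, then an HTTP method.
log_re = re.compile(r'(?:(?P<ip_addr>\d+\.\d+\.\d+\.\d+) )?(?P<method>GET|POST) ')

loglines = ['10.0.0.1 GET /index', 'GET /health', 'garbage']

# Lines that do not match and lines that match without an IP are both
# skipped by the same test.
iplog = [line for line in loglines if matchgrouping(log_re, line)['ip_addr']]
```

Only the first line ends up in iplog: the second matches but has no ip_addr, and the third does not match at all.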

What do you think about it?

*j

-- 
Jan Kaliszewski (zuo) <zuo@chopin.edu.pl>