[Python-ideas] Proposed convenience functions for re module

Thu Jul 23 00:03:35 CEST 2009

22-07-2009, 02:00 Steven D'Aprano <steve at pearwood.info>:

> Following the thread "Experiment: Adding "re" to string objects.", I
> would like to propose the addition of two convenience functions to the
> re module:
>
>
> def multimatch(s, *patterns):
>     """Do a re.match on s using each pattern in patterns,
>     returning the first one to succeed, or None if they all fail."""
>     for pattern in patterns:
>         m = re.match(pattern, s)
>         if m: return m
>
> def multisearch(s, *patterns):
>     """Do a re.search on s using each pattern in patterns,
>     returning the first one to succeed, or None if they all fail."""
>     for pattern in patterns:
>         m = re.search(pattern, s)
>         if m: return m
>
>
> The rationale is to make the following idiom easier:
>
>
> m = re.match(pattern1, s)
> if not m:
>     m = re.match(pattern2, s)
>     if not m:
>         m = re.match(pattern3, s)
>         if not m:
>             m = re.match(pattern4, s)
> if m:
>     m.group()
>
>
> which will become:
>
> m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
> if m:
>     m.group()
>
>
> Is there any support or objections to this proposal? Any comments?

It sounds nice. But why not simply use:

m = re.match('|'.join((pattern1, pattern2, pattern3, pattern4)), s)
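
A minimal sketch of the joined-alternation idea, assuming each pattern is wrapped in a non-capturing group so that a top-level `|` inside one pattern cannot merge with its neighbours. The helper name `join_patterns` is hypothetical and not part of `re`. Two caveats: group numbers in the combined pattern run across all alternatives, and with `re.search` the leftmost match wins regardless of the order of the patterns, so this is not an exact equivalent of the proposed multisearch.

```python
import re

def join_patterns(*patterns):
    """Combine patterns into one alternation (hypothetical helper)."""
    # (?:...) keeps each pattern's own '|' from merging with its neighbours
    return '|'.join('(?:%s)' % p for p in patterns)

combined = join_patterns(r'\d+', r'[a-z]+|[A-Z]+')
m = re.match(combined, 'HELLO world')
if m:
    print(m.group())

# With search, the leftmost match wins, not the first pattern listed:
# 'a' is found at position 0 before 'b' gets a chance at position 1.
leftmost = re.search(join_patterns('b', 'a'), 'ab')
print(leftmost.group())
```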

And if we want the feature anyway, I'd prefer MRAB's:

>     m = re.match((pattern1, pattern2, pattern3, pattern4), s)
>     if m:
>         print m.group()
>
>  This format is already used by some string methods, eg str.startswith().
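
For illustration, the tuple form could be sketched as a wrapper around `re.match`. This is only an assumption about how such a feature might behave; the name `match_any` is hypothetical, and the real `re.match` does not accept a tuple of patterns.

```python
import re

def match_any(patterns, string, flags=0):
    """Try re.match with a single pattern or a tuple of patterns.

    Hypothetical wrapper, modelled on how str.startswith() accepts a tuple.
    """
    if not isinstance(patterns, tuple):
        patterns = (patterns,)
    for pattern in patterns:
        m = re.match(pattern, string, flags)
        if m:
            return m  # first pattern that matches wins
    return None

m = match_any((r'\d+', r'[a-z]+'), 'abc123')
if m:
    print(m.group())
```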

***

But if we are talking about convenience functions in the re module,
it'd be IMHO very nice to have functions like this:

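import collections
import re

def matchgrouping(pattern, string, flags=0, default=None):
    """Do a re.match on string using pattern,
    returning a dict containing the groups, which can be
    looked up by index or by name."""

    match = re.match(pattern, string, flags)
    # missing keys (and a failed match) yield `default`
    groups = collections.defaultdict(lambda: default)
    if match:
        # start at 1 so that indices agree with match.group(n)
        groups.update(enumerate(match.groups(default), 1))
        groups.update(match.groupdict(default))
    return groups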

(Plus the analogous function for searching,
plus two analogous methods of RegexObject instances.)
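
The search analogue could be sketched the same way. This is only an illustration of the idea, assuming 1-based numeric keys (to mirror m.group(1)) and a plain defaultdict; the names `_grouping` and `searchgrouping` are hypothetical.

```python
import collections
import re

def _grouping(match, default=None):
    """Build a dict of groups, keyed by index and by name (hypothetical)."""
    groups = collections.defaultdict(lambda: default)
    if match is not None:
        groups[0] = match.group(0)  # the whole match
        # start=1 so that numeric keys agree with match.group(n)
        groups.update(enumerate(match.groups(default), 1))
        groups.update(match.groupdict(default))
    return groups

def searchgrouping(pattern, string, flags=0, default=None):
    """Like re.search, but return the groups dict (empty if no match)."""
    return _grouping(re.search(pattern, string, flags), default)

g = searchgrouping(r'(\w+) (?P<surname>\w+)', 'Dr. Jan Kaliszewski')
print(g[1], g['surname'])
```

Because a failed search yields an empty defaultdict, any lookup on the result falls back to `default` instead of raising.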

* Then e.g. -- instead of:

m = re.match(pattern, s)
if m:
    first_group = m.group(1)
    surname = m.group('surname')
else:
    first_group = None
    surname = None

-- we could write simply:

m = re.matchgrouping(pattern, s)
first_group = m[1]
surname = m['surname']

* And e.g. -- instead of:

withip = log_re.match(logline)
if withip and withip.group('ip_addr'):
     iplog.append(logline)

-- we could write simply:

if log_re.matchgrouping(logline)['ip_addr']:
     iplog.append(logline)

What do you think about it?

*j

-- 
Jan Kaliszewski (zuo) <zuo at chopin.edu.pl>