
22-07-2009, 02:00 Steven D'Aprano <steve@pearwood.info>:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:

    def multimatch(s, *patterns):
        """Do a re.match on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.match(pattern, s)
            if m:
                return m

    def multisearch(s, *patterns):
        """Do a re.search on s using each pattern in patterns,
        returning the first one to succeed, or None if they all fail."""
        for pattern in patterns:
            m = re.search(pattern, s)
            if m:
                return m

The rationale is to make the following idiom easier:

    m = re.match(pattern1, s)
    if not m:
        m = re.match(pattern2, s)
        if not m:
            m = re.match(pattern3, s)
            if not m:
                m = re.match(pattern4, s)
    if m:
        m.group()

which will become:

    m = re.multimatch(s, pattern1, pattern2, pattern3, pattern4)
    if m:
        m.group()

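To make the idiom concrete, here is a minimal runnable sketch of the proposed function as a standalone helper. The two date patterns and the sample string are illustrative assumptions, not part of the proposal.

```python
import re

def multimatch(s, *patterns):
    """Return the first successful re.match of s against patterns, else None."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m:
            return m
    return None

# Illustrative patterns (assumptions, not taken from the proposal).
iso_date = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
us_date = r"(?P<month>\d{1,2})/(?P<day>\d{1,2})/(?P<year>\d{4})"

# The patterns are tried in order; the first one that matches wins.
m = multimatch("7/22/2009", iso_date, us_date)
if m:
    print(m.group("year"))
```

Because the patterns are tried in order, the caller controls which one takes priority when several could match.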
Is there any support or objections to this proposal? Any comments?
It sounds nice. But why not simply use:

    m = re.match('|'.join((pattern1, pattern2, pattern3, pattern4)), s)

And if we want the feature anyway, I'd prefer MRAB's:

    m = re.match((pattern1, pattern2, pattern3, pattern4), s)
    if m:
        print(m.group())

This format is already used by some string methods, e.g. str.startswith().
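
As a rough sketch of how the tuple form might behave, the helper below accepts either a single pattern or a tuple of patterns, in the style of str.startswith(). The name match_any and the sample patterns are hypothetical; re.match itself does not accept a tuple today.

```python
import re

def match_any(patterns, s, flags=0):
    """Hypothetical helper: try re.match with each pattern in turn.

    `patterns` may be a single pattern or a tuple of patterns,
    mirroring how str.startswith() accepts a str or a tuple.
    """
    if not isinstance(patterns, tuple):
        patterns = (patterns,)
    for pattern in patterns:
        m = re.match(pattern, s, flags)
        if m:
            return m
    return None

# Illustrative patterns (assumptions for this example only).
patterns = (r"(\d+)-(\d+)", r"([a-z]+)")

m = match_any(patterns, "abc")
if m:
    # Group numbers are relative to the pattern that actually matched.
    print(m.group(1))

# For comparison: joining with '|' numbers the groups across all patterns,
# so the same text ends up in group 3 and group 1 is None.
joined = re.match("|".join(patterns), "abc")
```

One practical difference from the '|'.join approach is visible here: with separate patterns each match keeps its own group numbering, whereas the joined pattern shifts the group indices of every alternative after the first.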

***

But if we are talking about convenience functions in the re module, it would IMHO be very nice to have functions like this:

    import collections
    import re

    def matchgrouping(pattern, string, flags=0, default=None):
        """Do a re.match on string using pattern, returning a dict
        of the groups, which can be looked up by index or by name."""
        match = re.match(pattern, string, flags)
        # Missing keys and groups that did not participate give `default`.
        groups = collections.defaultdict(lambda: default)
        if match:
            # Number the groups from 1, as match.group() does.
            groups.update(enumerate(match.groups(default), 1))
            groups.update(match.groupdict(default))
        return groups

(Plus the analogous function for searching, and two analogous methods on RegexObject instances.)

* Then, for example, instead of:

    m = re.match(pattern, s)
    if m:
        first_group = m.group(1)
        surname = m.group('surname')
    else:
        first_group = None
        surname = None

  we could simply write:

    m = re.matchgrouping(pattern, s)
    first_group = m[1]
    surname = m['surname']

* And instead of:

    withip = log_re.match(logline)
    if withip and withip.group('ip_addr'):
        iplog.append(logline)

  we could simply write:

    if log_re.matchgrouping(logline)['ip_addr']:
        iplog.append(logline)

What do you think about it?

*j

-- 
Jan Kaliszewski (zuo) <zuo@chopin.edu.pl>