
Talin wrote:
Steven D'Aprano wrote:
Following the thread "Experiment: Adding "re" to string objects.", I would like to propose the addition of two convenience functions to the re module:
def multimatch(s, *patterns):
    """Do a re.match on s using each pattern in patterns, returning
    the first one to succeed, or None if they all fail."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m:
            return m
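For concreteness, here is a runnable version of that helper with a quick usage check; the import, the explicit trailing return None, and the sample patterns are my additions rather than part of the proposal:

import re

def multimatch(s, *patterns):
    """Return the first successful re.match of s against patterns, or None."""
    for pattern in patterns:
        m = re.match(pattern, s)
        if m:
            return m
    return None

# The first pattern that matches wins; later patterns are never tried.
m = multimatch("2024-05-01", r"[a-z]+", r"(\d{4})-(\d{2})-(\d{2})")
print(m.group(0))    # -> 2024-05-01
print(m.re.pattern)  # -> (\d{4})-(\d{2})-(\d{2})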
There's a cute trick you can use to do this that is much more efficient than testing each pattern individually:
combined_pattern = "|".join("(%s)" % p for p in patterns)
combined_re = re.compile(combined_pattern)

m = combined_re.match(string)
return m.lastindex
Basically, it combines all of the patterns into a single large regex, where each pattern is converted into a capturing group. It then returns match.lastindex, which is the 1-based index of the capturing group that matched. This is very efficient because all of the patterns are now combined into a single NFA, which can prune possibilities very quickly.
This works for up to 99 patterns, which is the limit on the number of capturing groups that a regex can have.
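A minimal, self-contained sketch of that trick as a function (the name multimatch_index, the None guard for the no-match case, and the example patterns are mine, not from the thread):

import re

def multimatch_index(s, *patterns):
    """Match s against all patterns at once and return the 1-based
    index of the pattern that matched, or None if none of them match."""
    combined_pattern = "|".join("(%s)" % p for p in patterns)
    combined_re = re.compile(combined_pattern)
    m = combined_re.match(s)
    if m is None:
        return None
    return m.lastindex

# lastindex 2 means the second pattern was the one that matched.
print(multimatch_index("2024-05-01", r"[a-z]+", r"\d{4}-\d{2}-\d{2}"))  # -> 2
print(multimatch_index("hello", r"[a-z]+", r"\d{4}-\d{2}-\d{2}"))       # -> 1
print(multimatch_index("!!!", r"[a-z]+", r"\d{4}-\d{2}-\d{2}"))         # -> None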
[snip] It won't work properly if the patterns themselves contain capture groups.
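To make that failure mode concrete, here is a small illustration (the patterns are invented for the example): as soon as a pattern carries its own groups, the group numbering shifts and lastindex no longer lines up with the pattern's position in the list.

import re

# The first pattern contains two capture groups of its own.
patterns = [r"(\d+)-(\d+)", r"[a-z]+"]
combined_re = re.compile("|".join("(%s)" % p for p in patterns))

m = combined_re.match("12-34")
print(m.lastindex)  # -> 3, not 1: the inner groups shift the numbering

m = combined_re.match("abc")
print(m.lastindex)  # -> 4, not 2

The embedded groups also eat into the group limit, so the 99-pattern estimate shrinks accordingly.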