[Python-ideas] Proposed convenience functions for re module
MRAB
python at mrabarnett.plus.com
Wed Jul 22 18:42:51 CEST 2009
Talin wrote:
> Steven D'Aprano wrote:
>> Following the thread "Experiment: Adding "re" to string objects.", I
>> would like to propose the addition of two convenience functions to the
>> re module:
>>
>>
>> def multimatch(s, *patterns):
>>     """Do a re.match on s using each pattern in patterns,
>>     returning the first one to succeed, or None if they all fail."""
>>     for pattern in patterns:
>>         m = re.match(pattern, s)
>>         if m: return m
>>
>
> There's a cute trick that you can use to do this that is much more
> efficient than testing each regex expression individually:
>
> combined_pattern = "|".join("(%s)" % p for p in patterns)
> combined_re = re.compile(combined_pattern)
>
> m = combined_re.match(string)
> return m.lastindex
>
> Basically, it combines all of the patterns into a single large regex,
> where each pattern is converted into a capturing group. It then returns
> match.lastindex, which is the index of the capturing group that matched.
> This is very efficient because now all of the patterns are combined into
> a single NFA which can prune possibilities very quickly.
>
> This works for up to 99 patterns, which is the limit on the number of
> capturing groups that a regex can have.
>
[snip]
It won't work properly if the patterns themselves contain capture
groups.
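
To illustrate, here is a minimal sketch. The function names
combined_lastindex and multimatch_index, and the wrapper group names
p0, p1, ..., are made up for this example and are not part of the re
module. The first function is the quoted trick as posted, with a check
for a failed match added; the second is one possible workaround that
wraps each pattern in a named group and reads match.lastgroup instead
of match.lastindex.

```python
import re

def combined_lastindex(s, patterns):
    # The quoted approach: wrap each pattern in a plain capturing group.
    combined = re.compile("|".join("(%s)" % p for p in patterns))
    m = combined.match(s)
    return m.lastindex if m else None

def multimatch_index(s, patterns):
    # Hypothetical workaround: name each wrapper group p0, p1, ... so the
    # pattern that matched can be identified however many capture groups
    # the individual patterns contain.
    combined = re.compile(
        "|".join("(?P<p%d>%s)" % (i, p) for i, p in enumerate(patterns))
    )
    m = combined.match(s)
    if m is None:
        return None
    # The wrapper group closes after any groups nested inside it, so
    # lastgroup is the wrapper's name.
    return int(m.lastgroup[1:]), m

patterns = ["(a)(b)", "(c)"]

# The second pattern matched, but the groups inside the first pattern
# shift the numbering, so lastindex is not the pattern's position.
print(combined_lastindex("c", patterns))  # prints 4

index, m = multimatch_index("c", patterns)
print(index)  # prints 1 (zero-based position of "(c)")
```

This sketch assumes the patterns do not themselves define groups named
p0, p1, and so on. Numbered backreferences such as \1 inside a pattern
would still refer to the wrong group once the patterns are combined.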