[Python-ideas] Proposed convenience functions for re module

MRAB python at mrabarnett.plus.com
Wed Jul 22 18:42:51 CEST 2009


Talin wrote:
> Steven D'Aprano wrote:
>> Following the thread "Experiment: Adding "re" to string objects.", I 
>> would like to propose the addition of two convenience functions to the 
>> re module:
>>
>>
>> def multimatch(s, *patterns):
>>     """Do a re.match on s using each pattern in patterns,
>>     returning the first one to succeed, or None if they all fail."""
>>     for pattern in patterns:
>>         m = re.match(pattern, s)
>>         if m: return m
>>
> 
> There's a cute trick that you can use to do this that is much more 
> efficient than testing each regex individually:
> 
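>   # wrap each pattern in its own capturing group and join with "|"
>   combined_pattern = "|".join("(%s)" % p for p in patterns)
>   combined_re = re.compile(combined_pattern)
>
>   m = combined_re.match(s)
>   # lastindex is 1-based; m is None when nothing matched
>   return m.lastindex if m else None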
> 
> Basically, it combines all of the patterns into a single large regex, 
> where each pattern is converted into a capturing group. It then returns 
> match.lastindex, which is the index of the capturing group that matched. 
> This is very efficient because now all of the patterns are combined into 
> a single NFA which can prune possibilities very quickly.
> 
> This works for up to 99 patterns, which is the limit on the number of 
> capturing groups that a regex can have.
> 
[snip]
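It won't work properly if the patterns themselves contain capture
groups. Every extra group shifts the numbering, so m.lastindex no
longer corresponds to the position of the pattern in the list, and
numbered backreferences inside a pattern end up pointing at the
wrong group.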
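
To make the problem concrete, here is a minimal sketch of one possible
workaround, not a tested proposal for the re module. It wraps each
pattern in a uniquely named group and reads the name back from
m.lastgroup instead of relying on m.lastindex. The function name
multimatch_index and the "_alt" name prefix are made up for
illustration. It assumes the patterns contain no numbered
backreferences (those would still be renumbered) and no groups whose
names start with "_alt".

```python
import re

PREFIX = "_alt"  # hypothetical name prefix for the wrapper groups


def multimatch_index(s, *patterns):
    """Return (index, match) for the alternative that matched at the
    start of s, or None if no pattern matched."""
    # Wrap each pattern in a named group so its identity survives even
    # when the pattern brings capture groups of its own.
    combined = "|".join(
        "(?P<%s%d>%s)" % (PREFIX, i, p) for i, p in enumerate(patterns)
    )
    m = re.match(combined, s)
    if m is None:
        return None
    # The wrapper group closes after any group nested inside it, so
    # lastgroup names the wrapper of the alternative that matched.
    return int(m.lastgroup[len(PREFIX):]), m


patterns = ["(a)b", "c(d)"]

# The unnamed-group version reports group number 3 for the second pattern.
naive = re.match("|".join("(%s)" % p for p in patterns), "cd")
print(naive.lastindex)

# The named-group version reports the pattern's position in the list.
index, match = multimatch_index("cd", *patterns)
print(index, match.group(0))
```

With the second pattern matching "cd", the unnamed version gives
lastindex 3 because "(a)" already occupies group 2, while the named
version gives index 1, the zero-based position of the pattern.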



