Speeding up multiple regex matches
fredrik at pythonware.com
Fri Nov 18 19:09:49 CET 2005
> I've run into this problem a couple of times. Say I have a piece of
> text that I want to test against a large number of regular expressions,
> where a different action is taken based on which regex successfully
> matched. The naive approach is to loop through each regex, and stop
> when one succeeds. However, I am finding this to be too slow for my
> application -- currently 30% of the run time is being taken up in the
> regex matching.
> I thought of a couple of approaches, but I am not sure how to make them
> work:
> 1) Combine all of the regular expressions into one massive regex, and
> let the regex state machine do all the discriminating. The problem with
> this is that it gives you no way to determine which regex was the
> matching one.
use a capturing group for each alternative, and use lastindex to quickly
find the match:
    lastindex (match object attribute)
        The integer index of the last matched capturing group, or None if
        no group was matched at all.
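
a minimal sketch of that idea follows. the patterns, labels, and the
classify() helper are made up for illustration and are not from the
original question. it also assumes the individual patterns contain no
capturing groups of their own, since nested groups would shift the
numbering that lastindex reports:

```python
import re

# Hypothetical (pattern, label) pairs, for illustration only.
# Assumption: no sub-pattern contains its own capturing group;
# use (?:...) inside a pattern if grouping is needed.
rules = [
    (r"\d+", "number"),
    (r"[A-Za-z_]\w*", "name"),
    (r"\s+", "whitespace"),
]

# Wrap each alternative in one capturing group and join them with "|".
combined = re.compile("|".join("(%s)" % pattern for pattern, _ in rules))

def classify(text):
    """Return the label of the alternative matching at the start of text, or None."""
    m = combined.match(text)
    if m is None:
        return None
    # lastindex is 1-based, so group N corresponds to rules[N - 1].
    return rules[m.lastindex - 1][1]
```

the label could just as well be a function to call, which gives the
"different action per regex" dispatch with a single match() call. as in
the loop version, earlier alternatives win when more than one could match.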