matching multiple regexs to a single line...

Alex Martelli aleax at aleax.it
Fri Nov 22 04:51:49 EST 2002


John Hunter wrote:
   ...
> That said, I too am unsatisfied with the M*N performance of the (rgx,
> func) pattern:
> 
>   for line in lines[:M]:
>      for rgx in regexs[:N]:
>          mo = rgx.match(line)
>          if mo: do_something(mo)

If you need to process the match objects for ALL the RE's that match,
I don't think you can do _substantially_ better in general.
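For reference, here is a runnable version of the quoted (rgx, func) loop. The patterns and handlers are hypothetical stand-ins for real ones. It keeps the quoted loop's use of `match`, so each pattern is anchored at the start of the line. Every line costs one match attempt per regex whether or not anything matches, which is the M*N cost under discussion.

```python
import re

# Hypothetical (regex, handler) pairs standing in for real ones.
HANDLERS = [
    (re.compile(r"ERROR (\w+)"), lambda mo: ("error", mo.group(1))),
    (re.compile(r"([A-Z]+) "), lambda mo: ("level", mo.group(1))),
]


def process_all(lines, handlers=HANDLERS):
    """Call every handler whose regex matches at the start of each line.

    Always makes len(lines) * len(handlers) match attempts.
    """
    results = []
    for line in lines:
        for rgx, func in handlers:
            mo = rgx.match(line)
            if mo:
                results.append(func(mo))
    return results
```

With these patterns, `process_all(["ERROR disk full", "WARN low space", "ok"])` returns `[("error", "disk"), ("level", "ERROR"), ("level", "WARN")]`. The first line triggers both handlers, which is exactly the "process ALL matches" case.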

> You've mentioned named rgx's in your previous posts.  In the case of
> differential processing of match objects based on the regex match, is
> there a more efficient way to process the mo's than this M*N approach?

If in a given application the case of 'no matches' is very frequent,
then a first-pass check on the line to see whether it does match at
least one of the RE's may give practical advantages, but I don't think
it can possibly change the O() behavior, just potentially give better
multipliers.  And if you need to process mo's for all matches with
the various RE's, rather than just the first match with one of the RE's
taken in some order of priority, then I think that's about it in
terms of the speedups that you can get (without getting into detailed
processing of the patterns involved, and even then, whether you can
get any benefit _at all_ depends on WHAT set of patterns you have).
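Both cases can be sketched with Python's `re` module. This is a sketch under stated assumptions, not a tuned implementation. The pattern list and group names are hypothetical. It uses `match` (anchored at line start) like the quoted loop. It also assumes the patterns use no numbered groups or backreferences and no inline global flags such as `(?i)`, because joining them into one alternation shifts group numbers and moves those flags out of the leading position. The single combined alternation does two jobs. It is the cheap "does anything match at all?" prefilter, and, through named groups and `lastgroup`, a one-pass first-match-by-priority dispatcher.

```python
import re

# Hypothetical patterns in priority order. Every group name must be unique
# across ALL patterns, and the patterns must not rely on numbered groups or
# backreferences, because wrapping them below shifts group numbers.
PATTERNS = [
    ("error", r"ERROR (?P<error_code>\w+)"),
    ("level", r"(?P<level_name>[A-Z]+) "),
]

# Individually compiled patterns, used when every match is needed.
COMPILED = [(name, re.compile(pat)) for name, pat in PATTERNS]

# One alternation of all patterns, each wrapped in a group named after it.
COMBINED = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in PATTERNS))


def all_matches_prefiltered(lines):
    """Return (name, match) for every pattern matching each line.

    COMBINED.match succeeds iff at least one pattern matches at the start
    of the line, so lines it rejects skip the N individual attempts.
    """
    results = []
    for line in lines:
        if COMBINED.match(line) is None:
            continue  # no pattern can match this line
        for name, rgx in COMPILED:
            mo = rgx.match(line)
            if mo:
                results.append((name, mo))
    return results


def first_match(line):
    """Return (name, match) for the highest-priority pattern matching at the
    start of `line`, or None if none matches.

    At a fixed position, Python's re tries the alternatives left to right and
    stops at the first one that succeeds, so list order is priority order.
    mo.lastgroup names the outer wrapper group, since it closes last.
    """
    mo = COMBINED.match(line)
    if mo is None:
        return None
    return mo.lastgroup, mo
```

For example, `first_match("ERROR disk full")` yields the name `"error"` with `mo.group("error_code") == "disk"`, even though the `level` pattern also matches that line. The prefilter keeps the worst case at M*N, as noted above. Whether it helps in practice depends on the patterns and on how often lines fail to match, so timing it on real data (e.g. with `timeit`) is the only way to tell.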


Alex




More information about the Python-list mailing list