which re a|l|t|e|r|n|a|t|i|v|e matched?

Diez B. Roggisch deets_noospaam at web.de
Mon Oct 27 13:33:10 EST 2003


Skip Montanaro wrote:

> 
> I have a long regular expression with the top-level form
> 
>     pat = 'A|B|C|D|...'
> 
> where there are a couple hundred alternatives, each one being a fairly
> simple regular expression (typically just the name of a machine). 
> Assuming I've compiled that and match against it:
> 
>     matcher = re.compile(pat)
>     match = matcher.match(foo)
>     if match is not None:
>         ...
> 
> is there a way to know what alternative was matched?  Note that I'm not
> looking for match.group(1).  I want to know which pattern among the
> various was matched.  (I realize there might be more than one, but
> returning just one is okay.)
> 
> If it helps, the regular expression is formed from the keys of a
> dictionary like so:
> 
>     pat = '('+'|'.join(d.keys())+')'
> 
> I'm concatenating them like this so I don't need to make as many
> re.match() calls.  I could narrow things down by doing a binary search
> of the keys(), but I was hoping for a simple way to do it in one shot.

This might work:

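from functools import reduce  # only needed on Python 3; reduce is a builtin on Python 2
# The "" initial value makes every key get wrapped; [1:] then drops the leading "|".
pat = reduce(lambda acc, key: "%s|(%s)" % (acc, key), d.keys(), "")[1:]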

Then with

m = re.compile(pat).match(haystack).groups()

you get a tuple of this form

(None,...,<matched-pattern>, None,...)

You can scan that tuple for the first entry that is not None to find the
group that actually matched. Unfortunately, that's O(n), but hey, you
can't have everything :)
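
A rough sketch of the whole thing, in case it helps. The dictionary
contents and the name which_alternative are made up for illustration, and
it assumes the keys contain no capturing groups of their own:

```python
import re

# Hypothetical stand-in for the dictionary in the original question.
d = {"alpha": 1, "beta": 2, "gamma": 3}

# Freeze the key order once so group positions and keys stay aligned.
keys = list(d.keys())

# Wrap each alternative in its own capturing group: (alpha)|(beta)|(gamma)
pat = "|".join("(%s)" % key for key in keys)
matcher = re.compile(pat)


def which_alternative(text):
    """Return the key whose group matched at the start of text, or None."""
    match = matcher.match(text)
    if match is None:
        return None
    # Linear scan for the first group that took part in the match.
    for key, value in zip(keys, match.groups()):
        if value is not None:
            return key
    return None


print(which_alternative("beta.example.com"))  # prints: beta
print(which_alternative("delta"))             # prints: None
```

If the keys really are plain names with no groups inside them, the match
object's lastindex attribute (the 1-based index of the last capturing
group that matched) gives the same answer without the scan, e.g.
keys[match.lastindex - 1].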

Regards,

Diez