which re a|l|t|e|r|n|a|t|i|v|e matched?
Diez B. Roggisch
deets_noospaam at web.de
Mon Oct 27 13:33:10 EST 2003
Skip Montanaro wrote:
>
> I have a long regular expression with the top-level form
>
> pat = 'A|B|C|D|...'
>
> where there are a couple hundred alternatives, each one being a fairly
> simple regular expression (typically just the name of a machine).
> Assuming I've compiled that and match against it:
>
> matcher = re.compile(pat)
> match = matcher.match(foo)
> if match is not None:
>     ...
>
> is there a way to know what alternative was matched? Note that I'm not
> looking for match.group(1). I want to know which pattern among the
> various was matched. (I realize there might be more than one, but
> returning just one is okay.)
>
> If it helps, the regular expression is formed from the keys of a
> dictionary like so:
>
> pat = '('+'|'.join(d.keys())+')'
>
> I'm concatenating them like this so I don't need to make as many
> re.match() calls. I could narrow things down by doing a binary search
> of the keys(), but I was hoping for a simple way to do it in one shot.
This might work:
pat = reduce(lambda acc, key: "%s|(%s)" % (acc, key), d.keys(), "")[1:]
Then with
m = re.compile(pat).match(haystack).groups()
you get a tuple of this form
(None,...,<matched-pattern>, None,...)
You can scan that tuple for the first entry that is not None; its
position tells you which alternative matched. Unfortunately, that's O(n),
but hey, you can't have everything :)
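A minimal sketch of the idea, assuming the keys are plain names that contain no capturing groups of their own (so group i+1 corresponds to the i-th key). The dictionary contents, the sample input strings and the two function names are made up for illustration. The second function is an alternative to the linear scan: it relies on the match object's lastindex attribute, which holds the number of the last capturing group that matched, and with one flat group per alternative that is the group of the alternative that matched.

```python
import re

# Hypothetical data: machine names mapped to arbitrary values.
d = {"alpha": 1, "beta": 2, "gamma": 3}

# Freeze the key order once so group numbers and keys stay aligned.
keys = list(d.keys())

# Wrap every alternative in its own capturing group: "(alpha)|(beta)|(gamma)".
pat = "|".join("(%s)" % key for key in keys)
matcher = re.compile(pat)


def which_alternative(haystack):
    """Return the key whose alternative matched, or None if nothing matched."""
    m = matcher.match(haystack)
    if m is None:
        return None
    # Linear scan: the first group that is not None marks the alternative.
    for key, group in zip(keys, m.groups()):
        if group is not None:
            return key
    return None


def which_alternative_lastindex(haystack):
    """Same result without the scan, using the match object's lastindex."""
    m = matcher.match(haystack)
    if m is None:
        return None
    # lastindex is 1-based; this only works because the groups are flat.
    return keys[m.lastindex - 1]


print(which_alternative("beta.example.com"))
print(which_alternative_lastindex("gamma"))
```

If the keys can contain regex metacharacters or groups of their own, the group numbering shifts and neither lookup is reliable as written; escaping the keys with re.escape() is one way around that when they are meant as literal names.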
Regards,
Diez