Help with Regular Expressions

Mon Mar 12 18:37:14 EST 2001

[Raymond Hettinger]
> Is there an idiom for how to use regular expressions for lexing?
>
> My attempt below is unsatisfactory because it has to filter the
> entire match group dictionary to find out which token caused
> the match. This approach isn't scalable because every token
> match will require a loop over all possible token types.
>
> I've fiddled with this one for hours and can't seem to find a
> direct way to get a group dictionary that contains only matches.

That's because there isn't a direct way; best you can do now is seek to order
your alternatives most-likely first (which is a good idea anyway, given the
way the engine works).
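
To spell out "the way the engine works": alternatives are tried left to
right, and the first one that matches wins; there's no hunt for the longest
match.  So order affects what you get back as well as how fast you get it.
A small sketch (the patterns and the helper name first_match are made up
for illustration, not taken from the original question):

```python
import re

def first_match(pattern, text):
    """Return the text matched at the start of `text`, or None."""
    m = re.match(pattern, text)
    return m.group() if m else None

# Same two alternatives, different order, different result:
short_first = first_match(r"a|ab", "ab")   # the "a" branch is tried first
long_first = first_match(r"ab|a", "ab")    # the "ab" branch is tried first

print(short_first, long_first)
```

For a lexer that means a catch-all alternative (an identifier pattern, say)
should come after the more specific ones it would otherwise shadow.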

If you peek inside sre.py (2.0 or later), you'll find an undocumented class
Scanner that uses the undocumented .lastindex attribute of match objects.
Someday I hope this will be the basis for solving exactly the problem you're
facing.  There's also an undocumented .lastgroup attribute:

Python 2.1b1 (#11, Mar  2 2001, 11:23:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.6 -- press F1 for help
>>> import re
>>> pat = re.compile(r"(?P<a>aa)|(?P<b>bb)")
>>> m = pat.search("baab")
>>> m.lastindex  # number of group that matched
1
>>> m.lastgroup  # name of group that matched
'a'
>>> m = pat.search("ababba")
>>> m.lastindex
2
>>> m.lastgroup
'b'
>>>
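
Applied to lexing, .lastgroup names the matching token type directly, so
there's no loop over the group dictionary.  A minimal sketch, assuming each
token type is a top-level named group with no capturing groups nested
inside it; the token names and the tokenize() helper are hypothetical, not
from the original question:

```python
import re

# Hypothetical token set, one top-level named group per token type.
TOKEN_PAT = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<OP>[-+*/=()])
  | (?P<SKIP>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Return a list of (token_type, token_text) pairs for `text`."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_PAT.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at position %d"
                             % (text[pos], pos))
        # lastgroup is the name of the alternative that matched.
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x = 42 + y1"))
```

Each token costs one match call and one attribute lookup, however many
token types the pattern has.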

They're not documented yet because we're not yet sure whether we want to make
them permanent parts of the language.  So feel free to play, but don't count
on them staying around forever.  If you like them, drop a note to the effbot
saying so.
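
If you'd like to play with Scanner too, here's a sketch.  It assumes the
class is reachable as re.Scanner, takes a list of (pattern, action) pairs,
and that scan() returns a (tokens, unmatched_remainder) pair; that's how it
behaves in the interpreters I've tried, but being undocumented it may
differ in yours.  The lexicon and the scan_tokens() helper are made up for
illustration:

```python
import re

# Each action is called as action(scanner, matched_text); an action of
# None means "match it, then throw it away" (handy for whitespace).
_scanner = re.Scanner([
    (r"\d+",          lambda s, tok: ("NUMBER", tok)),
    (r"[A-Za-z_]\w*", lambda s, tok: ("NAME", tok)),
    (r"[-+*/=()]",    lambda s, tok: ("OP", tok)),
    (r"\s+",          None),
])

def scan_tokens(text):
    """Return (tokens, remainder); remainder is whatever failed to match."""
    return _scanner.scan(text)

tokens, remainder = scan_tokens("x = 42 + y1")
print(tokens, repr(remainder))
```

Scanning stops at the first position where no pattern matches, and the
rest of the string comes back as the remainder rather than raising.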

for-more-docs-read-the-source-code-ly y'rs  - tim