Help with Regular Expressions

Raymond Hettinger othello at javanet.com
Mon Mar 12 20:11:11 EST 2001


Tim Peters wrote:

> [Raymond Hettinger]
> > Is there an idiom for how to use regular expressions for lexing?
> >
> > My attempt below is unsatisfactory because it has to filter the
> > entire match group dictionary to find out which token caused
> > the match. This approach isn't scalable because every token
> > match will require a loop over all possible token types.
> >
> > I've fiddled with this one for hours and can't seem to find a
> > direct way to get a group dictionary that contains only matches.
>
> That's because there isn't a direct way; best you can do now is seek to order
> your alternatives most-likely first (which is a good idea anyway, given the
> way the engine works).
>
> If you peek inside sre.py (2.0 or later), you'll find an undocumented class
> Scanner that uses the undocumented .lastindex attribute of match objects.
> Someday I hope this will be the basis for solving exactly the problem you're
> facing.  There's also an undocumented .lastgroup attribute:
>
> Python 2.1b1 (#11, Mar  2 2001, 11:23:29) [MSC 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license" for more information.
> IDLE 0.6 -- press F1 for help
> >>> import re
> >>> pat = re.compile(r"(?P<a>aa)|(?P<b>bb)")
> >>> m = pat.search("baab")
> >>> m.lastindex  # number of the group that matched
> 1
> >>> m.lastgroup  # name of group that matched
> 'a'
> >>> m = pat.search("ababba")
> >>> m.lastindex
> 2
> >>> m.lastgroup
> 'b'
> >>>
>
> They're not documented yet because we're not yet sure whether we want to make
> them permanent parts of the language.  So feel free to play, but don't count
> on them staying around forever.  If you like them, drop a note to the effbot
> saying so.
>
> for-more-docs-read-the-source-code-ly y'rs  - tim

Thanks Tim,

I changed the last line of the function to:
        return (m.lastgroup, m.group(m.lastgroup))
and it worked perfectly.
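
For readers who want to see the idea end to end, here is a minimal sketch of a
lexer built around m.lastgroup. The original function is not shown in this
thread, so the token names, the patterns, and the tokenize() helper below are
all hypothetical; only the final tuple mirrors the line quoted above.

```python
import re

# Hypothetical token set; names and patterns are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]

# One alternation with a named group per token type.
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (token_type, token_text) pairs, skipping whitespace."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at position %d" % pos)
        pos = m.end()
        if m.lastgroup != "SKIP":
            # lastgroup names the alternative that matched, so there is
            # no need to scan the whole group dictionary.
            yield (m.lastgroup, m.group(m.lastgroup))


for token in tokenize("x = 42 + y"):
    print(token)
```

Each token costs one match call plus one attribute lookup, regardless of how
many token types the pattern contains.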

I'll try out the undocumented attributes and send back a recommendation.
Right now, I like them but would prefer them as functions.  Possibly better
would be a variation of m.groupdict(), or a flag to it, that returns a
dictionary containing only the groups that matched, instead of mapping
non-matching groups to None.
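
To make the suggested return shape concrete, here is a rough sketch. The
matched_groups() helper is hypothetical and not part of the re module; it
still loops over every group, so it only illustrates the desired result and
does not address the scalability concern.

```python
import re


def matched_groups(m):
    """Return only the named groups that took part in the match.

    Hypothetical helper: filters out the None entries of m.groupdict().
    """
    return {name: value
            for name, value in m.groupdict().items()
            if value is not None}


pat = re.compile(r"(?P<a>aa)|(?P<b>bb)")
m = pat.search("baab")
print(m.groupdict())       # every named group, None for non-matches
print(matched_groups(m))   # only the group that matched
```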

always-amazed-at-how-much-the-bots-know
Raymond




