Blast from the past! [/F]
    for phrase, action in lexicon:
        p.append("(?:%s)(?P#%d)" % (phrase, len(p)))
[Tim]
How about instead enhancing existing (?P<name>pattern) notation, to set a new match object attribute to name if & when pattern matches? Then arbitrary info associated with a named pattern can be gotten at via dicts via the pattern name, & the whole mess should be more readable.
[/F Sent: Sunday, July 02, 2000 6:35 PM]
I just added "lastindex" and "lastgroup" attributes to the match object.
"lastindex" is the integer index of the last matched capturing group, "lastgroup" the corresponding name (or None, if the group didn't have a name). both attributes are None if no group was matched.
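A minimal sketch of how the two attributes behave, assuming the `re` module as it ships with current Python. The pattern and the sample strings are made up for illustration and do not come from the thread.

```python
import re

# Hypothetical pattern: one named group, one unnamed group, and a third
# alternative that matches without capturing anything.
pat = re.compile(r"(?P<word>[a-z]+)|(\d+)|\s+")

m = pat.match("hello")
print(m.lastindex, m.lastgroup)   # 1 word

m = pat.match("42")
print(m.lastindex, m.lastgroup)   # 2 None  (group 2 has no name)

m = pat.match("   ")
print(m.lastindex, m.lastgroup)   # None None  (no group took part)
```

The third case is the one worth remembering: a successful match does not guarantee that either attribute is set.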
Reviewing this before 2.0 has been on my todo list for 3+ months, and finally got to it. Good show! I converted some of my by-hand scanners to use lastgroup, and like it a whole lot. I know you understand why this is Good, so here's a simple example of an "after" tokenizer for those who don't (this one happens to tokenize REXX-like PARSE stmts):

    import re
    _token = re.compile(r"""
        (?P<space>      \s+)
    |   (?P<var>        [a-zA-Z_]\w*)
    |   (?P<dontcare>   \.)
    |   (?P<number>     \d+)
    |   (?P<punc>       [-+=()])
    |   (?P<string>     " [^"\\\n]* (?: \\. [^"\\\n]*)* "
                    |   ' [^'\\\n]* (?: \\. [^'\\\n]*)* '
        )
    """, re.VERBOSE).match
    del re

    (T_SPACE,
     T_VAR,
     T_DONTCARE,
     T_NUMBER,
     T_PUNC,
     T_STRING,
     T_EOF,
    ) = range(7)

    # For debug output.
    _enum2name = ["T_SPACE",
                  "T_VAR",
                  "T_DONTCARE",
                  "T_NUMBER",
                  "T_PUNC",
                  "T_STRING",
                  "T_EOF",
                 ]

    _group2action = {
        "space":    (T_SPACE, None),
        "var":      (T_VAR, None),
        "dontcare": (T_DONTCARE, None),
        "number":   (T_NUMBER, int),
        "punc":     (T_PUNC, None),
        "string":   (T_STRING, eval),
    }

    def tokenize(s, tokeneater):
        i, n = 0, len(s)
        while i < n:
            m = _token(s, i)
            if not m:
                raise ParseError(s, i)
            group = m.lastgroup
            enum, action = _group2action[group]
            val = m.group(group)
            if action is not None:
                val = action(val)
            tokeneater(enum, val)
            i = m.end()
        tokeneater(T_EOF, None)

The tokenize function here used to be a mass of if/elif stmts trying to figure out which group had matched. Now it's all table-driven: easier to write, reuse & maintain, and quicker to boot.

+1.

the-aged-may-be-slow-but-they-never-forget<wink>-ly y'rs - tim
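For anyone who wants to run the table-driven idea end to end, here is a cut-down, self-contained sketch. It is not the tokenizer from the post: the pattern is reduced to four groups, the result is returned as a list rather than fed to a tokeneater callback, and `ParseError` is a hypothetical stand-in, since the post does not show how that exception is defined.

```python
import re

class ParseError(Exception):
    """Hypothetical error type; the post does not show its definition."""

# One named group per token kind; re.VERBOSE lets the pattern be laid out
# as a table.
_token = re.compile(r"""
    (?P<space>   \s+)
|   (?P<var>     [a-zA-Z_]\w*)
|   (?P<number>  \d+)
|   (?P<punc>    [-+=()])
""", re.VERBOSE).match

# Group name -> optional converter applied to the matched text.
_group2action = {
    "space": None,
    "var": None,
    "number": int,
    "punc": None,
}

def tokenize(s):
    """Return a list of (group_name, value) pairs, skipping whitespace."""
    tokens = []
    i, n = 0, len(s)
    while i < n:
        m = _token(s, i)
        if not m:
            raise ParseError("no token at offset %d in %r" % (i, s))
        group = m.lastgroup            # name of the alternative that matched
        action = _group2action[group]  # table lookup replaces an if/elif chain
        val = m.group(group)
        if action is not None:
            val = action(val)
        if group != "space":
            tokens.append((group, val))
        i = m.end()
    return tokens

print(tokenize("x = y + 42"))
# [('var', 'x'), ('punc', '='), ('var', 'y'), ('punc', '+'), ('number', 42)]
```

Adding a token kind means adding one alternative to the pattern and one entry to the table; the loop itself does not change.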