
What ever happened to the sre Scanner? It seemed like a good idea but it was not documented and it doesn't work for me. Is it just a case of nobody got around to the documentation or have we decided against it?

Here's the code that doesn't work for me:

    from sre import Scanner

    scanner = Scanner([
        (r"[a-zA-Z_]\w*", None),
        (r"\d+\.\d*", None),
        (r"\d+", None),
        (r"=|\+|-|\*|/", None),
        (r"\s+", None),
    ])

    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

    Traceback (most recent call last):
      File "junk.py", line 11, in ?
        tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
      File "c:\program files\python21\lib\sre.py", line 254, in scan
        action = self.lexicon[m.lastindex][1]
    TypeError: sequence index must be integer

m.lastindex is None

--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook

[Paul Prescod]
What ever happened to the sre Scanner? It seemed like a good idea but it was not documented
I previously urged /F to document, and Python-Dev to accept, the .lastindex and .lastgroup match object extensions, but to date <wink> got no response. Whether to adopt the Scanner class too is fuzzier, since AFAICT almost nobody has figured out how to use it.
and it doesn't work for me.
This isn't a code problem, it's a failure to reverse-engineer the undocumented API <wink>.
Is it just a case of nobody got around to the documentation or have we decided against it?
WRT Scanner, partly the former, nothing of the latter, mostly that there's been no discussion of the API at all. WRT lastindex and lastgroup, I think purely the former.
Here's the code that doesn't work for me:
    from sre import Scanner

    scanner = Scanner([
        (r"[a-zA-Z_]\w*", None),
        (r"\d+\.\d*", None),
        (r"\d+", None),
        (r"=|\+|-|\*|/", None),
        (r"\s+", None),
    ])
1. Every tokenization regexp must contain exactly one capturing group. The lack above is the source of your later TypeError. Unclear to me whether that was the intent, or just the way the code happens to work today.

2. When an action is None, the substring matched by the pattern will be thrown away. You need to supply non-None actions if you want anything to show up in the token list.
    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")

    Traceback (most recent call last):
      File "junk.py", line 11, in ?
        tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
      File "c:\program files\python21\lib\sre.py", line 254, in scan
        action = self.lexicon[m.lastindex][1]
    TypeError: sequence index must be integer
m.lastindex is None
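A minimal check of that observation, assuming Python 3 and the documented re module rather than the sre of Python 2.1 (the variable names are illustrative only): lastindex is None whenever no capturing group took part in the match.

```python
import re

# Same pattern, once without and once with a capturing group.
without_group = re.match(r"\d+", "312")
with_group = re.match(r"(\d+)", "312")

print(without_group.lastindex)  # None: no capturing group participated
print(with_group.lastindex)     # 1: group 1 was the last group to match
```

Indexing a list with that None is what raises the TypeError in the traceback.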
Here's a working rewrite:

    from sre import Scanner

    def retrieve(scanner, group):
        return group

    scanner = Scanner([
        (r"([a-zA-Z_]\w*)", retrieve),
        (r"(\d+\.\d*)", retrieve),
        (r"(\d+)", retrieve),
        (r"(=|\+|-|\*|/)", retrieve),
        (r"(\s+)", None),  # ignore whitespace
    ])

    tokens, tail = scanner.scan("sum = 3*foo + 312.50 + bar")
    print tokens, `tail`

That prints

    ['sum', '=', '3', '*', 'foo', '+', '312.50', '+', 'bar'] ''

In return for that, how about *you* supply a works-on-Windows rewrite of test_urllib2.py? You know more about that than anyone, and the test has been failing for weeks.
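For readers who want to see why one capturing group per pattern matters, here is a rough sketch of lastindex-based dispatch. It is not the sre.py implementation: it assumes Python 3 and uses only the documented re module, and the names LEXICON, MASTER, keep and scan are hypothetical. Each pattern is wrapped in exactly one capturing group, so the number of the group that matched identifies the lexicon entry, and a None action discards the match.

```python
import re


def keep(text):
    """Hypothetical action: return the matched text as the token."""
    return text


# (pattern, action) pairs; an action of None means "discard the match".
LEXICON = [
    (r"[a-zA-Z_]\w*", keep),
    (r"\d+\.\d*", keep),
    (r"\d+", keep),
    (r"=|\+|-|\*|/", keep),
    (r"\s+", None),  # ignore whitespace
]

# Wrap every pattern in exactly one capturing group, so that
# group number N corresponds to LEXICON[N - 1].
MASTER = re.compile("|".join("(%s)" % pattern for pattern, _ in LEXICON))


def scan(text):
    """Return (tokens, unscanned tail), similar in spirit to Scanner.scan."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None or m.end() == pos:
            break  # nothing matched here; the rest becomes the tail
        action = LEXICON[m.lastindex - 1][1]
        if action is not None:
            tokens.append(action(m.group()))
        pos = m.end()
    return tokens, text[pos:]


tokens, tail = scan("sum = 3*foo + 312.50 + bar")
print(tokens, repr(tail))
```

If the patterns carried no capturing groups at all, m.lastindex would be None and the lexicon lookup would fail, which is the same class of error as the TypeError in the original report.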
participants (2): Paul Prescod, Tim Peters