Martin von Loewis wrote:
Would it be possible to write a Python syntax checker that doesn't stop processing at the first error it finds but instead tries to continue as far as possible (much like make -k) ?
The common approach is to insert or remove tokens, using heuristics to decide which. In YACC, it is possible to add error productions to the grammar. Whenever an error occurs, the parser assigns tokens to the "error" non-terminal until it concludes that it can perform a reduce action.
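For illustration, a YACC error production typically looks something like this sketch (the rule and token names are made up for the example): when a parse error occurs inside a statement, the parser discards tokens until it reaches a ';' it can reduce on, then resumes.

```yacc
stmt: expr ';'
    | error ';'    { yyerrok; }   /* resynchronize at the next ';' */
    ;
```

The `yyerrok` action tells the parser to leave error-recovery mode immediately, so subsequent errors in later statements are still reported — which is exactly the make -k style behavior asked about above.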
The following is based on trying (a great learning experience) to write a better Python lint.
There are IMHO two problems with the current Python grammar file. It is not possible to express operator precedence, so deliberate shift/reduce conflicts are used instead; that makes the parse tree complicated and non-intuitive. And there is no provision for error productions. YACC has both of these as built-in features.
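For comparison, this is roughly what precedence declarations look like in YACC (a minimal sketch, not taken from an actual Python grammar): the `%left`/`%right` lines resolve the shift/reduce conflicts in the ambiguous `expr` rule, so the grammar itself can stay flat.

```yacc
%left '+' '-'
%left '*' '/'
%right UMINUS        /* unary minus binds tightest */

%%
expr: expr '+' expr
    | expr '-' expr
    | expr '*' expr
    | expr '/' expr
    | '-' expr %prec UMINUS
    | NUMBER
    ;
```

With this, `a + b * c` parses as `a + (b * c)` without a ladder of term/factor non-terminals, which keeps the resulting parse tree shallow.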
I also found speed problems with tokenize.py. AFAIK, it only exists because tokenizer.c does not provide comments as tokens, but eats them instead. We could modify tokenizer.c, then make tokenize.py the interface to the fast C tokenizer. That would also eliminate the problem of keeping the two in sync.
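To show what the pure-Python tokenizer provides that tokenizer.c eats, here is a small sketch using the tokenize module on an in-memory string — the comment comes back as an ordinary token:

```python
import io
import tokenize

SOURCE = "x = 1  # set x\ny = x + 2\n"

# generate_tokens takes a readline callable and yields COMMENT
# tokens alongside NAME, OP, NUMBER, etc.
tokens = list(tokenize.generate_tokens(io.StringIO(SOURCE).readline))

comments = [tok.string for tok in tokens if tok.type == tokenize.COMMENT]
print(comments)  # the comment text survives as its own token
```

A lint tool needs exactly this: if tokenizer.c exposed comments too, tokenize.py could become a thin wrapper over it instead of reimplementing the whole scanner.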
So how about rewriting the Python grammar in YACC in order to use its more advanced features? The simple YACC grammar I wrote for 1.5.2, plus an altered tokenizer.c, parsed the whole of Lib/*.py in a couple of seconds, vs. 30 seconds for just the first file using Aaron Watters' Python lint grammar written in Python.