[Python-Dev] Python syntax checker ?

Mon, 25 Sep 2000 10:40:09 -0400

On 20 September 2000, M.-A. Lemburg said:
> Would it be possible to write a Python syntax checker that doesn't
> stop processing at the first error it finds but instead tries
> to continue as far as possible (much like make -k) ?
> 
> If yes, could the existing Python parser/compiler be reused for
> such a tool ?

From what I understand of Python's parser and parser generator, no.
Recovering from errors is indeed highly non-trivial.  If you're really
interested, I'd look into Terence Parr's ANTLR -- it's a very fancy
parser generator that's waaay ahead of pgen (or lex/yacc, for that
matter).  ANTLR 2.x is highly Java-centric, and AFAIK doesn't yet have a
C backend (grumble) -- just C++ and Java.  (Oh wait, the antlr.org web
site says it can generate Sather too -- now there's an important
mainstream language!  ;-)

Tech notes: like pgen, ANTLR is LL; ANTLR generates a recursive-descent
parser.  Unlike pgen, ANTLR is LL(k) -- it can support arbitrary
lookahead, although k>2 can make parser generation expensive (not
parsing itself, just turning your grammar into code), as well as make
your language harder to understand.  (I have a theory that pgen's k=1
limitation has been a brick wall in the way of making Python's syntax
more complex, i.e. it's a *feature*!)
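
To make the k=1 versus k=2 distinction concrete, here is a minimal sketch of a hand-written parser that needs two tokens of lookahead. The toy grammar and the names `peek` and `parse_stmt` are assumptions made up for illustration; this is not code that pgen or ANTLR would generate, and it is not Python's real grammar.

```python
# Toy grammar (hypothetical, not Python's real grammar):
#   stmt := NAME '=' NAME      (assignment)
#         | NAME '(' ')'       (call)
# Both alternatives begin with NAME, so one token of lookahead cannot
# choose between them; peeking at the second token can.

def peek(tokens, pos, k):
    """Return the k-th token ahead of pos (1-based), or None past the end."""
    index = pos + k - 1
    return tokens[index] if index < len(tokens) else None


def parse_stmt(tokens, pos=0):
    """Pick an alternative using two tokens of lookahead, then finish it."""
    first, second = peek(tokens, pos, 1), peek(tokens, pos, 2)
    if first is None or not first.isidentifier():
        raise SyntaxError("expected a name")
    if second == "=":
        source = peek(tokens, pos, 3)
        if source is None or not source.isidentifier():
            raise SyntaxError("expected a name after '='")
        return ("assign", first, source)
    if second == "(":
        if peek(tokens, pos, 3) != ")":
            raise SyntaxError("expected ')'")
        return ("call", first)
    raise SyntaxError("expected '=' or '(' after name")
```

With this sketch, `parse_stmt(["x", "=", "y"])` takes the assignment branch and `parse_stmt(["f", "(", ")"])` takes the call branch. A k=1 parser would need the grammar left-factored so that the shared NAME prefix is consumed before the choice is made.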

More importantly, ANTLR has good support for error recovery.  My BibTeX
parser has a lot of fun recovering from syntax errors, and (with a
little smoke 'n mirrors magic in the lexing stage) does a pretty good
job of it.  But you're right, it's *not* trivial to get this stuff
right.  And without support from the parser generator, I suspect you
would be in a world of hurtin'.
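
For a rough idea of what "continue as far as possible" can look like, here is a minimal sketch of panic-mode recovery in a hand-written recursive-descent checker: on an error it records the problem, skips ahead to a synchronizing token, and resumes. The toy statement form (`NAME = NUMBER ;`) and the names `expect`, `parse_assignment`, and `check` are assumptions for illustration only; this is not how ANTLR or the BibTeX parser mentioned above implement recovery.

```python
def expect(tokens, pos, predicate, what):
    """Consume one token satisfying predicate, or raise SyntaxError."""
    if pos >= len(tokens) or not predicate(tokens[pos]):
        found = repr(tokens[pos]) if pos < len(tokens) else "end of input"
        raise SyntaxError(f"expected {what}, found {found}")
    return pos + 1


def parse_assignment(tokens, pos):
    """Parse one statement of the form NAME '=' NUMBER ';'."""
    pos = expect(tokens, pos, str.isidentifier, "a name")
    pos = expect(tokens, pos, lambda t: t == "=", "'='")
    pos = expect(tokens, pos, str.isdigit, "a number")
    pos = expect(tokens, pos, lambda t: t == ";", "';'")
    return pos


def check(tokens):
    """Return (statement_number, message) for every bad statement."""
    errors = []
    pos = 0
    stmt_no = 0
    while pos < len(tokens):
        stmt_no += 1
        try:
            pos = parse_assignment(tokens, pos)
        except SyntaxError as exc:
            errors.append((stmt_no, str(exc)))
            # Panic mode: skip from the start of the failed statement to
            # just past the next ';' and resume parsing there.
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1
    return errors
```

Calling `check("x = 1 ; y 2 ; z = ; w = 3 ;".split())` reports the second and third statements and still checks the fourth. The hard part in a real language is choosing synchronizing tokens so that one mistake does not trigger a cascade of bogus follow-on errors, which is where the difficulty described above comes in.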

Disclaimer: I'm a programmer, not a computer scientist; it's been ages
since I read the Dragon Book, and I had to struggle with every paragraph
then; PCCTS 1.x (the precursor to ANTLR 2.x) is the only parser
generator I've used personally; and I've never written a parser for a
"real" language (although I can attest that BibTeX's lexical structure
was tricky enough!).

        Greg