[Python-Dev] Parsing vs. lexing.

Tim Peters tim.one@comcast.net
Wed, 21 Aug 2002 22:47:55 -0400

[Jonathan Riehl]
> As per Zach's comments, I think this is pretty funny.  I have just spent
> more time trying to expose pgen to the interpreter than I took to write
an R-D parser for Python 1.3 (granted, once Fred's parser module came
> around, I felt a bit silly).

It seems a potential lesson went unlearned then <wink>.

> Considering the scope of my parser generator integration wishlist,
> having GCC move to a hand coded recursive descent parser is going to
make my head explode.  Even TenDRA (http://www.tendra.org/) used an LL(n)
> parser generator, despite its highly tweaked lookahead code.  So now I'm
> going to have to extract grammars from call trees?  As if the 500
> languages problem isn't already intractable, there are going to be
> popular language implementations that don't even bother with an abstract
syntax specification!?  (Stop me from further hyperbole if I am
> incorrect.)

Anyone writing an R-D parser by hand without a formal grammar to guide them
is insane.  The formal grammar likely won't capture everything, though --
but then they never do.
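For readers who haven't written one: a minimal sketch of what "an R-D parser guided by a formal grammar" looks like, assuming a hypothetical toy arithmetic grammar (not Python's grammar, and not anything from this thread). Each grammar rule, written out in the comments, becomes one method, and the parser evaluates as it goes. The `Parser` class and its method names are made up for illustration.

```python
import re

# Hypothetical toy grammar (EBNF); each rule maps to one method below:
#   expr   := term (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := NUMBER | '(' expr ')'


class Parser:
    def __init__(self, text):
        # Tokens are runs of digits or any single non-blank character.
        self.tokens = re.findall(r"\d+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.advance()
        if got != tok:
            raise SyntaxError(f"expected {tok!r}, got {got!r}")

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    # expr := term (('+' | '-') term)*   (the loop gives left associativity)
    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    # term := factor (('*' | '/') factor)*
    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    # factor := NUMBER | '(' expr ')'
    def factor(self):
        tok = self.advance()
        if tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.expr()
            self.expect(")")
            return value
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, `Parser("(2 + 3) * 4").parse()` returns 20.  The tweakability point shows up directly: adding, say, unary minus means editing `factor` alone, with no parse tables to regenerate.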

> No wonder there are no killer software engineering apps.  Maybe I should
> just start writing toy languages for kids...

Parser generators are great for little languages!  They're painful for real
languages, though, because syntax warts accumulate and then tool rigidity
gets harder to live with.  Hand-crafted R-D parsers are wonderfully
tweakable in intuitive ways (staring at a mountain of parse-table conflicts
and divining how to warp the grammar to shut the tool up is a black art
nobody should regret not learning ...).

15 years of my previous lives were spent as a compiler jockey, working for
HW vendors.  The only time we used a parser generator was the time we used
one written by a major customer, and for political reasons far more than
technical ones.  It worked OK in the end, but it didn't really save any
time.  It did save us from one class of error.  I vividly recall a bug
report against the previous Fortran compiler, where a program line (an
assignment to a variable whose name began with the letters CONT)
apparently never got executed.  It appeared to be an optimization bug at a
fundamental level, as there was simply no code generated for this statement.
After too much digging, we found that the guy who wrote the Fortran parser
had done the equivalent of

    if not statement.has_label() and statement.startswith('CONT'):
        pass   # an unlabelled CONTINUE statement can be ignored

It's just that nobody had started a variable name with those 4 letters
before.  Yikes!  I was afraid to fly for a year after <wink>.
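A minimal sketch of that bug next to one way to fix it, assuming a hypothetical `Statement` record with the label field already split off from the statement text.  The names `Statement`, `squeeze`, `droppable_buggy` and `droppable_fixed` are made up for illustration.  Since fixed-form Fortran ignores blanks, the fix compares the whole blank-squeezed statement against CONTINUE instead of testing a 4-letter prefix.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statement:
    # Hypothetical representation: the statement label (columns 1-5 in
    # fixed-form Fortran) is already separated from the statement text.
    label: Optional[int]
    text: str

    def has_label(self):
        return self.label is not None


def squeeze(text):
    # Fixed-form Fortran treats blanks as insignificant; this sketch also
    # assumes case-insensitivity.  Squeezing would mangle character
    # constants, but we only compare against the bare keyword CONTINUE.
    return "".join(text.split()).upper()


def droppable_buggy(stmt):
    # The check from the bug report: any unlabelled statement that starts
    # with CONT is treated as a no-op CONTINUE.
    return not stmt.has_label() and squeeze(stmt.text).startswith("CONT")


def droppable_fixed(stmt):
    # Only the complete statement CONTINUE is a no-op; an assignment such
    # as "CONTRL = 3" must still be compiled.
    return not stmt.has_label() and squeeze(stmt.text) == "CONTINUE"
```

With these definitions, `droppable_buggy(Statement(None, "CONTRL = 3"))` is true, so the assignment silently vanishes, while `droppable_fixed` rejects it.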

a-class-of-tool-most-appreciated-when-it's-least-needed-ly y'rs  - tim