[Python-ideas] Does jargon make learning more difficult?

Fri Aug 24 03:14:10 EDT 2018

Stephen J. Turnbull wrote:
> Abe Dillon writes:
> 
>  > That's interesting. So the parser can see one token past, for instance;
>  > what would be the end of an expression, see "if", and know to expand the
>  > AST?
> 
> Yes, if you want to think about it that way.

For understanding what kinds of things an LL parser can parse,
it helps to have gone through the exercise of implementing a
recursive descent parser.

The classic way to do that is to write a function for each
production in the grammar. E.g. if you have a production

    expr ::= term ['+' expr]

then you would write a function structured like this:

    def parse_expr():
       parse_term()
       if current_token == '+':
          next_token()
          parse_expr()

What other stuff you do in it depends on what you want the
parser to accomplish. If you want it to build a parse tree,
you could do something like this:

    def parse_expr():
       result = parse_term()
       if current_token == '+':
          next_token()
          operand2 = parse_expr()
          result = BinopNode(result, operand2)
       return result

> I don't think of
> it in terms of the parser knowing "where" it is in the program, but
> rather in terms of whether it can do more work on the AST with the
> token (or EOF) in hand.

Not so much where it is in the program, but where it is in
the *grammar*. For example, if parse_term() gets called,
it knows it must be looking at the beginning of a term. If
the current token is not one of the ones that can begin a
term, then there is a syntax error.

The fact that a recursive descent parser always knows where
it is in the grammar is useful, because it can be used to
produce helpful error messages. For example, if it gets
stuck in parse_term() it can say something like "I was
expecting a term here, but this token doesn't look like
the beginning of a term."

Unfortunately, Python's parser doesn't seem to make much
use of this. :-(

> If you think of expressions having their own subparser, the model is a
> little more subtle.  I think of the subparsers as being coroutines

That's kind of the way an LR parser works. An LR parser is
more powerful than an LL parser, because it can effectively
explore several possible parsings in parallel until one
of them succeeds. But I don't think Python uses an LR
parser, because its grammar is required to be LL(1).

If you're interested in this stuff, I recommend either taking
a computer science course or reading a good book about the
theory of parsing. It will bring a lot of these concepts
into sharp focus.

-- 
Greg