[Python-ideas] Hooking between lexer and parser

Neil Girdhar mistersheik at gmail.com
Sat Jun 6 04:21:08 CEST 2015

Back in the day, I remember Lex and Yacc, then came Flex and Bison, and
then ANTLR, which unified lexing and parsing under one common language.  In
general, I like the idea of putting everything together.  I think that
because of Python's separation of lexing and parsing, it accepts weird text
like "(1if 0else 2)", which is crazy.
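For anyone who wants to see that boundary directly, the stdlib tokenize module exposes the exact token stream the parser consumes; here is a small illustration (with the spaces written normally):

```python
import io
import tokenize

# Show the token stream the parser consumes for a conditional expression.
# The lexer/parser separation is exactly this list of (type, string) pairs.
source = "(1 if 0 else 2)"
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]
print(tokens)
```

The parser only ever sees NUMBER and NAME tokens here, which is why it cannot object to the missing spaces: by the time it runs, the lexer has already split "1if" into two tokens.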

Here's what I think I want in a parser:

Along with the grammar, you also give it code that it can execute as it
matches each symbol in a rule.  In Python for example, as it matches each
argument passed to a function, it would keep track of the *args, **kwargs,
keyword arguments, and regular arguments it has seen, and then raise a
syntax error if it encounters anything out of order.  Right now that check
is done in validate.c, which is really annoying.
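The check could then live next to the grammar rule instead of in validate.c. As a rough sketch of what such a per-rule action might look like (the function name and tags are made up for illustration, and this simplifies the real rules):

```python
# Hypothetical per-rule action: as each argument of a call is matched,
# classify it and reject out-of-order forms immediately.
def check_call_arguments(tags):
    """tags: sequence of 'positional', 'keyword', or 'doublestar'."""
    seen_keyword = seen_doublestar = False
    for tag in tags:
        if seen_doublestar:
            raise SyntaxError("argument follows **kwargs")
        if tag == "positional" and seen_keyword:
            raise SyntaxError("positional argument follows keyword argument")
        seen_keyword |= tag == "keyword"
        seen_doublestar |= tag == "doublestar"

check_call_arguments(["positional", "keyword", "doublestar"])  # accepted
```

The point is that the state machine runs while the rule is being matched, so the error is reported at the offending argument rather than in a separate validation pass.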

I want to specify the lexical rules in the same way that I specify the
parsing rules.  And I think (after Andrew elucidates what he means by
hooks) I want the parsing hooks to be the same thing as lexing hooks, and I
agree with him that hooking into the lexer is useful.
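To make the lexer-hook idea concrete: one thing a hook enables, even today, is filtering the token stream and rebuilding source with the stdlib tokenize round-trip. The Decimal rewrite below is just an illustration of the shape such a hook might take:

```python
import io
import tokenize
from decimal import Decimal

# A lexer hook as a token-stream filter: wrap every NUMBER token in a
# Decimal(...) call, then rebuild the source with untokenize.
def rewrite_numbers(source):
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.NUMBER:
            out.extend([
                (tokenize.NAME, "Decimal"),
                (tokenize.OP, "("),
                (tokenize.STRING, repr(tok.string)),
                (tokenize.OP, ")"),
            ])
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out)

rewritten = rewrite_numbers("1.1 + 2.2")
print(eval(rewritten, {"Decimal": Decimal}))  # exact decimal arithmetic
```

This is exactly the kind of thing that is awkward when the lexer is a sealed box and pleasant when it is a hookable stage in the pipeline.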

I want the parser module to be automatically-generated from the grammar if
that's possible (I think it is).

Typically each grammar rule is implemented using a class.  I want the code
generation to be a method on that class.  This makes changing the AST
easy.  For example, it was suggested that we might change the grammar to
include a starstar_expr node.  This should be an easy change, but because
every node validates its children against an expected tree structure, it
becomes a big task with almost no payoff.
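A sketch of the design I mean, with code generation as a method on each rule's class (the class names and the string-returning codegen are illustrative, not a proposal for CPython's actual emitter):

```python
# Each grammar rule maps to a class; code generation is a method on that
# class, so adding a node like the suggested starstar_expr is a local change.
class Node:
    def codegen(self):
        raise NotImplementedError

class Num(Node):
    def __init__(self, value):
        self.value = value

    def codegen(self):
        return str(self.value)

class StarStarExpr(Node):  # the hypothetical new node
    def __init__(self, operand):
        self.operand = operand

    def codegen(self):
        return "**" + self.operand.codegen()

print(StarStarExpr(Num(3)).codegen())  # -> **3
```

Adding the new node touches one class; no other node needs to know about it, because nothing else hard-codes the shape of its children.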

There's also a question of which parsing algorithm you use.  I wish I knew
more about state-of-the-art parsers.  I was interested because I wanted to
use Python to parse my LaTeX files.  I got the impression that Earley
parsers (https://en.wikipedia.org/wiki/Earley_parser) were state of the
art, but I'm not sure.
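For what it's worth, the core of an Earley recognizer is small enough to sketch. This is a toy (a recognizer only, no parse trees, no performance work) for the ambiguous grammar S -> S '+' S | 'a', which LL/LR tools reject outright:

```python
# A minimal Earley recognizer for the ambiguous grammar S -> S '+' S | 'a'.
# Chart items are (lhs, rhs, dot, origin); terminals are plain characters.
GRAMMAR = {"S": [["S", "+", "S"], ["a"]]}

def earley_recognize(tokens, start="S"):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, tuple(rhs), 0, 0))
    for i in range(len(tokens) + 1):
        added = True
        while added:  # iterate until chart[i] stops growing
            added = False
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs):
                    sym = rhs[dot]
                    if sym in GRAMMAR:  # predict
                        for prod in GRAMMAR[sym]:
                            item = (sym, tuple(prod), 0, i)
                            if item not in chart[i]:
                                chart[i].add(item)
                                added = True
                    elif i < len(tokens) and tokens[i] == sym:  # scan
                        chart[i + 1].add((lhs, rhs, dot + 1, origin))
                else:  # complete
                    for plhs, prhs, pdot, porigin in list(chart[origin]):
                        if pdot < len(prhs) and prhs[pdot] == lhs:
                            item = (plhs, prhs, pdot + 1, porigin)
                            if item not in chart[i]:
                                chart[i].add(item)
                                added = True
    return any(
        lhs == start and dot == len(rhs) and origin == 0
        for lhs, rhs, dot, origin in chart[len(tokens)]
    )

print(earley_recognize(list("a+a+a")))  # -> True
print(earley_recognize(list("a+")))    # -> False
```

Earley handles any context-free grammar, including ambiguous ones, at the cost of O(n^3) worst-case time; that generality is why it keeps coming up in these discussions.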

I'm curious what other people will contribute to this discussion, as I
think the lack of a great parsing library is a huge hole in Python.  Having
one would definitely let me write better utilities in Python.

On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho <luciano at ramalho.org> wrote:

> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
> > Modern parsers do not separate the grammar from tokenizing, parsing, and
> > validation.  All of these are done in one place, which not only
> simplifies
> > changes to the grammar, but also protects you from possible
> inconsistencies.
> Hi, Neil, thanks for that!
> Having studied only ancient parsers, I'd love to learn new ones. Can
> you please post references to modern parsing? Actual parsers, books,
> papers, anything you may find valuable.
> I have a hunch you're talking about PEG parsers, but maybe something
> else, or besides?
> Thanks!
> Best,
> Luciano
> --
> Luciano Ramalho
> |  Author of Fluent Python (O'Reilly, 2015)
> |     http://shop.oreilly.com/product/0636920032519.do
> |  Professor em: http://python.pro.br
> |  Twitter: @ramalhoorg
