[Python-ideas] Hooking between lexer and parser
Neil Girdhar
mistersheik at gmail.com
Sat Jun 6 04:21:08 CEST 2015
Back in the day, I remember Lex and Yacc, then came Flex and Bison, and
then ANTLR, which unified lexing and parsing under one common language. In
general, I like the idea of putting everything together. I think that
because of Python's separation of lexing and parsing, it accepts weird text
like "(1if 0else 2)", which is crazy.
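To see why that text lexes at all, here is a toy maximal-munch tokenizer (a sketch, not CPython's actual tokenizer): after consuming the longest run of digits for "1", the identifier "if" begins immediately, and nothing requires whitespace between the two tokens.

```python
import re

# Toy maximal-munch tokenizer: numbers, identifiers/keywords, punctuation.
# Each step matches the longest token starting at the current position.
TOKEN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([()\[\]+\-*/]))")

def toy_tokens(s):
    out, i = [], 0
    while i < len(s):
        m = TOKEN.match(s, i)
        if not m:
            raise SyntaxError(f"bad character at {i}")
        out.append(m.group(m.lastindex))  # only one alternative matched
        i = m.end()
    return out

# "(1if 0else 2)" tokenizes cleanly: the lexer never objects, so only
# the parser could reject it -- and it doesn't need whitespace here.
toy_tokens("(1if 0else 2)")
```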
Here's what I think I want in a parser:
Along with the grammar, you also give it code that it can execute as it
matches each symbol in a rule. In Python, for example, as it matches each
argument passed to a function, it would keep track of the counts of *args,
**kwargs, keyword arguments, and regular arguments, and then raise a
syntax error if it encounters anything out of order. Right now that check
is done in validate.c, which is really annoying.
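As a sketch of what such a semantic action might look like (the function name and argument-kind labels are made up for illustration), an incremental check run as each argument is matched could replace the separate validation pass:

```python
# Hypothetical incremental check, run as the parser matches each
# argument in a call. kinds: 'pos', 'star' (*args), 'kw' (name=value),
# 'dstar' (**kwargs), in source order.
def check_arg_order(kinds):
    seen_kw = seen_dstar = False
    for kind in kinds:
        if kind == 'pos':
            if seen_kw or seen_dstar:
                raise SyntaxError(
                    "positional argument follows keyword argument")
        elif kind == 'star':
            if seen_dstar:
                raise SyntaxError(
                    "iterable argument unpacking follows "
                    "keyword argument unpacking")
        elif kind == 'kw':
            seen_kw = True
        elif kind == 'dstar':
            seen_dstar = True

check_arg_order(['pos', 'star', 'kw', 'dstar'])  # fine: f(a, *b, c=1, **d)
```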
I want to specify the lexical rules in the same way that I specify the
parsing rules. And I think (after Andrew elucidates what he means by
hooks) I want the parsing hooks to be the same thing as lexing hooks, and I
agree with him that hooking into the lexer is useful.
I want the parser module to be automatically-generated from the grammar if
that's possible (I think it is).
Typically each grammar rule is implemented using a class. I want the code
generation to be a method on that class. This makes changing the AST
easy. For example, it was suggested that we might change the grammar to
include a starstar_expr node. This should be an easy change, but because
of the way every node validates its children, which it expects to have a
certain tree structure, it would be a big task with almost no payoff.
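A minimal sketch of the rule-as-class idea (the class and method names here are invented, not an existing API): each node owns its own code generation, so adding something like a starstar_expr node is a purely local change.

```python
# Sketch: each grammar rule is a class; code generation is a method on
# that class, so new node types don't touch the rest of the tree.
class Node:
    def codegen(self):
        raise NotImplementedError

class Num(Node):
    def __init__(self, value):
        self.value = value
    def codegen(self):
        return repr(self.value)

class BinOp(Node):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right
    def codegen(self):
        return f"({self.left.codegen()} {self.op} {self.right.codegen()})"

# Adding a new node type touches only this one class:
class StarStarExpr(Node):
    def __init__(self, operand):
        self.operand = operand
    def codegen(self):
        return f"**{self.operand.codegen()}"
```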
There's also a question of which parsing algorithm you use. I wish I knew
more about the state-of-art parsers. I was interested because I wanted to
use Python to parse my LaTeX files. I got the impression that Earley
parsers (https://en.wikipedia.org/wiki/Earley_parser) were state of the
art, but I'm not sure.
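For the curious, a minimal Earley recognizer fits in a few dozen lines. This is a textbook sketch (no epsilon rules, recognizer only, no parse trees), not a proposal for CPython:

```python
# Minimal Earley recognizer. grammar maps nonterminal -> list of
# right-hand sides (tuples of symbols); a symbol is a nonterminal if it
# is a key in grammar, otherwise a terminal token. No epsilon rules.
# A state is (head, rhs, dot, origin).
def earley_recognize(grammar, start, tokens):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        worklist = list(chart[i])
        while worklist:
            head, rhs, dot, origin = worklist.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:                          # predict
                    for r in grammar[sym]:
                        s = (sym, r, 0, i)
                        if s not in chart[i]:
                            chart[i].add(s)
                            worklist.append(s)
                elif i < len(tokens) and tokens[i] == sym:  # scan
                    chart[i + 1].add((head, rhs, dot + 1, origin))
            else:                                           # complete
                for h2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == head:
                        s = (h2, r2, d2 + 1, o2)
                        if s not in chart[i]:
                            chart[i].add(s)
                            worklist.append(s)
    return any(h == start and d == len(r) and o == 0
               for h, r, d, o in chart[len(tokens)])

# Toy (ambiguous) grammar: S -> S '+' S | 'a'
GRAMMAR = {'S': [('S', '+', 'S'), ('a',)]}
```

Earley handles any context-free grammar, including ambiguous ones like the toy grammar above, which is exactly why it gets mentioned as a general-purpose choice.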
I'm curious what other people will contribute to this discussion as I think
having no great parsing library is a huge hole in Python. Having one would
definitely allow me to write better utilities using Python.
On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho <luciano at ramalho.org> wrote:
> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <mistersheik at gmail.com>
> wrote:
> > Modern parsers do not separate the grammar from tokenizing, parsing, and
> > validation. All of these are done in one place, which not only simplifies
> > changes to the grammar, but also protects you from possible
> > inconsistencies.
>
> Hi, Neil, thanks for that!
>
> Having studied only ancient parsers, I'd love to learn new ones. Can
> you please post references to modern parsing? Actual parsers, books,
> papers, anything you may find valuable.
>
> I have a hunch you're talking about PEG parsers, but maybe something
> else, or besides?
>
> Thanks!
>
> Best,
>
> Luciano
>
> --
> Luciano Ramalho
> | Author of Fluent Python (O'Reilly, 2015)
> | http://shop.oreilly.com/product/0636920032519.do
> | Professor em: http://python.pro.br
> | Twitter: @ramalhoorg
>