[Python-ideas] Hooking between lexer and parser

Sat Jun 6 04:55:20 CEST 2015

IMO, lexer and parser separation is sometimes great. It also makes hand-written parsers much simpler.
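
To make that concrete, here is a minimal sketch of a hand-written recursive-descent parser sitting on top of a separate regex lexer. The toy arithmetic grammar and the names lex, Parser and evaluate are my own assumptions for illustration, not anything from CPython. The point is only that the parser methods never look at raw characters or whitespace:

```python
import re

# Hypothetical token set for a toy arithmetic language (an assumption).
TOKEN_RE = re.compile(r"\s*(?:(?P<NUMBER>\d+)|(?P<OP>[-+*/()]))")


def lex(text):
    """Yield (kind, value) pairs so the parser never touches characters."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character near offset {pos}")
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)
    yield "EOF", ""


class Parser:
    """Recursive descent over the token stream, one method per rule."""

    def __init__(self, text):
        self.tokens = lex(text)
        self.kind, self.value = next(self.tokens)

    def advance(self):
        self.kind, self.value = next(self.tokens)

    def expr(self):  # expr := term (('+' | '-') term)*
        result = self.term()
        while self.kind == "OP" and self.value in "+-":
            op = self.value
            self.advance()
            rhs = self.term()
            result = result + rhs if op == "+" else result - rhs
        return result

    def term(self):  # term := atom (('*' | '/') atom)*
        result = self.atom()
        while self.kind == "OP" and self.value in "*/":
            op = self.value
            self.advance()
            rhs = self.atom()
            result = result * rhs if op == "*" else result / rhs
        return result

    def atom(self):  # atom := NUMBER | '(' expr ')'
        if self.kind == "NUMBER":
            value = int(self.value)
            self.advance()
            return value
        if (self.kind, self.value) == ("OP", "("):
            self.advance()
            value = self.expr()
            if (self.kind, self.value) != ("OP", ")"):
                raise SyntaxError("expected ')'")
            self.advance()
            return value
        raise SyntaxError(f"unexpected token {self.value!r}")


def evaluate(text):
    parser = Parser(text)
    result = parser.expr()
    if parser.kind != "EOF":
        raise SyntaxError(f"unexpected token {parser.value!r}")
    return result
```

Calling evaluate("1 + 2 * 3") walks expr, term and atom in turn. Each rule only compares token kinds and values, which is where the simplification comes from.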

"Modern" parsing with no lexer and EBNF can sometimes be slower than the classics, especially if one is using an ultra-fast lexer generator such as re2c.

On June 5, 2015 9:21:08 PM CDT, Neil Girdhar <mistersheik at gmail.com> wrote:
>Back in the day, I remember Lex and Yacc, then came Flex and Bison, and
>then ANTLR, which unified lexing and parsing under one common language.
>In general, I like the idea of putting everything together.  I think
>that because of Python's separation of lexing and parsing, it accepts
>weird text like "(1if 0else 2)", which is crazy.
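
A rough sketch of why a separate lexer lets that through: with longest-match token rules, the number rule stops at the first non-digit and the name rule picks up from there, so no whitespace is needed between them. The rules below are an illustrative assumption and far simpler than CPython's real tokenizer:

```python
import re

# Illustrative token rules only (an assumption, not CPython's tokenizer).
SCANNER = re.compile(
    r"(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[()])|(?P<WS>\s+)"
)


def scan(text):
    """Split text into (kind, value) tokens, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = SCANNER.match(text, pos)
        if m is None:
            raise SyntaxError(f"bad character at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

With these rules, scan("(1if 0else 2)") and scan("(1 if 0 else 2)") produce the same token list, so a parser that only sees tokens cannot tell the two inputs apart.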
>
>Here's what I think I want in a parser:
>
>Along with the grammar, you also give it code that it can execute as it
>matches each symbol in a rule.  In Python for example, as it matches
>each argument passed to a function, it would keep track of the count of
>*args, **kwargs, keyword arguments, and regular arguments, and then
>raise a syntax error if it encounters anything out of order.  Right now
>that check is done in validate.c, which is really annoying.
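
As I read it, the per-symbol action could look something like the sketch below. CallArgumentAction and check_call are hypothetical names, the kind strings are made up, and the ordering rules are a simplified assumption rather than a copy of what CPython enforces:

```python
class CallArgumentAction:
    """Hypothetical action a parser would invoke per matched argument."""

    KINDS = ("positional", "keyword", "star", "double_star")

    def __init__(self):
        # Running count of each argument kind seen so far in this call.
        self.counts = dict.fromkeys(self.KINDS, 0)

    def on_argument(self, kind):
        if kind not in self.counts:
            raise ValueError(f"unknown argument kind: {kind!r}")
        if kind == "positional":
            if self.counts["double_star"]:
                raise SyntaxError(
                    "positional argument follows keyword argument unpacking"
                )
            if self.counts["keyword"]:
                raise SyntaxError(
                    "positional argument follows keyword argument"
                )
        elif kind == "star" and self.counts["double_star"]:
            raise SyntaxError(
                "iterable argument unpacking follows keyword argument unpacking"
            )
        self.counts[kind] += 1


def check_call(kinds):
    """Stand-in for the parser: feed argument kinds to the action in order."""
    action = CallArgumentAction()
    for kind in kinds:
        action.on_argument(kind)
    return action.counts
```

In a real generated parser the call to on_argument would be attached to the argument rule in the grammar, so the check would fire during the match instead of in a separate validation pass.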
>
>I want to specify the lexical rules in the same way that I specify the
>parsing rules.  And I think (after Andrew elucidates what he means by
>hooks) I want the parsing hooks to be the same thing as lexing hooks,
>and I agree with him that hooking into the lexer is useful.
>
>I want the parser module to be automatically-generated from the grammar
>if that's possible (I think it is).
>
>Typically each grammar rule is implemented using a class.  I want the
>code generation to be a method on that class.  This makes changing the
>AST easy.  For example, it was suggested that we might change the
>grammar to include a starstar_expr node.  This should be an easy change,
>but because of the way every node validates its children, which it
>expects to have a certain tree structure, it would be a big task with
>almost no payoff.
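
A small sketch of that layout, assuming a toy stack machine as the target. The node classes Num, Add and Neg, the emit method and the run helper are all hypothetical; Neg stands in for "a node added later" such as the starstar_expr mentioned above:

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int

    def emit(self):
        # Each node knows how to generate its own code.
        return [("PUSH", self.value)]


@dataclass
class Add:
    left: object
    right: object

    def emit(self):
        return self.left.emit() + self.right.emit() + [("ADD",)]


@dataclass
class Neg:
    # Added after the fact; Num and Add did not have to change.
    operand: object

    def emit(self):
        return self.operand.emit() + [("NEG",)]


def run(code):
    """Tiny stack machine used only to check the generated code."""
    stack = []
    for op, *args in code:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "NEG":
            stack.append(-stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    return stack.pop()
```

Because a parent only calls emit on its children and never inspects their type, a new node kind slots in without touching existing classes, which is the property the quoted paragraph is asking for.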
>
>There's also a question of which parsing algorithm you use.  I wish I
>knew more about the state-of-the-art parsers.  I was interested because
>I wanted to use Python to parse my LaTeX files.  I got the impression
>that Earley parsers (https://en.wikipedia.org/wiki/Earley_parser) were
>state of the art, but I'm not sure.
>
>I'm curious what other people will contribute to this discussion as I
>think having no great parsing library is a huge hole in Python.  Having
>one would definitely allow me to write better utilities using Python.
>
>
>On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho <luciano at ramalho.org>
>wrote:
>
>> On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <mistersheik at gmail.com>
>> wrote:
>> > Modern parsers do not separate the grammar from tokenizing, parsing,
>> > and validation.  All of these are done in one place, which not only
>> > simplifies changes to the grammar, but also protects you from
>> > possible inconsistencies.
>>
>> Hi, Neil, thanks for that!
>>
>> Having studied only ancient parsers, I'd love to learn new ones. Can
>> you please post references to modern parsing? Actual parsers, books,
>> papers, anything you may find valuable.
>>
>> I have a hunch you're talking about PEG parsers, but maybe something
>> else, or besides?
>>
>> Thanks!
>>
>> Best,
>>
>> Luciano
>>
>> --
>> Luciano Ramalho
>> |  Author of Fluent Python (O'Reilly, 2015)
>> |     http://shop.oreilly.com/product/0636920032519.do
>> |  Teacher at: http://python.pro.br
>> |  Twitter: @ramalhoorg
>>
>
>
