<div dir="ltr">I don't see why it makes anything simpler.  Your lexing rules just live alongside your parsing rules.  I also don't see why lexing has to be faster when it is done in a separate part of the code.  Wouldn't the parser generator realize that some of the rules don't use the stack, so that they end up just as fast as any lexer?</div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 5, 2015 at 10:55 PM, Ryan Gonzalez <span dir="ltr"><<a href="mailto:rymg19@gmail.com" target="_blank">rymg19@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>IMO, lexer and parser separation is sometimes great. It also makes hand-written parsers much simpler.<br>

<br>

"Modern" parsing with no lexer and EBNF can sometimes be slower than the classics, especially if one is using an ultra-fast lexer generator such as re2c.<br>

<br><br><div class="gmail_quote"><div><div class="h5">On June 5, 2015 9:21:08 PM CDT, Neil Girdhar <<a href="mailto:mistersheik@gmail.com" target="_blank">mistersheik@gmail.com</a>> wrote:</div></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div class="h5">

<div dir="ltr">Back in the day, I remember Lex and Yacc, then came Flex and Bison, and then ANTLR, which unified lexing and parsing under one common language.  In general, I like the idea of putting everything together.  I think that because of Python's separation of lexing and parsing, it accepts weird text like "(1if 0else 2)", which is crazy.<div><br></div><div>Here's what I think I want in a parser:</div><div><br></div><div>Along with the grammar, you also give it code that it can execute as it matches each symbol in a rule.  In Python, for example, as it matches each argument passed to a function, it would keep count of the *args, **kwargs, keyword arguments, and regular arguments, and raise a syntax error if it encounters anything out of order.  Right now that check is done in validate.c, which is really annoying.</div><div><br></div><div>I want to specify the lexical rules in the same way that I specify the parsing rules.  And I

think (after Andrew elucidates what he means by hooks) I want the parsing hooks to be the same thing as lexing hooks, and I agree with him that hooking into the lexer is useful.</div><div><br></div><div>I want the parser module to be automatically generated from the grammar if that's possible (I think it is).</div><div><br></div><div>Typically each grammar rule is implemented using a class.  I want the code generation to be a method on that class.  This makes changing the AST easy.  For example, it was suggested that we might change the grammar to include a starstar_expr node.  This should be an easy change, but because of the way every node validates its children, which it expects to have a certain tree structure, it would be a big task with almost no payoff.</div><div><br></div><div>There's also a question of which parsing algorithm you use.  I wish I knew more about the state-of-the-art parsers.  I was interested because I wanted to use Python to parse my LaTeX

files.  I got the impression that Earley parsers (<a href="https://en.wikipedia.org/wiki/Earley_parser" target="_blank">https://en.wikipedia.org/wiki/Earley_parser</a>) were state of the art, but I'm not sure.</div><div><br></div><div>I'm curious what other people will contribute to this discussion, as I think the lack of a great parsing library is a huge hole in Python.  Having one would definitely allow me to write better utilities using Python.</div><div><br></div><div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jun 5, 2015 at 6:55 PM, Luciano Ramalho <span dir="ltr"><<a href="mailto:luciano@ramalho.org" target="_blank">luciano@ramalho.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span>On Fri, Jun 5, 2015 at 5:38 PM, Neil Girdhar <<a href="mailto:mistersheik@gmail.com" target="_blank">mistersheik@gmail.com</a>> wrote:<br>

> Modern parsers do not separate the grammar from tokenizing, parsing, and<br>

> validation.  All of these are done in one place, which not only simplifies<br>

> changes to the grammar, but also protects you from possible inconsistencies.<br>

<br>

</span>Hi, Neil, thanks for that!<br>

<br>

Having studied only ancient parsers, I'd love to learn new ones. Can<br>

you please post references to modern parsing? Actual parsers, books,<br>

papers, anything you may find valuable.<br>

<br>

I have a hunch you're talking about PEG parsers, but maybe something<br>

else, or besides?<br>

<br>

Thanks!<br>

<br>

Best,<br>

<br>

Luciano<br>

<span><font color="#888888"><br>

--<br>

Luciano Ramalho<br>

|  Author of Fluent Python (O'Reilly, 2015)<br>

|     <a href="http://shop.oreilly.com/product/0636920032519.do" target="_blank">http://shop.oreilly.com/product/0636920032519.do</a><br>

|  Professor em: <a href="http://python.pro.br" target="_blank">http://python.pro.br</a><br>

|  Twitter: @ramalhoorg<br>

</font></span></blockquote></div><br></div></div></div>
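<div>A rough sketch of the idea in the message above of running code as each argument is matched, so that ordering errors are raised during parsing rather than in a later validation pass. This is not how CPython does it: the tiny argument grammar, <code>check_arglist</code>, and <code>ArgOrderError</code> are all hypothetical names made up for illustration, and the ordering rules are a simplified subset (no positional argument after a keyword argument or after **kwargs, and no *args after **kwargs).</div>

```python
import re


class ArgOrderError(SyntaxError):
    """Raised by the semantic action when an argument is out of order."""


# Hypothetical toy grammar:  arglist := arg (',' arg)*
#   arg := '**' NAME | '*' NAME | NAME '=' NAME | NAME
ARG = re.compile(
    r"\s*(?:(?P<dstar>\*\*\w+)|(?P<star>\*\w+)"
    r"|(?P<kw>\w+\s*=\s*\w+)|(?P<pos>\w+))\s*"
)


def check_arglist(text):
    """Match each argument in turn, counting kinds and checking order."""
    counts = {"pos": 0, "star": 0, "kw": 0, "dstar": 0}
    offset = 0
    while True:
        match = ARG.match(text, offset)
        if match is None:
            raise ArgOrderError(f"invalid argument at offset {offset}")
        kind = match.lastgroup  # name of the alternative that matched

        # The "semantic action": runs as soon as one argument is matched.
        if kind == "pos" and counts["kw"]:
            raise ArgOrderError("positional argument follows keyword argument")
        if kind == "pos" and counts["dstar"]:
            raise ArgOrderError("positional argument follows **kwargs")
        if kind == "star" and counts["dstar"]:
            raise ArgOrderError("*args follows **kwargs")
        counts[kind] += 1

        offset = match.end()
        if offset == len(text):
            return counts
        if text[offset] != ",":
            raise ArgOrderError(f"expected ',' at offset {offset}")
        offset += 1
```

<div>For instance, <code>check_arglist("a, b, *args, x=1, **kw")</code> returns the per-kind counts, while <code>check_arglist("x=1, a")</code> raises as soon as the stray positional argument is matched.</div>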

<p style="margin-top:2.5em;margin-bottom:1em;border-bottom:1px solid #000"></p></div></div><pre><hr><br>Python-ideas mailing list<br><a href="mailto:Python-ideas@python.org" target="_blank">Python-ideas@python.org</a><span class=""><br><a href="https://mail.python.org/mailman/listinfo/python-ideas" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a><br>Code of Conduct: <a href="http://python.org/psf/codeofconduct/" target="_blank">http://python.org/psf/codeofconduct/</a></span></pre></blockquote></div><span class="HOEnZb"><font color="#888888"><br>

</font></span></div></blockquote></div><br></div>
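<div>To make the point at the top of this message concrete, here is a minimal sketch of a scannerless recursive-descent parser in which the lexical rules (<code>ws</code>, <code>literal</code>, <code>number</code>) are ordinary methods sitting next to the syntactic rules (<code>factor</code>, <code>term</code>, <code>expr</code>). The class, the toy arithmetic grammar, and the word-boundary check are assumptions made up for illustration; it says nothing about speed, only that the lexical rules are plain loops that never recurse, and that a lexical rule living in the grammar can reject input such as "1if".</div>

```python
DIGITS = "0123456789"


class Parser:
    """Toy scannerless parser for sums and products of integers."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    # --- lexical-level rules: simple loops, no recursion -----------------
    def ws(self):
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def literal(self, s):
        """Skip whitespace, then consume s if it is next; report success."""
        self.ws()
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            return True
        return False

    def number(self):
        self.ws()
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos] in DIGITS:
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected a number at offset {start}")
        # Illustrative boundary check: a number may not run into a name.
        nxt = self.text[self.pos:self.pos + 1]
        if nxt.isalpha() or nxt == "_":
            raise SyntaxError(f"name directly after number at offset {self.pos}")
        return int(self.text[start:self.pos])

    # --- syntactic-level rules: these are the ones that recurse ----------
    def factor(self):
        if self.literal("("):
            value = self.expr()
            if not self.literal(")"):
                raise SyntaxError(f"expected ')' at offset {self.pos}")
            return value
        return self.number()

    def term(self):
        value = self.factor()
        while self.literal("*"):
            value *= self.factor()
        return value

    def expr(self):
        value = self.term()
        while self.literal("+"):
            value += self.term()
        return value


def parse(text):
    """Evaluate text with the toy grammar, requiring all input be consumed."""
    parser = Parser(text)
    value = parser.expr()
    parser.ws()
    if parser.pos != len(text):
        raise SyntaxError(f"unexpected input at offset {parser.pos}")
    return value
```

<div>With this sketch, <code>parse("1 + 2 * (3 + 4)")</code> evaluates the expression, and <code>parse("1if")</code> raises a <code>SyntaxError</code> from the <code>number</code> rule itself.</div>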