<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jun 5, 2015 at 9:27 PM, Guido van Rossum <span dir="ltr"><<a href="mailto:guido@python.org" target="_blank">guido@python.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote"><div>You're putting a lot of faith in "modern" parsers. I don't know if PLY qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies the lexer and parser. However I don't think it would be much better for a language the size of Python.<br></div></div></div></div></blockquote><div><br></div><div>PLY doesn't really "unify" the lexer and parser; it simply provides both in the same Python package (with somewhat similar syntax and conventions for each). </div><div><br></div><div>At my last consulting position I wrote a project to process a fairly complex DSL (used for code generation to several targets: Python, C++, Verilog, etc.). I like PLY, and decided to use it; but after a short while I gave up on its parser component, used only the lexer, and left parsing to hand-rolled code.</div><div><br></div><div>I'm sure I *could* have managed to shoehorn the entire EBNF grammar into PLY's parsing component. But for my purposes, I found it more important to do various simplifications and modifications of the token stream before generating the data structures that defined the eventual output parameters. So in this respect, what I did is something like a simpler version of Python's compilation pipeline.</div><div><br></div><div>Actually, what I did was probably terrible practice for parsing purists, but it felt to me like the best "practicality beats purity" approach. 
There was a finite number of constructs in the DSL, and I would simply scan through the token stream in several passes, trying to identify a particular construct, pulling it out into the relevant data structure type, and marking those tokens as "used". Other passes would look for other constructs, and in some cases I'd need to resolve a reference to a kind of construct that wasn't generated until a later pass, in a "unification" step. There was a bit of duct tape and baling wire involved in all of this, but it actually seemed to keep the code as simple as possible by isolating the code that generates each type of construct.</div><div><br></div><div>None of which is actually relevant to what Python should do in its parsing; just a little bit of rambling thoughts.</div></div><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Keeping medicines from the bloodstreams of the sick; food <br>from the bellies of the hungry; books from the hands of the <br>uneducated; technology from the underdeveloped; and putting <br>advocates of freedom in prisons. Intellectual property is<br>to the 21st century what the slave trade was to the 16th.<br></div>
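<div><br></div><div>P.S. To make the multi-pass idea above concrete, here is a minimal sketch of the general technique (all of the token kinds, class names, and the toy "NAME = NUMBER" construct are invented for illustration; this is not the actual project code): each pass claims the tokens belonging to one kind of construct and marks them "used", so that later passes only ever see what's left over.</div>

```python
# Hypothetical sketch of multi-pass construct extraction from a token
# stream. Tokens consumed by one pass are flagged "used" so that later
# passes skip them.
from dataclasses import dataclass


@dataclass
class Token:
    kind: str
    value: str
    used: bool = False


@dataclass
class Assignment:
    """One invented construct type: NAME '=' NUMBER."""
    name: str
    value: str


def pass_assignments(tokens):
    """First pass: find NAME '=' NUMBER triples and pull them out."""
    found = []
    for i in range(len(tokens) - 2):
        a, b, c = tokens[i:i + 3]
        if (not a.used and a.kind == "NAME"
                and b.kind == "EQUALS" and c.kind == "NUMBER"):
            found.append(Assignment(a.value, c.value))
            a.used = b.used = c.used = True  # claim these tokens
    return found


def pass_leftovers(tokens):
    """A later pass sees only tokens no earlier pass claimed."""
    return [t for t in tokens if not t.used]


toks = [Token("NAME", "width"), Token("EQUALS", "="),
        Token("NUMBER", "8"), Token("KEYWORD", "endmodule")]
assigns = pass_assignments(toks)   # extracts Assignment("width", "8")
rest = pass_leftovers(toks)        # only the KEYWORD token remains
```

<div>The real thing had many more passes and a final unification step for cross-construct references, but the "scan, extract, mark used" shape was the same.</div>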
</div></div>