<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <span dir="ltr"><<a href="mailto:ncoghlan@gmail.com" target="_blank">ncoghlan@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 6 June 2015 at 12:21, Neil Girdhar <<a href="mailto:mistersheik@gmail.com">mistersheik@gmail.com</a>> wrote:<br>

> I'm curious what other people will contribute to this discussion as I think<br>

> having no great parsing library is a huge hole in Python.  Having one would<br>

> definitely allow me to write better utilities using Python.<br>

<br>

</span>The design of *Python's* grammar is deliberately restricted to being<br>

parsable with an LL(1) parser. There are a great many static analysis<br>

and syntax highlighting tools that are able to take advantage of that<br>

simplicity because they only care about the syntax, not the full<br>

semantics.<br></blockquote><div><br></div><div>Given the extra validation that happens, the language Python accepts isn't actually LL(1), though.  The grammar is LL(1), and syntax errors are then raised separately for various illegal constructs that the grammar lets through.</div><div><br></div><div>Anyway, no one is suggesting changing the grammar.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Anyone actually doing their *own* parsing of something else *in*<br>

Python, would be better advised to reach for PLY<br>

(<a href="https://pypi.python.org/pypi/ply" target="_blank">https://pypi.python.org/pypi/ply</a> ). PLY is the parser underlying<br>

<a href="https://pypi.python.org/pypi/pycparser" target="_blank">https://pypi.python.org/pypi/pycparser</a>, and hence the highly regarded<br>

CFFI library, <a href="https://pypi.python.org/pypi/cffi" target="_blank">https://pypi.python.org/pypi/cffi</a><br>

<br>

Other notable parsing alternatives folks may want to look at include<br>

<a href="https://pypi.python.org/pypi/lrparsing" target="_blank">https://pypi.python.org/pypi/lrparsing</a> and<br>

<a href="http://pythonhosted.org/pyparsing/" target="_blank">http://pythonhosted.org/pyparsing/</a> (both of which allow you to use<br>

Python code to define your grammar, rather than having to learn a<br>

formal grammar notation).<br>

<br></blockquote><div><br></div><div>I looked at ply and pyparsing, but I found it impossible to parse LaTeX with them, because I couldn't tell the parser how many arguments to consume based on the name of the function.  When the parser sees a function definition, it learns how many arguments that function takes.  When it then sees a function call such as \a{1}{2}{3}, and "\a" takes 2 arguments, it should consume only 1 and 2 as arguments and leave 3 as a regular text token. In other words, I should be able to tell the parser what to expect, in code that lives on the rule edges.</div><div><br></div><div>The parsing tools you listed work really well until you need to do something like (1) the validation step that happens in Python, (2) figuring out exactly where a syntax error is (line and column number), or (3) ensuring that whitespace separates some tokens even when it's not required to disambiguate different parse trees.  I got the impression that the authors wanted to make these tools simple for the simple cases, but they were made too simple and don't allow you to do everything in one pass.</div><div><br></div><div>Best,</div><div><br></div><div>Neil</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Regards,<br>

Nick.<br>

<span class="HOEnZb"><font color="#888888"><br>

--<br>

Nick Coghlan   |   <a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>   |   Brisbane, Australia<br>

</font></span></blockquote></div><br></div></div>
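<div>To make the LaTeX case above concrete, here is a minimal hand-written sketch (not ply or pyparsing code) of a parser whose rule for a macro call looks up how many brace groups to consume. The names <code>parse</code>, <code>parse_sequence</code>, <code>parse_group</code> and the <code>arity</code> table are hypothetical, the tokenizer is deliberately simplified, and a fuller version would update <code>arity</code> whenever it reads a macro definition.</div>
<pre>
```python
import re

# Simplified tokens: a control word such as \a, an escaped character,
# a brace, or a run of plain text. Real LaTeX tokenizing is more involved.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[{}]|[^\\{}]+")


def parse(source, arity):
    """Parse source into a list of nodes.

    arity is a hypothetical table mapping a macro name to its argument
    count; unknown macros are assumed to take no arguments.
    """
    tokens = TOKEN.findall(source)
    nodes, _ = parse_sequence(tokens, 0, arity, in_group=False)
    return nodes


def parse_sequence(tokens, pos, arity, in_group):
    """Parse tokens until the end of input or the enclosing group's '}'."""
    nodes = []
    while pos != len(tokens):
        tok = tokens[pos]
        if tok == "}":
            if not in_group:
                raise SyntaxError("unmatched '}' at token %d" % pos)
            return nodes, pos  # the caller consumes the closing brace
        if tok == "{":
            group, pos = parse_group(tokens, pos, arity)
            nodes.append(("group", group))
        elif tok.startswith("\\"):
            pos += 1
            args = []
            # The macro's arity decides how many brace groups are arguments.
            for _ in range(arity.get(tok, 0)):
                arg, pos = parse_group(tokens, pos, arity)
                args.append(arg)
            nodes.append(("call", tok, args))
        else:
            nodes.append(("text", tok))
            pos += 1
    if in_group:
        raise SyntaxError("missing '}' at end of input")
    return nodes, pos


def parse_group(tokens, pos, arity):
    """Parse one brace-delimited group starting at tokens[pos]."""
    if pos == len(tokens) or tokens[pos] != "{":
        raise SyntaxError("expected '{' at token %d" % pos)
    nodes, pos = parse_sequence(tokens, pos + 1, arity, in_group=True)
    return nodes, pos + 1  # skip the closing '}'
```
</pre>
<div>With this sketch, <code>parse(r"\a{1}{2}{3}", {"\\a": 2})</code> yields one call node for \a holding the groups for 1 and 2, followed by a separate ordinary group containing the text 3. The point is only that the decision lives in ordinary code inside the rule, which is the kind of hook the message above asks for.</div>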