[Python-ideas] Hooking between lexer and parser
Ryan Gonzalez
rymg19 at gmail.com
Sat Jun 6 19:52:31 CEST 2015
On June 6, 2015 12:29:21 AM CDT, Neil Girdhar <mistersheik at gmail.com> wrote:
>On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <ncoghlan at gmail.com>
>wrote:
>
>> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at gmail.com> wrote:
>> > I'm curious what other people will contribute to this discussion, as
>> > I think having no great parsing library is a huge hole in Python.
>> > Having one would definitely allow me to write better utilities using
>> > Python.
>>
>> The design of *Python's* grammar is deliberately restricted to being
>> parsable with an LL(1) parser. There are a great many static analysis
>> and syntax highlighting tools that are able to take advantage of that
>> simplicity because they only care about the syntax, not the full
>> semantics.
>>
>
>Given the validation that happens, it's not actually LL(1), though.
>It's mostly LL(1), with syntax errors raised for various illegal
>constructs.
>
>Anyway, no one is suggesting changing the grammar.
>
>
>> Anyone actually doing their *own* parsing of something else *in*
>> Python, would be better advised to reach for PLY
>> (https://pypi.python.org/pypi/ply). PLY is the parser underlying
>> https://pypi.python.org/pypi/pycparser, and hence the highly regarded
>> CFFI library, https://pypi.python.org/pypi/cffi
>>
>> Other notable parsing alternatives folks may want to look at include
>> https://pypi.python.org/pypi/lrparsing and
>> http://pythonhosted.org/pyparsing/ (both of which allow you to use
>> Python code to define your grammar, rather than having to learn a
>> formal grammar notation).
>>
>>
>I looked at PLY and pyparsing, but it was impossible to simply parse
>LaTeX because I couldn't explain how to suck up the right number of
>arguments given the name of the function. When the parser sees a
>function definition, it learns how many arguments that function needs.
>When it sees a function call \a{1}{2}{3}, if "\a" takes 2 arguments,
>then it should only suck up 1 and 2 as arguments and leave 3 as a
>regular text token. In other words, I should be able to tell the
>parser what to expect in code that lives on the rule edges.
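For concreteness, the arity-driven behavior Neil describes might look like this in a hand-rolled sketch (the ARITY table, regex, and command names here are invented for illustration; this is not PLY or pyparsing code):

```python
import re

# Hypothetical arity table: how many brace-groups each command consumes.
ARITY = {"\\a": 2, "\\frac": 2, "\\sqrt": 1}

# Split input into commands, brace groups, and runs of plain text.
TOKEN = re.compile(r"(\\[A-Za-z]+|\{[^{}]*\}|[^\\{}]+)")

def parse(text):
    """Parse a flat LaTeX-like string into (command, args) and text tokens."""
    tokens = TOKEN.findall(text)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("\\"):
            n = ARITY.get(tok, 0)
            # Consume exactly n brace groups as arguments, stripping braces.
            args = [t[1:-1] for t in tokens[i + 1 : i + 1 + n]]
            out.append((tok, args))
            i += 1 + n
        else:
            # Brace groups beyond the command's arity fall through as text.
            out.append(("text", tok))
            i += 1
    return out

print(parse(r"\a{1}{2}{3}"))
# [('\\a', ['1', '2']), ('text', '{3}')]
```

Since "\a" is declared to take two arguments, the third brace group is left alone as ordinary text, exactly the behavior Neil couldn't express in the grammar-level APIs.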
Can't you just hack it into the lexer? When the slash is detected, the lexer can treat the following identifier as a function, look up the number of required arguments, and push it onto some sort of stack. Whenever a left bracket is encountered and the top of the stack (TOS) still needs another argument, it returns a special argument-opener token.
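A rough sketch of that lexer hack for the flat (non-nested) case; the token names and arity table are hypothetical, and a backslash is assumed to be followed by a letter:

```python
import re

ARITY = {"a": 2}  # hypothetical: \a takes two arguments

def lex(text):
    """Yield (type, value) tokens. A stack of outstanding argument counts
    decides whether a '{' opens a function argument or is literal text."""
    stack = []   # arguments still owed by each open command
    in_arg = 0   # depth of argument groups currently open
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            m = re.match(r"[A-Za-z]+", text[i + 1 :])
            name = m.group(0)
            stack.append(ARITY.get(name, 0))  # push required-argument count
            yield ("FUNC", name)
            i += 1 + m.end()
        elif ch == "{" and stack and stack[-1] > 0:
            stack[-1] -= 1                    # TOS consumes one argument
            in_arg += 1
            yield ("ARG_OPEN", ch)
            i += 1
        elif ch == "}" and in_arg:
            in_arg -= 1
            if stack and stack[-1] == 0:
                stack.pop()                   # command has all its arguments
            yield ("ARG_CLOSE", ch)
            i += 1
        else:
            yield ("TEXT", ch)                # everything else is plain text
            i += 1
```

Lexing `\a{1}{2}{3}` then emits ARG_OPEN/ARG_CLOSE around "1" and "2" only; the "{3}" that exceeds the arity comes out as TEXT tokens, which is the behavior Neil wanted.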
>
>The parsing tools you listed work really well until you need to do
>something like (1) the validation step that happens in Python, (2)
>figuring out exactly where the syntax error is (line and column
>number), or (3) ensuring that whitespace separates some tokens even
>when it's not required to disambiguate different parse trees. I got
>the impression that they wanted to make these languages simple for the
>simple cases, but they were made too simple and don't allow you to do
>everything in one simple pass.
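Point (2) is straightforward in a hand-written lexer, where line and column can be threaded through tokenization; a minimal illustrative sketch (not from any of the libraries discussed):

```python
def chars_with_positions(text):
    """Yield (char, line, col) so a later error can point at an exact
    1-based location in the source."""
    line, col = 1, 1
    for ch in text:
        yield ch, line, col
        if ch == "\n":
            line += 1   # newline starts a new line...
            col = 1     # ...and resets the column
        else:
            col += 1

# An error reporter can then say e.g. "unexpected '}' at line 2, column 1".
```

The complaint is that the grammar-level APIs don't always surface this position information to the error you actually see.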
>
>Best,
>
>Neil
>
>
>> Regards,
>> Nick.
>>
>> --
>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
>>
>
>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.