[Python-ideas] Hooking between lexer and parser
rymg19 at gmail.com
Sat Jun 6 19:52:31 CEST 2015
On June 6, 2015 12:29:21 AM CDT, Neil Girdhar <mistersheik at gmail.com> wrote:
>On Sat, Jun 6, 2015 at 1:00 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at gmail.com> wrote:
>> > I'm curious what other people will contribute to this discussion as
>> > having no great parsing library is a huge hole in Python. Having
>> > one would definitely allow me to write better utilities using Python.
>> The design of *Python's* grammar is deliberately restricted to being
>> parsable with an LL(1) parser. There are a great many static analysis
>> and syntax highlighting tools that are able to take advantage of that
>> simplicity because they only care about the syntax, not the full
>> semantics.
>Given the validation that happens, it's not actually LL(1) though. It's
>mostly LL(1) with some syntax errors that are raised for various
>illegal constructs. Anyway, no one is suggesting changing the grammar.
>> Anyone actually doing their *own* parsing of something else *in*
>> Python, would be better advised to reach for PLY
>> (https://pypi.python.org/pypi/ply). PLY is the parser underlying
>> https://pypi.python.org/pypi/pycparser, and hence the highly regarded
>> CFFI library, https://pypi.python.org/pypi/cffi
>> Other notable parsing alternatives folks may want to look at include
>> https://pypi.python.org/pypi/lrparsing and
>> http://pythonhosted.org/pyparsing/ (both of which allow you to use
>> Python code to define your grammar, rather than having to learn a
>> formal grammar notation).
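As a quick illustration (mine, not from the thread): in pyparsing the grammar really is ordinary Python objects combined with operators, so there's no separate grammar file to learn. `Word`, `nums`, and `oneOf` are pyparsing's public API; the toy grammar itself is just an example.

```python
from pyparsing import Word, nums, oneOf

# A tiny grammar for "INT (+|-) INT", written directly in Python.
integer = Word(nums)
expr = integer + oneOf("+ -") + integer

print(expr.parseString("12 + 34").asList())
# ['12', '+', '34']
```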
>I looked at ply and pyparsing, but it was impossible to simply parse
>what I wanted because I couldn't explain to the parser how to suck up
>the right number of arguments given the name of the function. When it
>sees a function definition, it learns how many arguments that function
>needs. When it sees a function call like \a 1 2 3, if "\a" takes 2
>arguments, then it should only suck up 1 and 2 as arguments, and leave
>3 as a regular text token. In other words, I want to be able to tell
>the parser what to expect in code that lives on the rule.
Can't you just hack it into the lexer? When the slash is detected, the lexer can treat the following identifier as a function, look up the number of required arguments, and push it onto some sort of stack. Whenever a left bracket is encountered and another argument is needed by the TOS, it returns a special argument opener token.
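A rough sketch of that lexer hack in plain Python (the arity table, token names, and whitespace-separated input format are all my assumptions, not an existing API):

```python
# Assumed arity table: "\a" takes exactly 2 arguments.
ARITY = {"a": 2}

def tokenize(text):
    """Yield (kind, value) pairs, consuming only as many argument
    tokens after a \\name as its declared arity allows."""
    tokens = []
    pending = []  # stack of remaining-argument counts (TOS = top of stack)
    for word in text.split():
        if word.startswith("\\"):
            name = word[1:]
            tokens.append(("FUNC", name))
            pending.append(ARITY.get(name, 0))
        elif pending and pending[-1] > 0:
            # The function on top of the stack still needs arguments.
            tokens.append(("ARG", word))
            pending[-1] -= 1
            if pending[-1] == 0:
                pending.pop()
        else:
            tokens.append(("TEXT", word))
    return tokens

print(tokenize(r"\a 1 2 3"))
# [('FUNC', 'a'), ('ARG', '1'), ('ARG', '2'), ('TEXT', '3')]
```

With `"\a"` declared as two arguments, `1` and `2` come out as arguments and `3` stays a regular text token, matching the behavior described above.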
>The parsing tools you listed work really well until you need to do
>something like (1) the validation step that happens in Python, or (2)
>figuring out exactly where the syntax error is (line and column number),
>or (3) ensuring that whitespace separates some tokens even when it's not
>required to disambiguate different parse trees. I got the impression
>they wanted to make these languages simple for the simple cases, but
>they were made too simple and don't allow you to do everything in one
>simple grammar.
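On point (2), for what it's worth, CPython's own parser does expose exact error positions via the stdlib (my example, stdlib only):

```python
import ast

# ast.parse raises SyntaxError with line and column attributes.
try:
    ast.parse("def f(:\n    pass")
except SyntaxError as err:
    location = (err.lineno, err.offset)

print(location)  # the exact column varies by Python version
```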
>> Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
Sent from my Android device with K-9 Mail. Please excuse my brevity.