[Python-ideas] Hooking between lexer and parser

Neil Girdhar mistersheik at gmail.com
Mon Jun 8 04:37:36 CEST 2015


To my eyes, the best parsing library in Python that I could find is modgrammar:
https://pythonhosted.org/modgrammar/

It's GLR, I think.  The documentation isn't bad, and the syntax isn't too bad.

The major change that I want to make to it is to replace the grammar class
variables with regular instance generator methods, and to replace the
components of the grammar return value, which are currently classes, with
constructed objects.  That way, a whitespace object that represents a block
continuation can be constructed to know how much whitespace it must match.
Similarly, a "suite" can include a constructed whitespace object that
includes extra space.  After it's matched, it can be queried for its size,
and the grammar generator method can construct whitespace objects with the
appropriate size.  This eliminates the need for INDENT and DEDENT tokens.
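
To make this concrete, here is a rough sketch of what I mean.  None of this
is modgrammar's actual API; all of the names are invented purely for
illustration, and it assumes the parser matches each yielded object before
advancing the generator:

    class Whitespace:
        """Matches leading whitespace.  If constructed with an exact width,
        it insists on that much; either way it records what it saw."""

        def __init__(self, exactly=None):
            self.exactly = exactly   # None means "match any amount"
            self.size = None

        def match(self, line):
            stripped = line.lstrip(' ')
            size = len(line) - len(stripped)
            if self.exactly is not None and size != self.exactly:
                return False
            self.size = size
            return True

    def suite_grammar(outer_indent):
        """Instance generator method for an indented block (a "suite")."""
        first = Whitespace()   # the first statement establishes the indent
        yield first
        # By the time the generator resumes, `first` has been matched, so its
        # size is known (a real version would also check that it exceeds
        # outer_indent).  Later lines are then required to match exactly that
        # much whitespace, which removes the need for INDENT/DEDENT tokens.
        yield Whitespace(exactly=first.size)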

This kind of dynamic grammar generation is desirable for all kinds of other
language-related problems, like the LaTeX one I discussed, and it also
allows us to merge all of the validation code into the parsing code, which
follows "Don't Repeat Yourself".  I think it's a better design.

I will try to find time to build a demo of this idea this week.

Ultimately, my problem with "token transformers" is, if I'm understanding
correctly, that we want to change Python so that not only will 3.5 have
token transformers, but every Python version after that will have to support
them.  This risks constraining the development of a more elegant solution.
And for what major reason do we even need token transformers so soon?  For a
toy example on python-ideas about automatic Decimal instances?  Why can't a
user define a one-character function "d(x)" to do the conversion everywhere
(see the small sketch below)?  I prefer to push for the better design even
if it means waiting a year.
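
For example, here is the kind of thing I mean; this is just my own minimal
sketch of the user-level alternative, not anything proposed on the list:

    from decimal import Decimal

    def d(x):
        """One-character helper: build an exact Decimal from a string."""
        return Decimal(x)

    total = d("0.1") + d("0.2")
    print(total)   # 0.3 exactly -- no token transformer needed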

Best,

Neil

On Sun, Jun 7, 2015 at 6:19 PM, Robert Collins <robertc at robertcollins.net>
wrote:

> On 6 June 2015 at 17:00, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > On 6 June 2015 at 12:21, Neil Girdhar <mistersheik at gmail.com> wrote:
> >> I'm curious what other people will contribute to this discussion as I
> >> think having no great parsing library is a huge hole in Python.  Having
> >> one would definitely allow me to write better utilities using Python.
> >
> > The design of *Python's* grammar is deliberately restricted to being
> > parsable with an LL(1) parser. There are a great many static analysis
> > and syntax highlighting tools that are able to take advantage of that
> > simplicity because they only care about the syntax, not the full
> > semantics.
> >
> > Anyone actually doing their *own* parsing of something else *in*
> > Python, would be better advised to reach for PLY
> > (https://pypi.python.org/pypi/ply ). PLY is the parser underlying
> > https://pypi.python.org/pypi/pycparser, and hence the highly regarded
> > CFFI library, https://pypi.python.org/pypi/cffi
> >
> > Other notable parsing alternatives folks may want to look at include
> > https://pypi.python.org/pypi/lrparsing and
> > http://pythonhosted.org/pyparsing/ (both of which allow you to use
> > Python code to define your grammar, rather than having to learn a
> > formal grammar notation).
>
> Let me just pimp https://pypi.python.org/pypi/Parsley here - I have
> written languages in both Parsley (a simple packaging metadata
> language) and its predecessor pymeta (in which I wrote pybars -
> handlebars.js for python) - and both were good implementations of
> OMeta, IMNSHO.
>
> -Rob
>
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
>