[Python-ideas] Hooking between lexer and parser

David Mertz mertz at gnosis.cx
Sat Jun 6 21:02:50 CEST 2015


On Fri, Jun 5, 2015 at 9:27 PM, Guido van Rossum <guido at python.org> wrote:

> You're putting a lot of faith in "modern" parsers. I don't know if PLY
> qualifies as such, but it certainly is newer than Lex/Yacc, and it unifies
> the lexer and parser. However I don't think it would be much better for a
> language the size of Python.
>

PLY doesn't really "unify" the lexer and parser; it just provides both of
them in the same Python package (and uses somewhat similar syntax and
conventions for each).

At my last consulting position I wrote a project to process a fairly
complex DSL (used for code generation to several targets: Python, C++,
Verilog, etc.).  I like PLY, and decided to use that tool; but after a
short while I gave up on the parser part of it, and only used the lexer,
leaving parsing to "hand rolled" code.

I'm sure I *could* have shoehorned the entire EBNF grammar into the
parsing component of PLY.  But for my own purposes, I found it more
important to do various simplifications and modifications of the token
stream before generating the data structures that defined the eventual
output parameters.  So in this respect, what I did is something like a
simpler version of Python's compilation pipeline.
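
To make "simplifications and modifications of the token stream" concrete,
here is a minimal sketch.  It is not the code from that project: the toy
DSL, the token kinds, and the names Token, lex() and simplify() are all
invented for illustration, and it uses the stdlib re module in place of
PLY's lexer so that it runs on its own.  The idea is just that a cleanup
pass sits between lexing and whatever builds the data structures.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token kinds for an invented toy DSL (an assumption, not
# the grammar of the DSL described in this post).
TOKEN_RE = re.compile(r"""
    (?P<COMMENT>\#[^\n]*)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<NUMBER>\d+)
  | (?P<STRING>"[^"\n]*")
  | (?P<PUNCT>[{}=;:,])
  | (?P<WS>\s+)
""", re.VERBOSE)


def lex(source: str) -> Iterator[Token]:
    """Yield every token in source, including whitespace and comments."""
    pos = 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at offset {pos}")
        yield Token(m.lastgroup, m.group())
        pos = m.end()


def simplify(tokens: Iterable[Token]) -> List[Token]:
    """Clean up the raw token stream before any construct is recognized."""
    out: List[Token] = []
    for tok in tokens:
        # Drop tokens that later passes never care about.
        if tok.kind in ("WS", "COMMENT"):
            continue
        # Merge adjacent string literals: "ab" "cd" becomes "abcd".
        if tok.kind == "STRING" and out and out[-1].kind == "STRING":
            out[-1] = Token("STRING", out[-1].text[:-1] + tok.text[1:])
            continue
        out.append(tok)
    return out
```

Calling simplify(lex(text)) returns a plain list of tokens, which is
convenient if later code wants to index into the stream or walk it more
than once.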

Actually, what I did was probably terrible practice for parsing purists,
but felt to me like the best "practicality beats purity" approach.  There
was a finite number of constructs in the DSL, and I would simply scan
through the token stream, in several passes, trying to identify a
particular construct, then pulling it out into the relevant data structure
type, and just marking those tokens as "used".  Other passes would look for
other constructs, and in some cases one construct referred to another kind
of construct that wasn't generated until a later pass; I resolved those
references in a separate "unification" step.  There was a bit of duct tape
and baling wire involved in all of this, but it actually seemed to keep the
code as simple as possible by isolating the code that generates each type
of construct.
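
The multi-pass approach described above can be sketched roughly as
follows.  This is an illustration under assumptions, not the original
code: the "port" and "signal" constructs, the Port and Signal classes,
and the helpers match_at(), scan() and extract() are all hypothetical.
Each pass looks for one fixed token pattern among the tokens not yet
consumed, marks what it matched as "used", and a later unification step
resolves references between constructs found in different passes.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple

Token = Tuple[str, str]                 # (kind, text)
Pattern = List[Tuple[str, Optional[str]]]  # text None means "any text"


@dataclass
class Signal:
    name: str
    width: int


@dataclass
class Port:
    name: str
    signal_name: str
    signal: Optional[Signal] = None     # filled in by the unification step


# Hypothetical constructs:  port NAME : NAME ;   and   signal NAME : NUMBER ;
PORT_PATTERN: Pattern = [("NAME", "port"), ("NAME", None), ("PUNCT", ":"),
                         ("NAME", None), ("PUNCT", ";")]
SIGNAL_PATTERN: Pattern = [("NAME", "signal"), ("NAME", None), ("PUNCT", ":"),
                           ("NUMBER", None), ("PUNCT", ";")]


def match_at(tokens, used, i, pattern) -> Optional[List[str]]:
    """Return the matched texts if pattern fits unused tokens at i."""
    if i + len(pattern) > len(tokens):
        return None
    texts = []
    for offset, (kind, text) in enumerate(pattern):
        tok_kind, tok_text = tokens[i + offset]
        if used[i + offset] or tok_kind != kind:
            return None
        if text is not None and tok_text != text:
            return None
        texts.append(tok_text)
    return texts


def scan(tokens, used, pattern) -> Iterator[List[str]]:
    """One pass over the stream: mark each match as used, then yield it."""
    i = 0
    while i < len(tokens):
        texts = match_at(tokens, used, i, pattern)
        if texts is None:
            i += 1
            continue
        for j in range(i, i + len(pattern)):
            used[j] = True
        yield texts
        i += len(pattern)


def extract(tokens: List[Token]):
    used = [False] * len(tokens)
    # Pass 1: ports, which name a signal that has not been built yet.
    ports = [Port(t[1], t[3]) for t in scan(tokens, used, PORT_PATTERN)]
    # Pass 2: signals.
    signals: Dict[str, Signal] = {
        t[1]: Signal(t[1], int(t[3]))
        for t in scan(tokens, used, SIGNAL_PATTERN)
    }
    # Unification: resolve each port's reference now that signals exist.
    for port in ports:
        port.signal = signals.get(port.signal_name)
    # Anything still unused was not recognized by any pass.
    leftovers = [tok for tok, u in zip(tokens, used) if not u]
    return ports, signals, leftovers
```

Because each pass only knows about its own pattern, adding a construct
means adding one pattern and one pass; the price is that errors surface
late, as leftover tokens or as references that are still None after
unification.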

None of which is actually relevant to what Python should do in its parsing;
these are just some rambling thoughts.


-- 
Keeping medicines from the bloodstreams of the sick; food
from the bellies of the hungry; books from the hands of the
uneducated; technology from the underdeveloped; and putting
advocates of freedom in prisons.  Intellectual property is
to the 21st century what the slave trade was to the 16th.