[Python-ideas] Hooking between lexer and parser

Neil Girdhar mistersheik at gmail.com
Sat Jun 6 18:23:03 CEST 2015


On Sat, Jun 6, 2015 at 3:17 AM, Andrew Barnert <abarnert at yahoo.com> wrote:

> On Jun 5, 2015, at 22:50, Neil Girdhar <mistersheik at gmail.com> wrote:
>
> On Sat, Jun 6, 2015 at 1:30 AM, Andrew Barnert <abarnert at yahoo.com> wrote:
>
>> First, I think your idea is almost completely tangential to mine. Yes, if
>> you completely replaced both the interface and the implementation of the
>> parser, you could do just about anything you wanted. But assuming nobody is
>> going to completely replace the way Python does parsing today, I think it's
>> still useful to add the one missing useful hook to the existing system. But
>> let's continue.
>>
>> On Friday, June 5, 2015 7:08 PM, Neil Girdhar <mistersheik at gmail.com>
>> wrote:
>> On Fri, Jun 5, 2015 at 6:58 PM, Andrew Barnert <abarnert at yahoo.com>
>> wrote:
>> >
>> If you want more background, see
>>
>> http://stupidpythonideas.blogspot.com/2015/06/hacking-python-without-hacking-python.html
>> (which I wrote to explain to someone else how floatliteralhack works).
>>
>
> Yes.  I want to point out that if the lexer rules were alongside the parser,
> they would be generating AST nodes — so the hook for calling Decimal for
> all floating point tokens would be doable in the same way as your AST hook.
>
>
> No. The way Python currently exposes things, the AST hook runs on an
> already-generated AST and transforms it into another one, to hand off to
> the code generator. That means it can only be used to handle things that
> parse as legal Python syntax (unless you replace the entire parser).
>
> What I want is a way to similarly take an already-generated token stream
> and transform it into another one, to hand off to the parser. That will
> allow it to be used to handle things that lex as legal Python tokens but
> don't parse as legal Python syntax, like what Paul suggested. Merging
> lexing into parsing not only doesn't give me that, it makes that impossible.
>

Yes, and what I was suggesting is for the lexer to return AST nodes, so
it would be fine to process those nodes in the same way.
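
For concreteness, the kind of token-stream transform Andrew is describing
looks roughly like this against the stdlib tokenize module (a sketch of the
idea, not his actual floatliteralhack code; the '.'-only test for float
literals is deliberately naive):

    import io
    import tokenize

    def decimal_floats(readline):
        """Yield (type, string) pairs, rewriting float literals as Decimal('...') calls."""
        for tok in tokenize.tokenize(readline):
            if tok.type == tokenize.NUMBER and '.' in tok.string:
                yield tokenize.NAME, 'Decimal'
                yield tokenize.OP, '('
                yield tokenize.STRING, repr(tok.string)
                yield tokenize.OP, ')'
            else:
                yield tok.type, tok.string

    source = b"x = 0.1 + 0.2\n"
    new_source = tokenize.untokenize(decimal_floats(io.BytesIO(source).readline))
    print(new_source.decode('utf-8'))
    # -> x = Decimal('0.1') + Decimal('0.2')   (modulo untokenize's whitespace)

As I understand it, the hook Andrew wants would let you hand a transform like
this to the compiler directly, instead of round-tripping through untokenize
and re-exec'ing the source yourself.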


>
> For the new tokens that you want, I think the ideal solution is to modify
> the Python parsing grammar before it parses the text.
>
>
> But I don't want any new tokens. I just want to change the way existing
> tokens are interpreted.
>
> Just as with an AST hook like PyMacro, I don't want any new nodes, I just
> want to change the way existing nodes are interpreted.
>
>
Yes, I see *how* you're trying to solve your problem, but my preference is
to have one kind of hook rather than two kinds by unifying lexing and
parsing.  I think that's more elegant.
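
For comparison, the node-level hook already has that shape today -- roughly
the following, using ast.NodeTransformer ("mypow" is just a made-up name for
illustration, not anything MacroPy or Andrew actually does):

    import ast

    class PowToCall(ast.NodeTransformer):
        """Reinterpret an existing node type (BinOp with Pow) without
        inventing any new node types."""
        def visit_BinOp(self, node):
            self.generic_visit(node)          # rewrite children first
            if isinstance(node.op, ast.Pow):
                call = ast.Call(func=ast.Name(id='mypow', ctx=ast.Load()),
                                args=[node.left, node.right], keywords=[])
                return ast.copy_location(call, node)
            return node

    tree = ast.fix_missing_locations(PowToCall().visit(ast.parse("y = 2 ** 10")))
    ns = {'mypow': pow}                       # stand-in for the real mypow
    exec(compile(tree, "<hook demo>", "exec"), ns)
    print(ns['y'])                            # 1024

If the lexer rules lived in the grammar and produced nodes, the Decimal case
above would be just another transform of this one kind, instead of needing a
separate token-level hook.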


> Of course I'm not presenting this as an ideal design if I were starting
>> Python from scratch, but as the best design given what Python is already
>> producing and consuming (a stream of tokens that's guaranteed to be
>> equivalent to what you get out of the tokenize module).
>>
>
> This is like saying "I want to change some things, but not other things".
>
>
> That's exactly what I'm saying. In particular, I want to change as few
> things as possible, to get what I want, without breaking stuff that already
> demonstrably works and has worked for decades.
>
> I don't know about OMeta, but the Earley parsing algorithm is worst-case
> cubic time, "quadratic time for unambiguous grammars, and linear time for
> almost all LR(k) grammars".
>
>
> I don't know why you'd want to use Earley for parsing a programming
> language. IIRC, it was the first algorithm that could handle rampant
> ambiguity in polynomial time, but that isn't relevant to parsing
> programming languages (especially one like Python, which was explicitly
> designed to be simple to parse), and it isn't relevant to natural languages
> if you're not still in the 1960s, except in learning the theory and history
> of parsing. GLR does much better in almost-unambiguous/almost-deterministic
> languages; CYK can be easily extended with weights (which propagate
> sensibly, so you can use them for a final judgment, or to heuristically
> prune alternatives as you go); Valiant is easier to reason about
> mathematically; etc. And that's just among the parsers in the same basic
> family as Earley.
>

I suggested Earley to address the fear of "exponential backtracking",
since that can't happen with Earley.
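
Earley can't blow up that way because it tabulates partial results per input
position; packrat PEG parsers get their linear bound from the same trick of
memoizing (rule, position) pairs. A toy recognizer, just to show the
mechanism (the grammar is made up and has nothing to do with Python's):

    from functools import lru_cache

    class ExprParser:
        """Packrat sketch for: expr <- term ('+' term)* ; term <- DIGIT / '(' expr ')'.
        Each method returns the position after a match, or None on failure."""

        def __init__(self, text):
            self.text = text

        @lru_cache(maxsize=None)
        def expr(self, pos):
            pos = self.term(pos)
            while pos is not None and pos < len(self.text) and self.text[pos] == '+':
                nxt = self.term(pos + 1)
                if nxt is None:
                    return None
                pos = nxt
            return pos

        @lru_cache(maxsize=None)
        def term(self, pos):
            if pos < len(self.text) and self.text[pos].isdigit():
                return pos + 1
            if pos < len(self.text) and self.text[pos] == '(':
                end = self.expr(pos + 1)
                if end is not None and end < len(self.text) and self.text[end] == ')':
                    return end + 1
            return None

    parser = ExprParser("(1+2)+3")
    print(parser.expr(0) == len(parser.text))   # True: the whole string matches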


>
> Also, the point of OMeta is that it's not just a parsing algorithm, it's a
> complete system that's been designed and built and is being used to write
> DSLs, macros, and other language extensions in real-life code in languages
> like JavaScript and C#. So you don't have to imagine what kind of interface
> you could present or what it might be like to use it in practice, you can
> use it and find out. And I think it's in the same basic direction as the
> kind of interface you want for Python's parser.
>
> And I'm pretty sure you could design something with most of the strengths
>> of OMeta without its weaknesses (just using a standard packrat PEG parser
>> instead of an extended PEG parser seems like it would turn most of the
>> exponential productions into explicit errors in the grammar…). Or you could
>> go the opposite way and use GLR and bottom-up callbacks instead of PEG and
>> top-down. Something like that would be a great idea for a research project.
>> But it's probably not a great idea for a proposal to change the core of
>> CPython, at least until someone does that research project.
>>
>
> Yes, totally agree with you.  So if it were me doing this work, I would
> put my energy into the research project to write an amazing parser in Python.
> And then I would try to convince the Python team to use that.  I guess
> we don't disagree at all.
>
>
> Well, I think we disagree about the value of our time, and about the cost
> of disruptive changes.
>
> If I have a relatively low-work, almost completely non-disruptive way to
> definitely get everything I actually need, and a high-work,
> hugely-disruptive way to probably get what I actually need and also
> probably get a whole bunch of other useful stuff that I might be able to
> sell everyone else on if I also did a lot of additional work, that seems
> like a no-brainer to me.
>
> In fact, even if I wanted to write an amazing parser library for Python
> (and I kind of do, but I don't know if I have the time), I still don't
> think I'd want to suggest it as a replacement for the parser in CPython.
> Writing all the backward-compat adapters and porting the Python parser over
> with all its quirks intact and building the tests to prove that it's
> performance and error handling were strictly better and so on wouldn't be
> nearly as much fun as other things I could do with it.
>

If you ever decide to write that amazing parser library for Python and want
any help, please feel free to let me know.

Best,

Neil