On Tue, Jul 23, 2019 at 8:06 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Jul 23, 2019, at 18:44, Nam Nguyen <bitsink@gmail.com> wrote:
>> FYI, my current proof of concept parser is at ~300 lines of code, with debugging trace support. Other than performance (which I don't intend to tackle in my library very soon), is there any other concern that I have missed? At the moment, I am still of the opinion that the goal raised in this thread is very attainable, and should be considered.

> I personally don’t think anything more needs to be done for a proof of concept (except maybe proving that performance actually is solvable).

> But what’s the actual proposal here?

Back to my original requests to the list: 1) whether we want to have a (possibly private) parsing library in the stdlib, and 2) what features it should have. My position is that 1) yes, such a library would be useful, and 2) it should fulfill the several requirements I proposed.

Are they acceptable? Apparently not. You first requested a debug trace, Barry wanted better performance, and Chris asked for proof that such a library would help. How about the other points I suggested? Do we need a full-blown universal parser? Is LL(1) enough, or do we need k-token or unlimited lookahead? How about context-sensitive grammars? What performance budget can we spend on this vs. the status quo? Do we even care whether the parser is small, or whether it comes from generated code? Et cetera... There are still plenty of open questions.

> Even if everyone agrees that this is a nifty idea, there’s nothing to accept until someone writes an acceptable-performance stdlib-ready parser library, the drop-in replacements for the critical bespoke parsing functions, and the tests that verify that they avoid known security problems but otherwise provide the same behavior.

These are good points to set as targets! What does it take for me to get the list to agree on one such set of criteria? Once we get past that, we can look for solutions that fulfill them. For example, Guido suggested pyparsing. It is indeed faster than my library, but it does not support context-sensitive or ambiguous grammars, a requirement that *I* think should be satisfied.

To recap, the "proposal" here isn't one specific solution, but an initial set of requirements that a solution must fulfill. If you think a parser is useful, let's debate what form it should take. If you don't think so, let's hear your reasoning too.