[Python-ideas] Re: Universal parsing library in the stdlib to alleviate security issues

24 Jul 2019

On Tue, Jul 23, 2019 at 8:06 PM Andrew Barnert <abarnert@yahoo.com> wrote:
...
> On Jul 23, 2019, at 18:44, Nam Nguyen <bitsink@gmail.com> wrote:
...
>> FYI, my current proof of concept parser is at ~300 lines of code, with
>> debugging trace support. Other than performance (which I don't intend to
>> tackle in my library very soon), is there any other concern that I have
>> missed? At the moment, I am still of the opinion that the goal raised in
>> this thread is very attainable, and should be considered.
>
> I personally don’t think anything more needs to be done for a proof of
> concept (except maybe proving that performance actually is solvable).
> But what’s the actual proposal here?

Back to my original requests to the list: 1) whether we want to have a
(possibly private) parsing library in the stdlib, and 2) what features it
should have. My own answers were 1) yes, such a library would be useful,
and 2) a set of requirements that such a library should fulfill.

Are they acceptable? Apparently not yet: you first asked for debug
tracing, Barry wanted better performance, and Chris asked for proof that
such a library would help. How about the other points I suggested? Do we
need a full-blown universal parser? Is LL(1) enough, or do we need LL(k)
or unlimited lookahead? How about context-sensitive grammars? What
performance budget can we spend on this versus the status quo? Do we even
care whether the parser is small, or whether it comes from generated code?
Et cetera... There are still plenty of open questions.
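
To make the LL(1) question concrete, here is a minimal sketch of a
hand-written recursive-descent parser that decides every step from a
single token of lookahead. It is an illustration only: it is not the
proof-of-concept library mentioned above, and the toy grammar and the
names tokenize, Parser and parse are assumptions made up for this example.

```python
# Toy grammar (hypothetical, for illustration only):
#   expr := term ('+' term)*
#   term := NUM | '(' expr ')'

DIGITS = "0123456789"


def tokenize(text):
    """Split text into (kind, value) pairs, ending with an EOF token."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in DIGITS:
            j = i
            while j < len(text) and text[j] in DIGITS:
                j += 1
            tokens.append(("NUM", int(text[i:j])))
            i = j
        elif ch in "+()":
            tokens.append((ch, None))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    tokens.append(("EOF", None))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        # The single token of lookahead: only its kind is inspected.
        return self.tokens[self.pos][0]

    def advance(self, kind):
        # Consume the current token, insisting that it has the given kind.
        tok_kind, value = self.tokens[self.pos]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind}")
        self.pos += 1
        return value

    def expr(self):
        total = self.term()
        while self.peek() == "+":
            self.advance("+")
            total += self.term()
        return total

    def term(self):
        if self.peek() == "NUM":
            return self.advance("NUM")
        if self.peek() == "(":
            self.advance("(")
            value = self.expr()
            self.advance(")")
            return value
        raise SyntaxError(f"unexpected {self.peek()}")


def parse(text):
    """Parse and evaluate text; trailing input is rejected."""
    parser = Parser(text)
    value = parser.expr()
    parser.advance("EOF")
    return value
```

With this sketch, parse("1 + (2 + 3)") returns 6, and malformed input such
as "1 +" raises SyntaxError. Every branch is chosen by one call to peek();
a grammar where that single token cannot pick the branch is exactly where
the LL(k), unlimited-lookahead, or context-sensitive questions start to
matter.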

> Even if everyone agrees that this is a nifty idea, there’s nothing to
> accept until someone writes an acceptable-performance stdlib-ready parser
> library, the drop-in replacements for the critical bespoke parsing
> functions, and the tests that verify that they avoid known security
> problems but otherwise provide the same behavior.

These are good points to set as targets! What does it take for me to get
the list to agree on one such set of criteria? Once we get past that, we
can look for solutions that fulfill them. For example, Guido suggested
pyparsing. It is indeed faster than my library, but it does not support
context-sensitive or ambiguous grammars, a requirement that *I* think
should be satisfied.

To recap, the "proposal" here isn't one specific solution, but an initial
set of requirements that a solution must fulfill. If you think a parser
library is useful, let's debate what form it should take. If you don't
think so, let's hear your reasoning too.

Thanks!
Nam