[Python-ideas] Re: Universal parsing library in the stdlib to alleviate security issues

17 Jul 2019

...
On 17 Jul 2019, at 06:30, Nam Nguyen wrote:
On Mon, Jul 15, 2019 at 8:47 PM Andrew Barnert <abarnert@yahoo.com> wrote:
On Jul 15, 2019, at 18:44, Nam Nguyen <bitsink@gmail.com> wrote:
...
I have implemented a tiny (~200 SLOC) package at https://gitlab.com/nam-nguyen/parser_compynator that demonstrates something like this is possible. There are several examples to give you a feel for it, as well as some early benchmark numbers to consider. This is far smaller than any of the Python parsing libraries I have looked at, yet more universal than many of them. I hope it will convert the skeptics ;).
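For readers who haven't opened the repository, here is a minimal sketch of the general parser-combinator style under discussion, assuming a parser is a function from input text to a set of (value, remaining input) pairs, so that an empty set means "no parse". The names `char`, `alt` and `seq` are hypothetical and are not parser_compynator's actual API.

```python
from typing import Any, Callable, Set, Tuple

# Hypothetical model: a parser maps input text to a set of (value, rest) pairs.
# An empty set means "no parse"; several pairs mean the input is ambiguous.
Parser = Callable[[str], Set[Tuple[Any, str]]]

def char(c: str) -> Parser:
    """Match one literal character at the start of the input."""
    def parse(text: str) -> Set[Tuple[Any, str]]:
        return {(c, text[1:])} if text[:1] == c else set()
    return parse

def alt(p: Parser, q: Parser) -> Parser:
    """Try both alternatives and keep every result from either."""
    return lambda text: p(text) | q(text)

def seq(p: Parser, q: Parser) -> Parser:
    """Run p, then run q on each remainder p leaves; pair up the values."""
    def parse(text: str) -> Set[Tuple[Any, str]]:
        return {((v1, v2), rest2)
                for v1, rest1 in p(text)
                for v2, rest2 in q(rest1)}
    return parse

ab_or_ac = seq(char("a"), alt(char("b"), char("c")))
print(ab_or_ac("ab"))  # {(('a', 'b'), '')}
print(ab_or_ac("ax"))  # set()
```

The second call shows the failure mode being debated: the empty set says the input was rejected, but not where or why.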
For at least some of your use cases, I don’t think it’s a problem that it’s 70x slower than the custom parsers you’d be replacing. How often do you need to parse a million URLs in your inner loop? Also, if the function composition is really the performance hurdle, can you optimize that away relatively simply, just by building an explicit tree (expression-template style) and walking the tree in a __call__ method, rather than building an implicit tree of nested calls? (And that could be optimized further if needed, e.g. by turning the tree walk into a simple virtual machine where all of the fundamental operations are inlined into the loop, and maybe even accelerating that with C code.)
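A rough sketch of the expression-template idea, under my own assumptions: the combinators build plain, inspectable data nodes, and a single interpreter function walks that tree when the grammar object is called. This is still a recursive tree walk, not the flattened virtual machine mentioned as a further step, and the classes `Node`, `Char`, `Seq`, `Alt` and the helper `_walk` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Set, Tuple

class Node:
    """Base class: combinators build a data tree instead of nested closures."""
    def __add__(self, other: "Node") -> "Node":
        return Seq(self, other)  # a + b means "a followed by b"

    def __or__(self, other: "Node") -> "Node":
        return Alt(self, other)  # a | b means "a or b"

    def __call__(self, text: str) -> Set[Tuple[Any, str]]:
        return _walk(self, text)

@dataclass(frozen=True)
class Char(Node):
    c: str

@dataclass(frozen=True)
class Seq(Node):
    left: Node
    right: Node

@dataclass(frozen=True)
class Alt(Node):
    left: Node
    right: Node

def _walk(node: Node, text: str) -> Set[Tuple[Any, str]]:
    """One interpreter for the whole tree: the single place to optimize later."""
    if isinstance(node, Char):
        return {(node.c, text[1:])} if text[:1] == node.c else set()
    if isinstance(node, Alt):
        return _walk(node.left, text) | _walk(node.right, text)
    if isinstance(node, Seq):
        return {((v1, v2), r2)
                for v1, r1 in _walk(node.left, text)
                for v2, r2 in _walk(node.right, r1)}
    raise TypeError(f"unknown node {node!r}")

grammar = Char("a") + (Char("b") | Char("c"))
print(grammar("ac"))  # {(('a', 'c'), '')}
```

Because the grammar is now an ordinary value (printing `grammar` shows the nested `Seq`/`Alt`/`Char` nodes), the walker could later be swapped for something that compiles the tree into a flat instruction list without changing how grammars are written.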
But I do think it’s a problem that there seems to be no way to usefully indicate failure to the caller, and I’m not sure that could be fixed as easily.
An empty set signifies the parse has failed. Perhaps I have misunderstood what you indicated here.
This is pretty terrible. The empty set shouldn’t be the only feedback for errors. It reminds me of C’s atoi, which converts a string to an integer and returns 0 to indicate failure. But zero is of course a valid number (and a super common one at that!).

Error messages aren’t a nice-to-have for a parser; they’re absolutely critical.
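One way this could be addressed, sketched under my own assumptions rather than describing how parser_compynator works: keep the set-of-results model, but have every parser record the furthest offset at which it failed and what it expected there, and have the top-level entry point raise an exception carrying that information instead of returning an empty set. Reporting the furthest failure is a widely used heuristic in combinator and PEG parsers; `Ctx`, `parse_all` and `ParseError` are hypothetical names.

```python
from typing import Any, Callable, Set, Tuple

class ParseError(ValueError):
    """Raised with a human-readable message instead of returning an empty set."""

class Ctx:
    """Shared state remembering the furthest offset where any parser failed."""
    def __init__(self, text: str) -> None:
        self.text = text
        self.fail_pos = -1
        self.expected: Set[str] = set()

    def fail(self, pos: int, what: str) -> None:
        if pos > self.fail_pos:      # a new furthest failure replaces older ones
            self.fail_pos, self.expected = pos, {what}
        elif pos == self.fail_pos:   # a tie adds another expected alternative
            self.expected.add(what)

# A parser takes the context and a start offset and returns {(value, end_offset)}.
Parser = Callable[[Ctx, int], Set[Tuple[Any, int]]]

def char(c: str) -> Parser:
    def parse(ctx: Ctx, pos: int) -> Set[Tuple[Any, int]]:
        if ctx.text.startswith(c, pos):
            return {(c, pos + len(c))}
        ctx.fail(pos, repr(c))  # record what we wanted at this offset
        return set()
    return parse

def alt(p: Parser, q: Parser) -> Parser:
    return lambda ctx, pos: p(ctx, pos) | q(ctx, pos)

def seq(p: Parser, q: Parser) -> Parser:
    def parse(ctx: Ctx, pos: int) -> Set[Tuple[Any, int]]:
        return {((v1, v2), end2)
                for v1, end1 in p(ctx, pos)
                for v2, end2 in q(ctx, end1)}
    return parse

def parse_all(p: Parser, text: str) -> Set[Any]:
    """Return every complete parse, or raise ParseError at the furthest failure."""
    ctx = Ctx(text)
    done = set()
    for value, end in p(ctx, 0):
        if end == len(text):
            done.add(value)
        else:
            ctx.fail(end, "end of input")  # parsed a prefix but input remains
    if done:
        return done
    found = repr(text[ctx.fail_pos]) if 0 <= ctx.fail_pos < len(text) else "end of input"
    wanted = " or ".join(sorted(ctx.expected))
    raise ParseError(f"offset {ctx.fail_pos}: expected {wanted}, found {found}")

ab_or_ac = seq(char("a"), alt(char("b"), char("c")))
print(parse_all(ab_or_ac, "ab"))  # {('a', 'b')}
try:
    parse_all(ab_or_ac, "ax")
except ParseError as e:
    print(e)  # offset 1: expected 'b' or 'c', found 'x'
```

Unlike atoi's 0, the caller can no longer confuse "rejected" with a legitimate result, and the message points at the offending offset.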

/ Anders

Anders Hovmöller