
On Jul 15, 2019, at 18:44, Nam Nguyen <bitsink@gmail.com> wrote:
I have implemented a tiny (~200 SLOCs) package at https://gitlab.com/nam-nguyen/parser_compynator that demonstrates that something like this is possible. There are several examples to give you a feel for it, as well as some early benchmark numbers to consider. This is far smaller than any of the Python parsing libraries I have looked at, yet more universal than many of them. I hope it will convert the skeptics ;).
For at least some of your use cases, I don’t think it’s a problem that it’s 70x slower than the custom parsers you’d be replacing. How often do you need to parse a million URLs in your inner loop? Also, if the function composition really is the performance hurdle, can’t you optimize that away relatively simply, just by building an explicit tree (expression-template style) and walking the tree in a __call__ method, rather than building an implicit tree of nested calls? (Something like the sketch at the end of this mail. And that could be optimized further if needed, e.g. by turning the tree walk into a simple virtual machine where all of the fundamental operations are inlined into the loop, and maybe even accelerating that with C code.)

But I do think it’s a problem that there seems to be no way to usefully indicate failure to the caller, and I’m not sure that could be fixed as easily. Invalid inputs in your readme examples don’t fail; they successfully return an empty set. There also doesn’t seem to be any way to trigger a hard fail rather than a backtrack. So I’m not sure how a real urlparse replacement could do the things the current one does, like raising a ValueError on https://abc.d[ef.ghi/ complaining that the netloc looks like an invalid IPv6 address. (Maybe you could def a function that raises a ValueError and attach it as a where somewhere in the parser tree? But even if that works, wouldn’t you get a meaningless exception that carries no information about where in the source text or where in the parse tree it came from, or why it was raised, and, as your readme says, a stack trace full of garbage?)

Can you add failure handling without breaking the “~200 LOC and easy to read” feature of the library, and without breaking the “easy to read once you grok parser combinators” feature of the parsers built with it?
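
To make the explicit-tree and hard-fail ideas concrete, here’s a rough toy sketch. None of these names (Lit, Seq, Alt, Expect, ParseError) come from your library, and this isn’t meant as a drop-in design, just an illustration: each combinator returns a small node object, parsing is a walk over the tree in __call__, and an Expect node turns “matched nothing” into a ValueError that at least knows the source position.

    # A minimal sketch, not parser_compynator's actual API: combinators build
    # an explicit node tree, and parsing is a walk over that tree via
    # __call__, rather than a composition of nested closures.

    class Parser:
        def __add__(self, other):   # sequence
            return Seq(self, other)

        def __or__(self, other):    # alternation
            return Alt(self, other)


    class Lit(Parser):
        def __init__(self, s):
            self.s = s

        def __call__(self, text, pos=0):
            # Yield (value, next_pos) pairs; yielding nothing means "no parse".
            if text.startswith(self.s, pos):
                yield self.s, pos + len(self.s)


    class Seq(Parser):
        def __init__(self, left, right):
            self.left, self.right = left, right

        def __call__(self, text, pos=0):
            for lval, mid in self.left(text, pos):
                for rval, end in self.right(text, mid):
                    yield (lval, rval), end


    class Alt(Parser):
        def __init__(self, left, right):
            self.left, self.right = left, right

        def __call__(self, text, pos=0):
            yield from self.left(text, pos)
            yield from self.right(text, pos)


    class ParseError(ValueError):
        def __init__(self, message, pos):
            super().__init__(f"{message} at position {pos}")
            self.pos = pos


    class Expect(Parser):
        # Hard-fail wrapper: if the wrapped parser matches nothing, raise a
        # ParseError carrying the source position instead of silently
        # yielding an empty result set.
        def __init__(self, inner, message):
            self.inner, self.message = inner, message

        def __call__(self, text, pos=0):
            matched = False
            for result in self.inner(text, pos):
                matched = True
                yield result
            if not matched:
                raise ParseError(self.message, pos)


    scheme = Expect(Lit("https") | Lit("http"), "expected a URL scheme")
    url = scheme + Lit("://")
    print(list(url("https://example.com")))   # [(('https', '://'), 8)]
    list(url("ftp://example.com"))            # raises ParseError at position 0

An Expect like that still can’t tell you which rule in the grammar failed or give you a clean traceback, but it does suggest that position-aware hard failures don’t have to add much code, so maybe the answer to my last question is yes.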