On Fri, Jul 19, 2019 at 6:33 PM Nam Nguyen <bitsink@gmail.com> wrote:

Yes, I have. PyParsing was the first one I turned too for it has been available for a very long time. I emailed the author, Paul McGuire, a few times about this python-ideas thread too but never got a response.

On Fri, Jul 19, 2019 at 9:36 AM Guido van Rossum <guido@python.org> wrote:
Have you looked into pyparsing (https://github.com/pyparsing/pyparsing)? It somehow looks relevant.

On Mon, Jul 15, 2019 at 6:45 PM Nam Nguyen <bitsink@gmail.com> wrote:
Hello list,

I sent an email to this list two or three months ago about the same idea. In that discussion, there were both skepticism and support. Since I had some time during the previous long weekend, I have made my idea more concrete and I thought I would try with the list again, after having run it through some of you privately.

GOAL: To have some parsing primitives in the stdlib so that other modules in the stdlib itself can make use of. This would alleviate various security issues we have seen throughout the years.

With that goal in mind, I opine that any parsing library for this purpose should have the following characteristics:

#. Can be expressed in code. My opinion is that it is hard to review generated code. Code review is even more crucial in security contexts.

#. Small and verifiable. This helps build trust in the code that is meant to plug security holes.

#. Less evolving. Being in the stdlib has its drawback that is development velocity. The library should be theoretically sound and stable from the beginning.

#. Universal. Most of the times we'll parse left-factored context-free grammars, but sometimes we'll also want to parse context-sensitive grammars such as short XML fragments in which end tags must match start tags.

I have implemented a tiny (~200 SLOCs) package at https://gitlab.com/nam-nguyen/parser_compynator that demonstrates something like this is possible. There are several examples for you to have a feel of it, as well as some early benchmark numbers to consider. This is far smaller than any of the Python parsing libraries I have looked at, yet more universal than many of them. I hope that it would convert the skeptics ;).

Finally, my request to the list is: Please debate on: 1) whether we want a small (even private, underscore prefixed) parsing library in the stdlib to help with tasks that are a little too complex for regexes, and 2) if yes, how should it look like?

I also welcome comments (naming, uses of operator overloading, features, bikeshedding, etc.) on the above package ;).

Thanks!
Nam
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2WFZPWUSW3CKGGP7P623GIHG5AK3NVCC/
Code of Conduct: http://python.org/psf/codeofconduct/

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him/his (why is my pronoun here?)