[Python-ideas] Built-in parsing library

Nam Nguyen bitsink at gmail.com
Sun Mar 31 14:58:35 EDT 2019


On Sun, Mar 31, 2019 at 11:00 AM Nick Timkovich <prometheus235 at gmail.com>
wrote:

> What does it mean to be a universal parser? In my mind, to be universal
> you should be able to parse anything, so you'd need something as versatile
> as any Turing language,
>

I'm not aware of, nor looking for, such Turing-complete parsers. Parsing
techniques such as Earley's algorithm, generalized LL/LR, and parser
combinators are often called universal in the sense that they can work with
all context-free grammars. I do not know if they are Turing complete.

> so one could stick with the one we already have (Python).
>

That is one of the reasons why the parser should be "coded" and not declared
(e.g. in the sense of EBNF). Combinator parsers are usually glued together
with functions that can act on what has been parsed so far.
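
As a minimal sketch of that point (every name here is hypothetical and not
taken from pyparsing or any stdlib module), a "bind" combinator lets the
next parser be chosen from the value just parsed. The example parses a
length-prefixed field such as "5:hello", where the number of characters to
read depends on the number parsed first. The parser representation, a
function from (text, position) to (value, new position) or None, is an
assumption made for illustration.

```python
from typing import Any, Callable, Optional, Tuple

# Assumed representation: a parser maps (text, pos) to (value, new_pos),
# or returns None when it fails.
Result = Optional[Tuple[Any, int]]
Parser = Callable[[str, int], Result]


def digits(text: str, pos: int) -> Result:
    """Parse one or more ASCII decimal digits and return them as an int."""
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        return None
    return int(text[pos:end]), end


def char(expected: str) -> Parser:
    """Parse exactly the single character `expected`."""
    def parser(text: str, pos: int) -> Result:
        if text.startswith(expected, pos):
            return expected, pos + 1
        return None
    return parser


def take(count: int) -> Parser:
    """Parse exactly `count` characters, whatever they are."""
    def parser(text: str, pos: int) -> Result:
        if pos + count <= len(text):
            return text[pos:pos + count], pos + count
        return None
    return parser


def bind(first: Parser, make_next: Callable[[Any], Parser]) -> Parser:
    """Run `first`, then run the parser that `make_next` builds from its value."""
    def parser(text: str, pos: int) -> Result:
        result = first(text, pos)
        if result is None:
            return None
        value, next_pos = result
        return make_next(value)(text, next_pos)
    return parser


# The length read by `digits` decides how much `take` consumes.
length_prefixed = bind(digits, lambda n: bind(char(":"), lambda _: take(n)))

print(length_prefixed("5:hello world", 0))  # ('hello', 7)
print(length_prefixed("9:short", 0))        # None
```

A purely declarative context-free grammar cannot express "read exactly n
characters" for an n that is itself parsed from the input, which is why the
glue is ordinary code here.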


> I'm vaguely aware of levels of grammar (regular, context-free?, etc.), and
> how things like XML can't/shouldn't be parsed with regex [1]. Most
> protocols probably aren't *completely* free to do whatever and probably
> fit into some level of the hierarchy, what level would this putative parser
> perform at?
>

I'd say any context-free grammar should be supported. But given the
immediate use case (to help with other libraries in the stdlib), this could
start small (but complete and correct). I am talking about simple parsing
needs such as email validation, HTTP cookie format, URL parsing, well-known
date formats. In fact, I would expect this parsing library to only offer
primitives like parse any character, parse a character matching a
predicate, parse a string, etc.
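
A rough sketch of what such primitives might look like follows. The names
any_char, satisfy, string and many1 are hypothetical, chosen only for
illustration, and the parser representation (a function from text and
position to a value and new position, or None on failure) is likewise an
assumption rather than a proposed API.

```python
from typing import Callable, Optional, Tuple

# Assumed representation: a parser maps (text, pos) to (matched, new_pos),
# or returns None when it fails.
Result = Optional[Tuple[str, int]]
Parser = Callable[[str, int], Result]


def any_char(text: str, pos: int) -> Result:
    """Parse any single character."""
    if pos < len(text):
        return text[pos], pos + 1
    return None


def satisfy(predicate: Callable[[str], bool]) -> Parser:
    """Parse one character for which predicate(character) is true."""
    def parser(text: str, pos: int) -> Result:
        if pos < len(text) and predicate(text[pos]):
            return text[pos], pos + 1
        return None
    return parser


def string(expected: str) -> Parser:
    """Parse exactly the string `expected`."""
    def parser(text: str, pos: int) -> Result:
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parser


def many1(item: Parser) -> Parser:
    """Apply `item` one or more times and join the matched pieces."""
    def parser(text: str, pos: int) -> Result:
        pieces = []
        while True:
            result = item(text, pos)
            # Stop on failure, or if `item` matched without consuming input
            # (which would otherwise loop forever).
            if result is None or result[1] == pos:
                break
            piece, pos = result
            pieces.append(piece)
        return ("".join(pieces), pos) if pieces else None
    return parser


year = many1(satisfy(str.isdigit))
print(year("2019-03-31", 0))         # ('2019', 4)
print(string("-")("2019-03-31", 4))  # ('-', 5)
```

Larger parsers (a date, a cookie pair, a URL scheme) would then be built by
combining these small pieces instead of writing one large regular expression.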


>
> Doing something like this from-scratch is a very tall order, are there
> candidate libraries that you'd want to see included in the stdlib? There is
> an argument for trying to "promote" a library that would add security to the
> standard library over others that would just add features: trying to make
> the "one obvious way to do it" also the safe way. However, all things
> equal, more used libraries tend to be more secure. I think suggestions of
> this form need to pose a library that a) exists, b) is well used and
> regarded, c) stable (once in the stdlib things are hard to change), and
> d) has maintainers that are amenable to inclusion.
>

This email wasn't meant to promote or consider any library in particular. I'm
more interested in finding out where the consensus lies with respect to
the need. Implementation-wise, I'm thinking of this paper from ~25 years ago
and a very bare-bones pyparsing.

http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf

Cheers,
Nam


>
> Nick
>
> [1]: https://stackoverflow.com/a/1732454/194586
>
> On Sat, Mar 30, 2019 at 12:57 PM Nam Nguyen <bitsink at gmail.com> wrote:
>
>> Hello list,
>>
>> What do you think of a universal parsing library in the stdlib mainly for
>> use by other libraries in the stdlib?
>>
>> Throughout the years we have had many issues with protocol parsing. Some
>> have even introduced security bugs. The main cause of these issues is the
>> use of simple regular expressions.
>>
>> Having a universal parsing library in the stdlib would help cut down
>> these issues. Such a library should be minimal yet encompassing, and whole
>> parse trees should be entirely expressible in code. I am thinking of
>> combinatoric parsing as the main candidate that fits this bill.
>>
>> What do you say?
>>
>> Thanks!
>> Nam
>> _______________________________________________
>> Python-ideas mailing list
>> Python-ideas at python.org
>> https://mail.python.org/mailman/listinfo/python-ideas
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
>