Built-in parsing library
Hello list,

What do you think of a universal parsing library in the stdlib, mainly for use by other libraries in the stdlib?

Throughout the years we have had many issues with protocol parsing. Some have even introduced security bugs. The main cause of these issues is the use of simple regular expressions.

Having a universal parsing library in the stdlib would help cut down these issues. Such a library should be minimal yet encompassing, and whole parse trees should be entirely expressible in code. I am thinking of combinatoric parsing as the main candidate that fits this bill.

What do you say?

Thanks!
Nam
What does it mean to be a universal parser? In my mind, to be universal you should be able to parse anything, so you'd need something as versatile as any Turing language, so one could stick with the one we already have (Python).

I'm vaguely aware of levels of grammar (regular, context-free?, etc.), and how things like XML can't/shouldn't be parsed with regex [1]. Most protocols probably aren't *completely* free to do whatever and probably fit into some level of the hierarchy; what level would this putative parser perform at?

Doing something like this from scratch is a very tall order. Are there candidate libraries that you'd want to see included in the stdlib? There is an argument for trying to "promote" a library that would add security into the standard library over others that would just add features: trying to make the "one obvious way to do it" also the safe way. However, all things equal, more-used libraries tend to be more secure. I think suggestions of this form need to pose a library that a) exists, b) is well used and regarded, c) is stable (once in the stdlib, things are hard to change), and d) has maintainers that are amenable to inclusion.

Nick

[1]: https://stackoverflow.com/a/1732454/194586

_______________________________________________
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Sun, Mar 31, 2019 at 11:00 AM Nick Timkovich <prometheus235@gmail.com> wrote:
> What does it mean to be a universal parser? In my mind, to be universal you should be able to parse anything, so you'd need something as versatile as any Turing language, so one could stick with the one we already have (Python).

I'm not aware of, nor looking for, such Turing-complete parsers. Parsing algorithms such as Earley's, Generalized LL/LR, and parser combinators are often universal in the sense that they can work with all context-free grammars. I do not know if they are Turing complete.
That is one of the reasons the parser should be "coded" in rather than declared (e.g., in the sense of EBNF). Combinatoric parsers are usually glued together with functions, which can act based on the current parse tree.
> I'm vaguely aware of levels of grammar (regular, context-free?, etc.), and how things like XML can't/shouldn't be parsed with regex [1]. Most protocols probably aren't *completely* free to do whatever and probably fit into some level of the hierarchy, what level would this putative parser perform at?
I'd say any context-free grammar should be supported. But given the immediate use case (to help with other libraries in the stdlib), this could start small (but complete and correct). I am talking about simple parsing needs such as email validation, HTTP cookie format, URL parsing, and well-known date formats. In fact, I would expect this parsing library to only offer primitives like: parse any character, parse a character matching a predicate, parse a string, etc.
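For concreteness, here is a rough sketch of the kind of primitives being described, in the combinator style. This is purely illustrative (all the names are made up, not a proposed API): each parser is a function taking the input and a position, returning (value, new_position) on success or None on failure.

```python
def any_char(s, i):
    """Parse any single character."""
    return (s[i], i + 1) if i < len(s) else None

def satisfy(pred):
    """Parse one character matching a predicate."""
    def parser(s, i):
        if i < len(s) and pred(s[i]):
            return (s[i], i + 1)
        return None
    return parser

def literal(text):
    """Parse an exact string."""
    def parser(s, i):
        if s.startswith(text, i):
            return (text, i + len(text))
        return None
    return parser

def sequence(*parsers):
    """Run parsers one after another, collecting their results."""
    def parser(s, i):
        results = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            value, i = r
            results.append(value)
        return (results, i)
    return parser

def alt(*parsers):
    """Try each parser in turn, returning the first success."""
    def parser(s, i):
        for p in parsers:
            r = p(s, i)
            if r is not None:
                return r
        return None
    return parser
```

With just these five, `sequence(literal("v"), satisfy(str.isdigit))("v2", 0)` yields `(["v", "2"], 2)`; larger grammars are built by composing the same pieces.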
> Doing something like this from scratch is a very tall order. Are there candidate libraries that you'd want to see included in the stdlib? There is an argument for trying to "promote" a library that would add security into the standard library over others that would just add features: trying to make the "one obvious way to do it" also the safe way. However, all things equal, more-used libraries tend to be more secure. I think suggestions of this form need to pose a library that a) exists, b) is well used and regarded, c) is stable (once in the stdlib, things are hard to change), and d) has maintainers that are amenable to inclusion.
This email wasn't to promote or consider any library in particular. I'm more interested in finding out where the consensus is with respect to the need. Implementation-wise, I'm thinking of this paper from ~25 years ago, and a very bare-bones pyparsing: http://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf

Cheers, Nam
> Nick
>
> [1]: https://stackoverflow.com/a/1732454/194586
There are about a half dozen widely used parsing libraries for Python. Each one of them takes a dramatically different approach to defining a grammar. Each one has been debugged for over a decade.

While I can imagine proposing one for inclusion in the standard library, you'd have to choose one (or write a new one) and explain why that one is better for everyone (or at least a better starting point) than all the others are. You'd also have to explain why it needs to be in the standard library rather than installed by 'pip install someparser'.
I just found this nice summary. It's not complete, but it looks well written. https://tomassetti.me/parsing-in-python/
On Sun, Mar 31, 2019 at 12:13 PM David Mertz <mertz@gnosis.cx> wrote:
> I just found this nice summary. It's not complete, but it looks well written. https://tomassetti.me/parsing-in-python/
>
> On Sun, Mar 31, 2019, 3:09 PM David Mertz <mertz@gnosis.cx> wrote:
>
>> There are about a half dozen widely used parsing libraries for Python. Each one of them takes a dramatically different approach to defining a grammar. Each one has been debugged for over a decade.
>>
>> While I can imagine proposing one for inclusion in the standard library, you'd have to choose one (or write a new one) and explain why that one is better for everyone (or at least a better starting point) than all the others are.
I'm not at that stage, yet. By the way, it still is not clear to me if you think having one in the stdlib is desirable.
>> You'd also have to explain why it needs to be in the standard library rather than installed by 'pip install someparser'.
Installing a package outside the stdlib does not solve the problem that motivated this thread: the libraries included in the stdlib can't use those parsers.

Cheers, Nam
We do have a parser generator in the standard library: https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2
--
Guido van Rossum (python.org/~guido)
Stack-based LL(1) pushdown automata can be implemented by hand; indeed, isn't that what a tmLanguage file is? There's also the option of using Iro to generate a tmLanguage.
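As a toy illustration of the hand-implemented, stack-based LL(1) approach mentioned above (the grammar and names here are invented for the example): a table-driven parser for the grammar S -> '(' S ')' S | ε, recognizing balanced parentheses with an explicit stack rather than recursion.

```python
# Parse table: (nonterminal, lookahead) -> production to push.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],  # expand S on '('
    ("S", ")"): [],                    # S -> epsilon
    ("S", "$"): [],                    # S -> epsilon at end of input
}

def parse_balanced(text):
    tokens = list(text) + ["$"]        # '$' marks end of input
    stack = ["$", "S"]                 # bottom marker, then start symbol
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top == look:                # terminal matches the lookahead
            pos += 1
        elif (top, look) in TABLE:     # nonterminal: consult the table
            stack.extend(reversed(TABLE[(top, look)]))
        else:                          # no rule applies: reject
            return False
    return pos == len(tokens)
```

`parse_balanced("(())()")` accepts, `parse_balanced("(()")` rejects; the single-symbol lookahead plus the stack is all the machinery LL(1) needs.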
On 4/1/2019 1:14 AM, Guido van Rossum wrote:

> We do have a parser generator in the standard library: https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2
It is effectively undocumented and by inference discouraged from use. The entry for lib2to3 in the 2to3 doc (https://docs.python.org/3/library/2to3.html#module-lib2to3):

    lib2to3 - 2to3's library
    Source code: Lib/lib2to3/
    Note: The lib2to3 API should be considered unstable and may change drastically in the future.

help(pgen2) is not much more helpful:

    Help on package lib2to3.pgen2 in lib2to3:
    NAME
        lib2to3.pgen2 - The pgen2 package.
    PACKAGE CONTENTS
        conv driver grammar literals parse pgen token tokenize
    FILE
        c:\programs\python38\lib\lib2to3\pgen2\__init__.py

--
Terry Jan Reedy
On Mon, Apr 1, 2019 at 3:13 PM Terry Reedy <tjreedy@udel.edu> wrote:
> On 4/1/2019 1:14 AM, Guido van Rossum wrote:
>
>> We do have a parser generator in the standard library: https://github.com/python/cpython/tree/master/Lib/lib2to3/pgen2
>
> It is effectively undocumented and by inference discouraged from use.
I've tried it out over the weekend. The undocumented-ness is kinda annoying but surmountable. What I found was that this library is tightly coupled to the Python language, at both the lexer and parser levels. For example, defining a simple grammar like this would not work:

    genericurl: scheme '://'
    scheme: ...

The reason is that '://' is not a known token type in the Python language. That is a real bummer.

Back to my original goal: I've gathered that there is some interest in having a more general parser library in the stdlib. "Some", but not "much". Should I start out with a straw proposal so that we can hash it out further?

Cheers, Nam
On Mon, 8 Apr 2019 at 02:54, Nam Nguyen <bitsink@gmail.com> wrote:
> Back to my original goal, I've gathered that there is some interest in having a more general parser library in the stdlib. "Some", but not "much". Should I start out with a straw proposal so that we can hash it out further?
I would expect that the only reasonable way of getting a parsing library into the stdlib would be to propose an established one from PyPI to be moved into the stdlib - and that would require the active support of the library author. I can't imagine any way that I'd support a brand new parsing library getting put in the stdlib - the area is sufficiently complex, and the external alternatives too mature, for a new, relatively untried library in the stdlib to be a good idea.

Paul
On Mon, Apr 8, 2019 at 12:02 AM Paul Moore <p.f.moore@gmail.com> wrote:
> I would expect that the only reasonable way of getting a parsing library in the stdlib would be to propose an established one from PyPI to be moved into the stdlib
Absolutely -- unlike some proposals, a stand-alone parsing lib could very easily be developed external to the stdlib. If one gains traction as an obvious choice, then we can talk about bringing it in.

-CHB

--
Christopher Barker, PhD
Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Mon, Apr 8, 2019 at 7:59 AM Christopher Barker <pythonchb@gmail.com> wrote:
> On Mon, Apr 8, 2019 at 12:02 AM Paul Moore <p.f.moore@gmail.com> wrote:
>
>> I would expect that the only reasonable way of getting a parsing library in the stdlib would be to propose an established one from PyPI to be moved into the stdlib
>
> Absolutely -- unlike some proposals, a stand-alone parsing lib could very easily be developed external to the stdlib. If one gains traction as an obvious choice, then we can talk about bringing it in.
All options are still on the table. It is important to closely align the solution with the goal of making it available for *internal use* in the stdlib itself. Having a parser library in the stdlib for *general use* is not an explicit goal that I am aiming for, just as pgen2 was not intended that way. Neither should that deter one from being considered.

Nam
Nam,

I'm not so sure that a "universal parsing library" is possible for the stdlib.

I think one way you could find out what the requirements are is to refactor at least 2 of the existing stdlib modules that you have identified as needing a better parser. Did you find that you could use the same parser code for both? Would it apply to other modules?

Barry
Barry Scott writes:
> I'm not so sure that a "universal parsing library" is possible for the stdlib.
That shouldn't be our goal. (And I don't think Nam is wedded to that expression of the goal.)
> I think one way you could find out what the requirements are is to refactor at least 2 of the existing stdlib modules that you have identified as needing a better parser.
I think this is a really good idea. I'll be sprinting on Mailman at PyCon, but if Nam and other proponents have time around PyCon (and haven't done it already :-) I'll be able to make time then. Feel free to ping me off-list. (Meeting at PyCon would be a bonus, but IRC or SNS messaging/whiteboarding works for me too if other interested folks can't be there.)
> Did you find that you could use the same parser code for both?
I think it highly likely that "enough" protocols and "little languages" that are normally written by machines (or skilled programmers) can be handled by "Dragon Book"[1] parsers to make it worth adding some parsing library to the stdlib. Of course, more general (but still efficient) options have been developed since I last shaved a yacc, but that's not the point. Developers who have special needs (extremely efficient parsing of a relatively simple grammar, more general grammars) or simply want to continue using a different module that they've been requiring from the Cheese Shop since it was called "the Cheese Shop"[2] can (and should) do that. The point of the stdlib is to provide standard batteries that serve in common situations going forward.

I've been using regexps since 1980, and am perfectly comfortable with rather complex expressions. E.g., I've written more or less general implementations of RFC 3986 and its predecessor RFC 2396, which is one of the examples Nam has tried. Nevertheless, there are some tricky aspects. For example, I did *not* try to implement 3986 in one expression -- as 3986 says:

    These restrictions result in five different ABNF rules for a path (Section 3.3), only one of which will match any given URI reference.

so I used multiple, mutually exclusive regexps for the different productions. There is no question in my mind that the ABNF is easier to read. Implementing a set of regexps from the ABNF is easier than reconstructing the ABNF from the regexps.

That's *my* rationale for including a parsing module in the stdlib: making common parsing tasks more reliable in implementation and more maintainable. To me, the real question is, "Suppose we add a general parsing library to the stdlib, and refactor some modules to use it. (1) Will this 'magically' fix some bugs/RFEs? (Not essential, but would be a nice bonus.) (2) Will the denizens of python-ideas and python-dev find such refactored modules readable and more maintainable than a plethora of ad hoc recursive descent parsers?"

Obviously people who haven't studied parsers will have to project to a future self that has become used to reading grammar descriptions, but I think folks around here are used to that kind of projection. This would be a good test.

Footnotes:
[1] "Do I date myself? Very well then, I date myself."
[2] See [1].
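To make the ABNF-vs-regex comparison concrete in a small case where the translation is still easy: RFC 3986 defines `scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )`. The regex below is my own rendering of that rule, not anything from the stdlib; the point is that going from the ABNF to the regex is mechanical, while recovering the grammar rule from a thicket of such regexps is much harder.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")

def is_scheme(s):
    """Return True if s is a syntactically valid URI scheme."""
    return SCHEME.match(s) is not None
```

For the full URI grammar, 3986's five mutually exclusive path rules are exactly where a single regex stops being a faithful or readable translation.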
On Sun, Mar 31, 2019 at 9:17 PM Nam Nguyen <bitsink@gmail.com> wrote:
> Installing a package out of stdlib does not solve the problem that motivated this thread. The libraries included in the stdlib can't use those parsers.
Can you be more specific about exactly which code in the stdlib you think should be rewritten to use a parsing library?

-n

--
Nathaniel J. Smith -- https://vorpus.org
Sure! The same examples mentioned in Victor's https://vstinner.github.io/tag/security.html could have been fixed by having a more proper parser. This one, which I helped author, was also a parsing issue: https://python-security.readthedocs.io/vuln/bpo-30500_urllib_connects_to_a_w...

Thanks for the pointer to pgen2, Guido. I have only quickly skimmed through it and thought it was really closely tied to the Python language. Maybe I'm wrong, so I'll need some time to try it out on some of those previous security fixes.

Cheers, Nam
David Mertz writes:
> While I can imagine proposing one for inclusion in the standard library, you'd have to choose one (or write a new one) and explain why that one is better for everyone (or at least a better starting point) than all the others are.
In principle, no; one just needs to explain why this battery fits most of the toys encountered in practice. That's good enough, and if during discussion somebody shows another one is better on a lot of fronts, sure, do that instead. We should avoid letting the perfect be the enemy of the good (as people keep insisting about str.cutsuffix).

Politically, sure, it's almost 100% certain that somebody will object that there's a whole class of cases handled by the PackMule parser that the ShavedYacc parser doesn't handle, and somebody else will point out the opposite, so neither is acceptable. Ignore them, they're both wrong about "acceptable". ;-)
> You'd also have to explain why it needs to be in the standard library rather than installed by 'pip install someparser'.
Again, the bar isn't so high as "needs". There's a balance of equities, such as people with Python installations restricted by QA or security vetting, applications where you really don't want to spend most of your hour allocated to teaching the feature downloading requirements, and cases where pretty much everybody performs the task frequently (for some value of frequently), vs. costs of maintenance (we generally require that a core developer vouch for someone who volunteers to take responsibility for it for 3-5 years) and effects on complexity of learning Python (usually not great for such a module, since the excess burden on documentation ends up being one line in the TOC and a half-dozen in the index).

Yes, Nam should be prepared for pushback on both grounds. Most pressingly, without a specific package being proposed, discussion will just go in circles indefinitely. But a parser generator package is something that's been lurking, waiting for an enthusiastic proponent, for a long time. There's a lot of low-level support for it. Maybe it just needs a specific proposal to take off. And maybe it won't. He won't know unless he tries.

Steve

P.S. Guido mentioned lib2to3.pgen2, which is in the stdlib. But help(pgen2) isn't very helpful, so there's at least some documentation work to be done there.
On Mon, Apr 1, 2019 at 2:16 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
> In principle, no, one just needs to explain why this battery fits most of the toys encountered in practice. That's good enough, and if during discussion somebody shows another one is better on a lot of fronts, sure, do that instead.
OK, I'll acknowledge my comment might have overstated the bar to overcome. A parser added to the standard library doesn't need to be perfect for everyone. But adding to the stdlib *does* provide a kind of endorsement of the right default way to go about things.

Among the popular third-party libraries, we have several very different attitudes towards designing grammars. On the one hand, there are formal differences in the power of different grammars -- some are LL, some LR, some LALR, some Earley. Also, some libraries separate the parser from the lexer while others are scannerless. But most of these can parse the simple cases fine, so that's good for the 90% coverage.

However, cross-cutting that formal power issue, there are two main programming styles used by different libraries. Some libraries use BNF definitions of a grammar as another mini-language inside Python. Exactly where those BNF definitions live varies (i.e. are they in a separate file, in docstrings, contents of variables, etc.), but using them is largely similar. And sure, EBNF vs. BNF proper. But other libraries instead use Python functions or classes to define the productions, where each class/function is effectively one term of the grammar. Typically this latter style allows triggering events as soon as some production is encountered -- the event could be "accumulate a counter", or "write an output string", or "perform a computation", or other things.

There are lots of good arguments for why to use different libraries along the axes I mention, on all sides. What is not possible is to reconcile the very different decisions into a common denominator. Something in the standard library would have to be partisan in selecting one particular approach as the "official" one.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
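A tiny hand-rolled sketch of the second style described above (all names here are invented for illustration, not any real library's API): each production is a callable, and a semantic action fires as soon as its piece of input matches, so computation happens during the parse rather than over a finished tree.

```python
import re

def token(pattern, action=None):
    """Build a production from a regex; action fires on each match."""
    rx = re.compile(pattern)
    def production(s, i):
        m = rx.match(s, i)
        if m is None:
            return None
        value = m.group(0)
        if action is not None:
            value = action(value)   # the "event" fires at match time
        return value, m.end()
    return production

number = token(r"\d+", action=int)  # convert to int as soon as it matches
plus = token(r"\+")

def expr(s, i=0):
    """expr -> number ('+' number)* , summing while parsing."""
    r = number(s, i)
    if r is None:
        return None
    total, i = r
    while True:
        p = plus(s, i)
        if p is None:
            return total, i
        r = number(s, p[1])
        if r is None:
            return None
        n, i = r
        total += n                  # semantic action: compute during the parse
```

Here `expr("1+2+3")` returns `(6, 5)` without ever building a parse tree; the BNF-as-mini-language style would instead separate the grammar text from the evaluation step.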
David Mertz writes:
> OK, I'll acknowledge my comment might have overstated the bar to overcome. A parser added to the standard library doesn't need to be perfect for everyone. But adding to stdlib *does* provide a kind of endorsement of the right default way to go about things.
Indeed it does, but TOOWTDI is not absolute.
> However, cross-cutting that formal power issue, there are two main programming styles used by different libraries.
I concede this tends to raise the bar quite a bit.
> Something in the standard library would have to be partisan in selecting one particular approach as the "official" one.
Perhaps. Even there, though, we have an example: XML. We gotcher SAX, we gotcher DOM, we gotcher ElementTree, we gotcher expat. I think XML processing is probably a *lot* more used, and in a lot more modes, than general parsing. But the analogy is valid, even though I can't say it's powerful *enough*.

There definitely is a bar to clear. I don't know if it's worth Nam's effort to try to clear it -- there's no guarantee of success on something like this. I just think we shouldn't be *too* discouraging. And I personally think parsing formal languages is an important enough field to deserve consideration for stdlib inclusion, even if it's not going to be used every day.

Steve
@DavidMertz

> Each one of them takes a dramatically different approach to defining a grammar

They work more towards implementing well-known standards like BNF, though internally they might work differently when parsing, etc.

Abdur-Rahmaan Janhangeer
Mauritius
participants (12)
- Ai mu
- Barry Scott
- Christopher Barker
- David Mertz
- Guido van Rossum
- James Lu
- Nam Nguyen
- Nathaniel Smith
- Nick Timkovich
- Paul Moore
- Stephen J. Turnbull
- Terry Reedy