
On Wed, 16 Feb 2022 at 10:23, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, 16 Feb 2022 at 21:01, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
What I think is more interesting than simpler (but more robust for what they can do) facilities is better parser support in standard libraries (not just Python's), and more use of them in place of hand-written "parsers" that just eat tokens defined by regexps in order. If one could, for example, write
[ "Sun|Mon|Tue|Wed|Thu|Fri|Sat" : dow, ", ". "(?: |\d)\d)" : day, " ", "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec" : month, " ", "\d\d\d\d" : year, " ", "\d\d:\d\d:\d\d" : time, " ", "[+-]\d\d\d\d" : tzoffset ]
(which is not legal Python syntax but I'm too lazy to try to come up with something better) to parse an RFC 822 date, I think people would use that. Sure, for something *that* regular, most people would probably use the evident "literate" regexp with named groups, but it wouldn't take much complexity to make such a parser generator worthwhile to programmers.
That's an interesting concept. I can imagine writing it declaratively like this:
    class Date(parser):
        dow: "Sun|Mon|Tue|Wed|Thu|Fri|Sat"
        _: ", "
        day: "(?: |\d)\d)"
I find it mildly amusing that even this "better" solution fell victim to an incorrect regexp ;-)

However, I do like the idea of having a better parser library in the stdlib. But it's pretty easy to write such a thing and publish it on PyPI, so the lack of an obvious "best in class" answer for this problem suggests that people would be less likely to use such a feature than we're assuming.

The two obvious examples on PyPI are:

1. PyParsing - https://pypi.org/project/pyparsing/. To me, this has the feel of the sort of functional approach SNOBOL used.
2. parse - https://pypi.org/project/parse/. A scanf-style approach inspired by format rather than printf.

Do people choose regexes over these because re is in the stdlib? Are they simply less well known? Or is there an attraction to regexes that makes people prefer them in spite of the complexity/maintainability issues?

Paul
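
For comparison, here is a minimal sketch of the "literate" regexp with named groups that Stephen mentions, using only the stdlib re module. The group names follow his example. The names RFC822_DATE and parse_date are made up for illustration, and the day pattern is written with balanced parentheses, which is only my guess at what the quoted pattern intended. It covers just the fixed layout in the example, not everything RFC 822 allows.

```python
import re

# Verbose ("literate") regexp with named groups for the date layout in the
# quoted example. In re.VERBOSE mode unescaped whitespace in the pattern is
# ignored, so a literal space is written as "[ ]".
# Assumption: the day field was meant to be "(?: |\d)\d" (balanced parens).
RFC822_DATE = re.compile(
    r"""
    (?P<dow>Sun|Mon|Tue|Wed|Thu|Fri|Sat),[ ]
    (?P<day>(?:[ ]|\d)\d)[ ]
    (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[ ]
    (?P<year>\d{4})[ ]
    (?P<time>\d\d:\d\d:\d\d)[ ]
    (?P<tzoffset>[+-]\d{4})
    """,
    re.VERBOSE,
)


def parse_date(text):
    """Return a dict of the named fields, or None if text does not match."""
    m = RFC822_DATE.fullmatch(text)
    return m.groupdict() if m else None


fields = parse_date("Wed, 16 Feb 2022 10:23:00 +0000")
print(fields)
```

Even at this size, the pattern and the field names are interleaved in one string, which is roughly the maintainability cost the declarative proposals above are trying to avoid.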