
On Wed, Feb 16, 2022, 5:46 AM Paul Moore <p.f.moore@gmail.com> wrote:
On Wed, 16 Feb 2022 at 10:23, Chris Angelico <rosuav@gmail.com> wrote:
On Wed, 16 Feb 2022 at 21:01, Stephen J. Turnbull <stephenjturnbull@gmail.com> wrote:
What I think is more interesting than simpler (but more robust for what they can do) facilities is better parser support in standard libraries (not just Python's), and more use of them in place of hand-written "parsers" that just eat tokens defined by regexps in order. If one could, for example, write
[ "Sun|Mon|Tue|Wed|Thu|Fri|Sat" : dow,
  ", ".
  "(?: |\d)\d)" : day,
  " ",
  "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec" : month,
  " ",
  "\d\d\d\d" : year,
  " ",
  "\d\d:\d\d:\d\d" : time,
  " ",
  "[+-]\d\d\d\d" : tzoffset ]
(which is not legal Python syntax but I'm too lazy to try to come up with something better) to parse an RFC 822 date, I think people would use that. Sure, for something *that* regular, most people would probably use the evident "literate" regexp with named groups, but it wouldn't take much complexity to make such a parser generator worthwhile to programmers.
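For comparison, here is a minimal sketch of the "literate" regexp with named groups mentioned above, using only the stdlib re module. It follows the field layout of the example (four-digit year, numeric offset) rather than the full RFC 822 grammar, and the name RFC822_DATE is just an illustrative choice.

```python
import re

# Verbose ("literate") pattern: one field per line, each with a named group.
# In re.VERBOSE mode bare whitespace is ignored, so literal spaces are
# written as [ ] (whitespace inside a character class is kept).
RFC822_DATE = re.compile(
    r"""
    (?P<dow>Sun|Mon|Tue|Wed|Thu|Fri|Sat)    # day of week
    ,[ ]                                    # comma, then one space
    (?P<day>[ \d]\d)                        # day, optionally space-padded
    [ ]
    (?P<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
    [ ]
    (?P<year>\d{4})
    [ ]
    (?P<time>\d\d:\d\d:\d\d)
    [ ]
    (?P<tzoffset>[+-]\d{4})
    """,
    re.VERBOSE,
)

match = RFC822_DATE.fullmatch("Wed, 16 Feb 2022 05:46:00 -0500")
fields = match.groupdict()  # maps group names to the matched text
```

Each piece is still an ordinary regexp, so the readability gain comes purely from layout and naming.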
That's an interesting concept. I can imagine writing it declaratively like this:
class Date(parser):
    dow: "Sun|Mon|Tue|Wed|Thu|Fri|Sat"
    _: ", "
    day: "(?: |\d)\d)"
I find it mildly amusing that even this "better" solution fell victim to an incorrect regexp ;-)
However, I do like the idea of having a better parser library in the stdlib. But it's pretty easy to write such a thing and publish it on PyPI, so the lack of an obvious "best in class" answer for this problem suggests that people would be less likely to use such a feature than we're assuming.
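As a rough illustration of how little code a minimal version needs, here is a sketch that compiles a list of (name, pattern) pairs into a single regexp with named groups. The helper name build_parser and the convention that a name of None marks a literal separator are assumptions made for this example, not an existing API. One side benefit is that each piece is validated separately, so a stray ")" like the one in the day pattern above is reported against the field that contains it.

```python
import re

def build_parser(fields):
    """Compile (name, pattern) pairs into one regex with named groups.

    Hypothetical helper: a name of None marks a literal separator, which
    is escaped and matched but not captured.
    """
    parts = []
    for name, pattern in fields:
        if name is None:
            parts.append(re.escape(pattern))
            continue
        try:
            re.compile(pattern)  # validate this piece on its own
        except re.error as exc:
            raise ValueError(f"bad pattern for field {name!r}: {exc}") from exc
        parts.append(f"(?P<{name}>{pattern})")
    return re.compile("".join(parts))

date_parser = build_parser([
    ("dow", r"Sun|Mon|Tue|Wed|Thu|Fri|Sat"),
    (None, ", "),
    ("day", r"(?: |\d)\d"),
    (None, " "),
    ("month", r"Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"),
    (None, " "),
    ("year", r"\d\d\d\d"),
    (None, " "),
    ("time", r"\d\d:\d\d:\d\d"),
    (None, " "),
    ("tzoffset", r"[+-]\d\d\d\d"),
])

result = date_parser.fullmatch("Wed, 16 Feb 2022 05:46:00 -0500").groupdict()
```

Passing the unbalanced pattern from the example, build_parser([("day", r"(?: |\d)\d)")]), raises a ValueError that names the day field instead of silently producing a broken parser.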
The two obvious examples on PyPI are:
1. PyParsing - https://pypi.org/project/pyparsing/. To me, this has the feel of the sort of functional approach SNOBOL used.
2. parse - https://pypi.org/project/parse/. A scanf-style approach inspired by format rather than printf.
Do people choose regexes over these because re is in the stdlib? Are they simply less well known? Or is there an attraction to regexes that makes people prefer them in spite of the complexity/maintainability issues?
Paul
Long story below, but TL;DR: I tried to use parse for a task I worked on for a long time, and eventually had to learn regex. After using regex somewhat regularly for a while now, I have concluded that its power and ubiquity are worth the additional cognitive load (parse only requires one to be familiar with standard Python string format syntax).
Story: The first task I set about trying to do in Python (with no practical programming experience except for a single semester of C++ as part of a civil engineering curriculum) was a tool to convert 2D finite element mesh files into a file format for a niche finite element analysis program (the program is called CANDE; it's for analysis of buried culverts and pipes). My predecessor was creating these meshes by hand. He would literally get a 24"x36" sheet of drafting paper, draw out his mesh, number the nodes and elements, and enter the data into the text file.
It took me eons to write something (probably 6 years!), and I probably started over from scratch at least 7, maybe 10 times. Even after all that, while I finally did arrive at something usable for myself, I never achieved my goal of being able to package something up that I can pass on to my other colleague. When a new mesh has to be created they just ask me to do it (I still do them occasionally).
Anyway, all that to say: I remember trying to avoid learning regex for about 4 years of this. It looked too scary. One day I finally ran into a task that parse (https://pypi.org/project/parse/), which I was relying heavily on, couldn't handle. I researched it and am pretty confident that, even in my relative ignorance, I was correct about it not being able to do it (I am racking my brain but can't remember what that need was). This prompted me to FINALLY do a few regex tutorials and watch some PyCon videos, and I came out the other end realizing that, hey, regex isn't so bad.
And it's so darn powerful that the trade-off between it being an uphill task to learn and read (at least at first) and what you are able to do with it seems worth it.