
Tim Peters writes:
Chris didn't say this, but I will: I'm amazed that things much _simpler_ than regexps, like his scanf and REXX PARSE examples, haven't spread more.
scanf just isn't powerful enough. For example, consider parsing user input dates:

    scanf("%d/%d/%d", &year, &month, &day)

This is nice and simple, but handling "2022-02-15" as well requires a bit of thinking and several extra statements in C. In Python, I guess it would probably look something like

    year, sep1, month, sep2, day = scanf("%d%c%d%c%d")
    if not ('/' == sep1 == sep2 or '-' == sep1 == sep2):
        raise DateFormatUnacceptableError
    # range checks for month and day go here

which isn't too bad, though. But

    year, sep1, month, sep2, day = re.match(r"(\d+)([-/])(\d+)([-/])(\d+)", text).groups()
    if not sep1 == sep2:
        raise DateFormatUnacceptableError
    # range checks for month and day go here

expresses the intent a lot more clearly, I think. Sure, it's easy to write uninterpretable regexps, but up to that point regexps are very expressive. And that example can be reduced to one line (plus the comment) at the expense of a less symmetric, slightly less readable expression like r"(\d+)([-/])(\d+)\2(\d+)". Some folks might like that one better.
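For anyone who wants to try the backreference variant, here is a minimal runnable sketch. The function name parse_date, the exception class, and the crude range checks are my own placeholders, not anything from a real library; it only assumes the standard re module.

```python
import re


class DateFormatUnacceptableError(ValueError):
    """Hypothetical exception, named after the one used in the snippets above."""


# \2 refers back to whatever group 2 matched, so both separators must be
# the same character ('-' or '/').
DATE_RE = re.compile(r"(\d+)([-/])(\d+)\2(\d+)")


def parse_date(text):
    """Return (year, month, day) as ints, or raise DateFormatUnacceptableError."""
    m = DATE_RE.fullmatch(text.strip())
    if m is None:
        raise DateFormatUnacceptableError(text)
    year, _sep, month, day = m.groups()
    year, month, day = int(year), int(month), int(day)
    # Placeholder range checks; a real one would look at the month length.
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise DateFormatUnacceptableError(text)
    return year, month, day
```

With this, parse_date("2022-02-15") and parse_date("2022/02/15") both give (2022, 2, 15), while a mixed form such as "2022-02/15" is rejected because the second separator does not match the first.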
Simple solutions to simple problems are very appealing to me.
The Zawinski quote is motivated by the perception that people seem to think that simplicity lies in minimizing the number of tools you need to learn. REXX and SNOBOL pattern matching are quite a bit more specialized to particular tools than regexps are. That is, all regexp implementations support the same basic language, which is sufficient for most tasks most programmers want regexps for. I think you'd need to implement such a facility in a very popular scripting language such as sh, Perl, or Python for it to have the success of regexps.
Although, to be fair, I get a kick too out of massive overkill ;l-)
Don't we all, though?

Steve