On 2022-02-15 06:05, Tim Peters wrote:
[Steven D'Aprano <steve@pearwood.info>]
I've been interested in the existence of SNOBOL string scanning for a long time, but I know very little about it.
How does it differ from regexes, and why have programming languages pretty much standardised on regexes rather than other forms of string matching?
What we call "regexps" today contain all sorts of things that aren't in the original formal definition of "regular expressions". For example, even the ubiquitous "^" and "$" (start- and end-of-line assertions) go beyond what the phrase formally means.
So the question is ill-defined. Now that Perl has added recursive regular expressions, I'm not sure there's any real difference in theoretical capability remaining. Without that, though, you can't, for example, write a regular expression that matches strings with balanced parentheses ("regexps can't count"), while I earlier posted a simple 2-liner in SNOBOL that implements such a thing (patterns in SNOBOL can freely invoke other patterns, including themselves).
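For illustration only (this is not the SNOBOL 2-liner mentioned above, which isn't reproduced here): a minimal Python sketch of the "counting" that a classic regular expression can't do. The function name `is_balanced` is made up for this example; it just tracks nesting depth with an explicit counter.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in text has a matching ')'.

    Hypothetical helper for illustration. The nesting depth is an
    unbounded counter, which is exactly what a formal regular
    expression (a finite automaton) has no way to keep.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ')' appeared before its matching '('.
                return False
    # Balanced only if every '(' was eventually closed.
    return depth == 0
```

Something like `is_balanced("(a(b)c)")` should come out true and `is_balanced(")(")` false. A non-recursive regexp can only handle nesting up to some depth fixed in advance in the pattern itself.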
As to why regexps prevailed, traction! They are useful tools, and _started_ life as pretty simple things, with small, elegant, and efficient implementations. Feature creep and "faster! faster! faster!" have since turned the implementations into something more like bottomless pits ;-)
Adoption breeds more adoption in the computer world. They have no real competition anymore. The same sociological illness has also cursed us, e.g., with an eternity of floating point signed zeroes ;-)
Chris didn't say this, but I will: I'm amazed that things much _simpler_ than regexps, like his scanf and REXX PARSE examples, haven't spread more. Simple solutions to simple problems are very appealing to me. Although, to be fair, I get a kick too out of massive overkill ;-)
Regexes were simple to start with, so only a few metacharacters were needed, the remaining characters being treated as literals. As new features were added, the existing metacharacters were used in new ways that had been illegal until then, in order to remain backwards-compatible. On top of that, there are multiple implementations with differing (and sometimes only slightly differing) features and behaviours. It's a good example of evolution: often messy, and resulting in clunky designs.
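A small sketch of what that reuse looks like in Python's `re` module, with made-up sample data: a `?` directly after `(` has nothing to quantify, so it was meaningless in older syntax, which is why `(?...)` could be taken over for extensions such as named groups without changing what any valid older pattern means.

```python
import re

# "(?P<name>...)" is an extension spelled with a sequence that used to
# be an error: "?" right after "(" has nothing to repeat.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

# Sample input chosen for illustration only.
match = date_pattern.match("2022-02-15")
year = match.group("year")
month = match.group("month")

# The non-greedy quantifier "*?" reuses "?" the same way: a quantifier
# directly following another quantifier was previously not allowed.
shortest = re.match(r"<.*?>", "<a><b>").group(0)
```

Here `year` and `month` hold the captured substrings, and `shortest` is the shortest tag-like prefix rather than the whole string. Other regex implementations spell some of these extensions differently, which is the "slightly differing" part.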